I'm looking for a clean way to transform a vector of integers into a 2D array of binary values, where each row has a one in the column indexed by the corresponding value of the vector.
For example:
v = np.array([1, 5, 3])
C = np.zeros((v.shape[0], v.max()))
What I'm looking for is a way to transform C into this:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.]])
I've come up with this:
C[np.arange(v.shape[0]), v.T-1] = 1
but I wonder if there is a less verbose / more elegant approach?
Thanks!
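To make the indexing explicit, here is a minimal self-contained sketch of the assignment above. The two index arrays are paired element-wise, so row i gets a one in column v[i] - 1. The function name `one_hot_1based` is only illustrative, and the sketch assumes every value in v is at least 1.

```python
import numpy as np

def one_hot_1based(v):
    """Return a (len(v), v.max()) array with a 1 in column v[i] - 1 of row i.

    Assumes v is a 1D integer array whose values are all >= 1.
    """
    C = np.zeros((v.shape[0], v.max()))
    rows = np.arange(v.shape[0])   # 0, 1, ..., n-1
    cols = v - 1                   # shift 1-based values to 0-based columns
    C[rows, cols] = 1              # sets C[rows[i], cols[i]] for each i
    return C

v = np.array([1, 5, 3])
print(one_hot_1based(v))
```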
UPDATE
Thanks for your comments! There was an error in my code: if there is a 0 in v, the index v-1 becomes -1, so the 1 lands in the wrong place (the last column). Instead, I have to expand the categorical data to include 0, i.e. use v.max()+1 columns and index with v directly.
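A small sketch of that failure mode and the widened layout, using a made-up input that contains a 0 (the variable names `wrong` and `right` are just for illustration):

```python
import numpy as np

v = np.array([0, 2, 1])
rows = np.arange(v.shape[0])

# Original approach: v - 1 turns the 0 into -1, which NumPy reads as "last column",
# so the rows for 0 and for 2 end up identical.
wrong = np.zeros((v.shape[0], v.max()))
wrong[rows, v - 1] = 1

# Widened approach: one column per value in 0..v.max(), indexed by v directly.
right = np.zeros((v.shape[0], v.max() + 1))
right[rows, v] = 1

print(wrong)
print(right)
```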
jrennie's answer is a big win for large vectors as long as you deal with sparse matrices exclusively. In my case I need to return an array for compatibility, and the conversion cancels out the advantage entirely. See both solutions:
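import numpy as np
from scipy import sparse

def permute_array(vector):
    # Dense version: one column per value in 0..vector.max()
    permut = np.zeros((vector.shape[0], vector.max()+1))
    permut[np.arange(vector.shape[0]), vector] = 1
    return permut

def permute_matrix(vector):
    # Sparse version: CSR matrix with exactly one nonzero per row
    indptr = range(vector.shape[0]+1)
    ones = np.ones(vector.shape[0])
    permut = sparse.csr_matrix((ones, vector, indptr))
    return permut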
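The `(ones, vector, indptr)` triple passed to `sparse.csr_matrix` is the CSR layout: row i owns the entries `data[indptr[i]:indptr[i+1]]`, and their columns are `indices[indptr[i]:indptr[i+1]]`. With exactly one nonzero per row, `indptr` is simply 0..n. Below is a NumPy-only sketch of that decoding. `csr_triple_to_dense` is a made-up helper for illustration, not something SciPy provides and not how SciPy implements `toarray()`, and it assumes no duplicate columns within a row.

```python
import numpy as np

def csr_triple_to_dense(data, indices, indptr, n_cols):
    """Expand a CSR (data, indices, indptr) triple into a dense array.

    Illustrative helper only; assumes no duplicate columns within a row.
    """
    n_rows = len(indptr) - 1
    dense = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        start, stop = indptr[i], indptr[i + 1]   # row i owns data[start:stop]
        dense[i, indices[start:stop]] = data[start:stop]
    return dense

vector = np.array([1, 5, 3])
data = np.ones(vector.shape[0])            # one nonzero per row
indptr = np.arange(vector.shape[0] + 1)    # row i owns exactly entry i
dense = csr_triple_to_dense(data, vector, indptr, vector.max() + 1)
print(dense)
```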
In [193]: vec = np.random.randint(1000, size=1000)
In [194]: np.all(permute_matrix(vec) == permute_array(vec))
Out[194]: True
In [195]: %timeit permute_array(vec)
100 loops, best of 3: 3.49 ms per loop
In [196]: %timeit permute_matrix(vec)
1000 loops, best of 3: 422 µs per loop
Now, adding conversion:
def permute_matrix(vector):
    indptr = range(vector.shape[0]+1)
    ones = np.ones(vector.shape[0])
    permut = sparse.csr_matrix((ones, vector, indptr))
    return permut.toarray()
In [198]: %timeit permute_matrix(vec)
100 loops, best of 3: 4.1 ms per loop