I'm looking for a clean way to transform a vector of integers into a 2D array of binary values, where each row has a one in the column indexed by the corresponding value of the vector.

For example:

    import numpy as np

    v = np.array([1, 5, 3])
    C = np.zeros((v.shape[0], v.max()))

What I'm looking for is a way to transform C into this:

    array([[ 1.,  0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.,  1.],
           [ 0.,  0.,  1.,  0.,  0.]])

I've come up with this:

    C[np.arange(v.shape[0]), v.T-1] = 1

But I wonder if there is a less verbose / more elegant approach?

Thanks!

UPDATE

Thanks for your comments! There was an error in my code: if v contains a 0, the index v-1 becomes -1 and the 1 ends up in the wrong place (the last column). Instead, I have to expand the categorical data to include 0.
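A minimal sketch of that failure mode, using a small made-up vector (the values are chosen only for illustration): with the 1-based code, a 0 in v turns into index -1, which NumPy reads as "last column", while the 0-based version with one extra column places it correctly.

```python
import numpy as np

# Illustrative vector that contains 0 (not from the original post).
v = np.array([0, 2, 1])

# 1-based approach from the question: v - 1 contains -1 for the 0 entry,
# and a negative index wraps around to the last column.
C = np.zeros((v.shape[0], v.max()))
C[np.arange(v.shape[0]), v - 1] = 1

# 0-based approach: allocate max + 1 columns and use the values as-is.
D = np.zeros((v.shape[0], v.max() + 1))
D[np.arange(v.shape[0]), v] = 1
```

Here the first row of C gets its 1 in the last column, so it is indistinguishable from the row for the value 2, whereas D keeps all three values apart.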

jrennie's answer is a big win for large vectors as long as you deal with sparse matrices exclusively. In my case I need to return an array for compatibility, and the conversion cancels out the advantage entirely. See both solutions:

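    import numpy as np
    from scipy import sparse

    def permute_array(vector):
        # Dense version: one row per element, one column per value 0..max.
        permut = np.zeros((vector.shape[0], vector.max()+1))
        permut[np.arange(vector.shape[0]), vector] = 1
        return permut

    def permute_matrix(vector):
        # Sparse CSR version: row i holds a single 1 in column vector[i].
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut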

    In [193]: vec = np.random.randint(1000, size=1000)
    In [194]: np.all(permute_matrix(vec) == permute_array(vec))
    Out[194]: True

    In [195]: %timeit permute_array(vec)
    100 loops, best of 3: 3.49 ms per loop

    In [196]: %timeit permute_matrix(vec)
    1000 loops, best of 3: 422 µs per loop

Now, adding conversion:

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut.toarray()

    In [198]: %timeit permute_matrix(vec)
    100 loops, best of 3: 4.1 ms per loop
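For comparison, one shorter dense variant that does not appear in this thread is to select rows of an identity matrix. This is only a sketch: the function name one_hot_eye is hypothetical, it assumes 0-based non-negative integers, and it builds a (max+1) x (max+1) identity matrix first, so it may well be slower and more memory-hungry than permute_array for large values. It has not been timed here.

```python
import numpy as np

def one_hot_eye(vector):
    # Hypothetical helper: row k of the identity matrix is the one-hot
    # encoding of value k, so indexing with the vector picks one row
    # per element.
    return np.eye(vector.max() + 1)[vector]

# Reference result using the fancy-indexing approach from the question.
vec = np.array([0, 4, 2])
expected = np.zeros((vec.shape[0], vec.max() + 1))
expected[np.arange(vec.shape[0]), vec] = 1

result = one_hot_eye(vec)
```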

  • Your way looks good to me! You can do without the .T though. Commented Apr 25, 2014 at 19:03
  • You are trying to implement a permutation matrix. I think your solution is fine. As Mr E said, you can drop the .T. See also this question on Stack Overflow. I was wondering whether there is some function in scipy.linalg that implements the permutation matrix. Commented Apr 25, 2014 at 19:16
  • @Tengis Your link doesn't work. Commented Apr 25, 2014 at 20:48
  • @askewchan Sorry about that. Here is the link. Commented Apr 25, 2014 at 20:52

1 Answer

A drawback to your solution is that it is inefficient for large values, because the dense array needs one column for every possible value. If you want a more efficient representation, create a scipy sparse matrix, e.g.:

    import scipy.sparse
    import numpy

    indices = [1, 5, 3]
    indptr = range(len(indices)+1)
    data = numpy.ones(len(indices))
    matrix = scipy.sparse.csr_matrix((data, indices, indptr))

Read about the Yale Format and scipy's csr_matrix to better understand the objects (indices, indptr, data) and usage.

Note that I am not subtracting 1 from the indices in the above code. Use indices = numpy.array([1, 5, 3])-1 if that's what you want.
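As a rough sketch of how the three CSR arrays map back to the dense array from the question, the example below applies the 1-based shift and converts the result with toarray(). Passing shape explicitly is an assumption added here so the column count does not depend on inference; it is not part of the code above.

```python
import numpy as np
from scipy import sparse

v = np.array([1, 5, 3])

indices = v - 1                        # column of the single 1 in each row
indptr = np.arange(len(indices) + 1)   # row i owns data[indptr[i]:indptr[i+1]]
data = np.ones(len(indices))           # the stored values, all ones

# Explicit shape (assumption): one row per element, one column per value 1..max.
matrix = sparse.csr_matrix((data, indices, indptr),
                           shape=(len(indices), int(v.max())))

dense = matrix.toarray()
```

Because every row stores exactly one entry, indptr is just 0, 1, 2, ..., n, and only n values are kept in memory regardless of how large v.max() is.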
