numpy transform vector to binary matrix

Question

I'm looking for a clean way to transform a vector of integers into a 2D array of binary values, where each row has a single one in the column given by the corresponding value of the vector, taken as an index.

For example:

import numpy as np

v = np.array([1, 5, 3])
C = np.zeros((v.shape[0], v.max()))

What I'm looking for is a way to transform C into this:

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  1.,  0.,  0.]])

I've come up with this:

C[np.arange(v.shape[0]), v.T-1] = 1

But I wonder whether there is a less verbose, more elegant approach?

Thanks!

UPDATE

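Thanks for your comments! There was an error in my code: if v contains a 0, the 1 ends up in the wrong place (the last column), because v-1 becomes -1 and negative indices count from the end. Instead, I have to expand the categorical data to include 0, i.e. use v.max()+1 columns and index with v directly.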
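
A minimal sketch of that failure mode and the fix, using a small made-up vector (the names `buggy` and `fixed` are only for illustration):

```python
import numpy as np

v = np.array([0, 2, 1])  # illustrative input that contains a 0

# Original approach: columns v - 1, so the 0 maps to index -1,
# which NumPy interprets as the last column.
buggy = np.zeros((v.shape[0], v.max()))
buggy[np.arange(v.shape[0]), v - 1] = 1

# Expanded approach: one column per value 0..v.max(), indexed by v directly.
fixed = np.zeros((v.shape[0], v.max() + 1))
fixed[np.arange(v.shape[0]), v] = 1

print(buggy)  # row 0 has its 1 in the last column
print(fixed)  # row 0 has its 1 in column 0
```

Note that the buggy version does not raise an error, which is why the problem is easy to miss.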

jrennie's answer is a big win for large vectors as long as you deal with sparse matrices exclusively. In my case I need to return an array for compatibility, and the conversion cancels out the advantage entirely. Compare both solutions:

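    import numpy as np
    from scipy import sparse

    def permute_array(vector):
        # Dense version: one column per value 0..vector.max()
        permut = np.zeros((vector.shape[0], vector.max()+1))
        permut[np.arange(vector.shape[0]), vector] = 1
        return permut

    def permute_matrix(vector):
        # Sparse (CSR) version: row i stores a single 1 in column vector[i]
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut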

    In [193]: vec = np.random.randint(1000, size=1000)
    In [194]: np.all(permute_matrix(vec) == permute_array(vec))
    Out[194]: True

    In [195]: %timeit permute_array(vec)
    100 loops, best of 3: 3.49 ms per loop

    In [196]: %timeit permute_matrix(vec)
    1000 loops, best of 3: 422 µs per loop

Now, adding conversion:

    def permute_matrix(vector):
        indptr = range(vector.shape[0]+1)
        ones = np.ones(vector.shape[0])
        permut = sparse.csr_matrix((ones, vector, indptr))
        return permut.toarray()

    In [198]: %timeit permute_matrix(vec)
    100 loops, best of 3: 4.1 ms per loop
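
For anyone wanting to rerun this comparison outside IPython, here is a rough sketch using the standard `timeit` module. The name `permute_matrix_dense` and the explicit `shape=` argument are my own additions, not part of the code above; the shape is passed so both functions always return arrays of the same size. Timings will differ by machine and library version, so none are quoted here.

```python
import timeit

import numpy as np
from scipy import sparse


def permute_array(vector):
    # Dense construction via integer-array indexing.
    permut = np.zeros((vector.shape[0], vector.max() + 1))
    permut[np.arange(vector.shape[0]), vector] = 1
    return permut


def permute_matrix_dense(vector):
    # Hypothetical name: CSR construction followed by conversion to a dense array.
    n = vector.shape[0]
    m = sparse.csr_matrix(
        (np.ones(n), vector, np.arange(n + 1)),
        shape=(n, vector.max() + 1),
    )
    return m.toarray()


vec = np.random.randint(1000, size=1000)

# Both constructions should produce the same dense array.
same = np.array_equal(permute_array(vec), permute_matrix_dense(vec))

# Total seconds for 100 calls of each function.
t_dense = timeit.timeit(lambda: permute_array(vec), number=100)
t_sparse = timeit.timeit(lambda: permute_matrix_dense(vec), number=100)
print(same, t_dense, t_sparse)
```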

Your way looks good to me! You can do without the .T though.
– YXD, Commented Apr 25, 2014 at 19:03
You are trying to implement a permutation matrix. I think your solution is fine. As Mr E said, you can drop the .T. See also this question on Stack Overflow [stackoverflow.com/]. I was wondering whether there is some function in `scipy.linalg` that implements the permutation matrix.
– Tengis, Commented Apr 25, 2014 at 19:16

jrennie · Accepted Answer · 2014-04-26 00:10:00Z

6

A drawback to your solution is that it is inefficient for large values: the dense array needs len(v) * (v.max() + 1) entries, almost all of them zeros. If you want a more efficient representation, create a scipy sparse matrix, e.g.:

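import numpy
import scipy.sparse

indices = [1, 5, 3]                # column index of the single 1 in each row
indptr = range(len(indices)+1)     # row i owns entries indptr[i]:indptr[i+1], i.e. exactly one
data = numpy.ones(len(indices))    # the stored values, all ones
matrix = scipy.sparse.csr_matrix((data, indices, indptr))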

Read about the Yale Format and scipy's csr_matrix to better understand the objects (indices, indptr, data) and usage.

Note that I am not subtracting 1 from the indices in the above code. Use indices = numpy.array([1, 5, 3])-1 if that's what you want.
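
As a sketch of the 1-based variant applied to the question's example: the explicit `shape=` argument and the variable names below are assumptions of mine rather than part of the answer's code. Passing the shape fixes the number of columns instead of leaving it to be inferred from the largest index.

```python
import numpy as np
from scipy import sparse

v = np.array([1, 5, 3])  # 1-based values, as in the question
cols = v - 1             # shift to 0-based column indices
n = len(cols)

# One stored entry per row: row i has a single 1 in column cols[i].
m = sparse.csr_matrix(
    (np.ones(n), cols, np.arange(n + 1)),
    shape=(n, cols.max() + 1),
)

dense = m.toarray()  # only needed if a regular ndarray is required
print(dense)
# [[1. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 1.]
#  [0. 0. 1. 0. 0.]]
```

This still assumes every value in v is at least 1. A 0 would become column index -1, which is not a valid index in a sparse matrix, so it would need to be handled before this step.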

edited Apr 26, 2014 at 0:10

answered Apr 25, 2014 at 21:20

jrennie
