243

This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another NumPy array, but I get invalid syntax errors. Here is the code:

extractedData = data[[:,1],[:,9]]

It seems like the above line should suffice, but I guess not. I looked around but couldn't find anything syntax-wise regarding this specific scenario.

11 Answers

399

I assume you wanted columns 1 and 9?

To select multiple columns at once, use

X = data[:, [1, 9]]

To select one at a time, use

x, y = data[:, 1], data[:, 9]
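As a quick check of the two forms above, here is a minimal sketch on a small made-up array (the values and the 4x10 shape of `data` are assumptions for illustration only):

```python
import numpy as np

# Hypothetical 4x10 array standing in for the MxN matrix from the question.
data = np.arange(40).reshape(4, 10)

# A list of column indices selects several columns at once (2-D result).
X = data[:, [1, 9]]

# A single integer index selects one column and returns a 1-D array.
x, y = data[:, 1], data[:, 9]

print(X.shape)  # (4, 2)
print(x.shape)  # (4,)
```

The list form keeps the result two-dimensional, while each integer-indexed column comes back one-dimensional.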

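With names (this works only for a structured array, whose fields are selected without the leading colon):

data[['Column Name1', 'Column Name2']]

You can get the names from data.dtype.names. On a plain 2-D array, data[:, ['Column Name1', 'Column Name2']] raises an IndexError.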
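Name-based selection applies to NumPy structured arrays, which are one-dimensional arrays of records. A minimal sketch, assuming a made-up array `records` with hypothetical field names:

```python
import numpy as np

# Hypothetical structured array; the field names are made up for illustration.
records = np.array(
    [(1, 2.0, 3.0), (4, 5.0, 6.0)],
    dtype=[("id", "i4"), ("height", "f8"), ("weight", "f8")],
)

print(records.dtype.names)  # ('id', 'height', 'weight')

# A list of field names selects several fields; there is no leading `:`.
subset = records[["id", "weight"]]

# A single field name returns a plain 1-D array of that field's values.
weights = records["weight"]
```

For labelled columns on two-dimensional data, a pandas DataFrame is usually the more convenient container.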

7 Comments

How to do that with column names?
data[:, ['Column Name1','Column Name2']]
Is it a view or a copy? My bottleneck is on this line and I am looking for a way to optimize it.
could it be that this function is not working anymore?
What is this syntax called?
38

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

extractedData = data[:,[1,9]]

18

If you want to extract only some columns:

idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]

If you want to exclude specific columns:

idx_OUT_columns = [1, 9]
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
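The exclusion can also be written without a Python-level loop. The sketch below uses `np.delete` and a boolean mask as alternatives to the list comprehension; the 4x10 `data` array is a made-up stand-in:

```python
import numpy as np

data = np.arange(40).reshape(4, 10)  # hypothetical 4x10 array
idx_OUT_columns = [1, 9]

# np.delete returns a new array with the given columns removed along axis 1.
extracted = np.delete(data, idx_OUT_columns, axis=1)

# Equivalent boolean-mask form: keep every column not marked for removal.
keep = np.ones(data.shape[1], dtype=bool)
keep[idx_OUT_columns] = False
extracted_mask = data[:, keep]

print(extracted.shape)  # (4, 8)
```

Both forms produce a copy, so later changes to the result do not affect `data`.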

14

Just:

>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])

The columns need not be in order:

>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])

11

One thing I would like to point out: if you extract a single column with an integer index, the result is not an Mx1 matrix as you might expect, but a 1-D array of shape (M,) containing the elements of that column.

To convert it to an Mx1 array, call reshape(M, 1), or reshape(-1, 1), on the result.
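A short sketch of the shapes involved, assuming a made-up 4x10 `data` array; it also shows two ways to keep the column axis in the first place:

```python
import numpy as np

data = np.arange(40).reshape(4, 10)  # hypothetical 4x10 array

col_1d = data[:, 8]                  # integer index drops the column axis
col_2d = data[:, 8].reshape(-1, 1)   # reshape back into a single column
col_slice = data[:, 8:9]             # a length-1 slice keeps the axis
col_list = data[:, [8]]              # a one-element list keeps it too

print(col_1d.shape)  # (4,)
print(col_2d.shape)  # (4, 1)
```

Using `-1` in `reshape` lets NumPy infer the number of rows, so M does not have to be spelled out.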

2 Comments

You can also achieve this with a slice, for example data[:, 8:9]. This takes the column at index 8 but does not remove the extra dimension.
data[:, 8] will also pick the column at index 8, but it returns an Mx1 matrix only if data is an np.matrix; for an ndarray the result is 1-D.
3

One more thing you should pay attention to when selecting columns from an N-D array using a list like this:

data[:,:,[1,9]]

If you also remove a dimension with an integer index (by selecting only one row, for example), the axes of the result are reordered: the integer and the list both count as advanced indices, and when a slice separates them NumPy places the indexed dimension first. So:

print(data.shape)            # gives (10, 20, 30)
selection = data[1, :, [1, 9]]
print(selection.shape)       # gives (2, 20) instead of (20, 2)!
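The reordering can be reproduced, and avoided, with a sketch like the following. The array contents are made up; only the (10, 20, 30) shape is taken from the example above:

```python
import numpy as np

data = np.arange(10 * 20 * 30).reshape(10, 20, 30)  # hypothetical contents

# The integer 1 and the list [1, 9] are both advanced indices separated by
# a slice, so the indexed dimension (length 2) is placed first.
selection = data[1, :, [1, 9]]
print(selection.shape)  # (2, 20)

# Indexing in two steps keeps the axes in their original order.
two_step = data[1][:, [1, 9]]
print(two_step.shape)  # (20, 2)
```

The two results hold the same values and differ only by a transpose.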

3

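If data is a pandas DataFrame rather than a NumPy array, you can use the following (.ix has been removed from pandas, so .loc is used here):

extracted_data = data.loc[:, ['Column1', 'Column2']]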

1 Comment

A good answer will always have an explanation of what was done and why it was done in such a manner, not only for the OP but for future visitors to SO. Please add some description to make others understand.
1

Here is yet another example that some may find useful when you need both specific columns and ranges of columns. It takes a few seconds to run on millions of rows, and you can add more columns by concatenating additional lists (e.g., columns = ... + [1] + [5]):

columns = [0] + [x for x in range(4,62-3)]
print(columns)
selected_data = train_data[:,columns]
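The same index list can be built with `np.r_`, which concatenates scalars and slice ranges into one index array. This is an alternative sketch, not the approach used above, and the 3x62 `train_data` array is a made-up stand-in:

```python
import numpy as np

train_data = np.arange(3 * 62).reshape(3, 62)  # hypothetical 3x62 array

# np.r_ joins the scalar 0 and the range 4..58 into one index array.
columns = np.r_[0, 4:59]
selected_data = train_data[:, columns]

print(selected_data.shape)  # (3, 56)
```

Further columns or ranges can be appended as extra comma-separated entries inside the brackets.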

0

I think the solution here no longer works in newer versions. If data is a pandas DataFrame, one way to do it is with the to_numpy method:

extracted_data = data[['Column Name1','Column Name2']].to_numpy()

which gives you the desired outcome.

You can find the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

1 Comment

the question starts with a numpy array, not a dataframe
0

I could not edit the chosen answer, so I'm adding an answer to clarify that indexing with an integer returns a view (not a copy), while indexing with a list returns a copy:

>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]

>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]

>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
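The same distinction can be checked without mutating anything, using `np.shares_memory`. A minimal sketch; the slice case is an addition that is not part of the session above:

```python
import numpy as np

x = np.zeros((2, 3))

by_list = x[:, [0, 1]]   # advanced (list) indexing
by_int = x[:, 0]         # basic (integer) indexing
by_slice = x[:, 0:2]     # basic (slice) indexing

print(np.shares_memory(x, by_list))   # False: a copy
print(np.shares_memory(x, by_int))    # True: a view
print(np.shares_memory(x, by_slice))  # True: a view
```

When the wanted columns are contiguous, a slice therefore avoids the copy that a list of indices makes.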

-1

you can also use extractedData=data([:,1],[:,9])
