Extracting specific columns in numpy array

Question

This is an easy question, but say I have an MxN matrix. All I want to do is extract specific columns and store them in another NumPy array, but I get invalid syntax errors. Here is the code:

extractedData = data[[:,1],[:,9]].

It seems like the above line should suffice, but I guess not. I looked around but couldn't find anything syntax-wise regarding this specific scenario.

cs95 · Accepted Answer · 2020-08-23 10:39:35Z

399

I assume you wanted columns 1 and 9?

To select multiple columns at once, use

X = data[:, [1, 9]]

To select one at a time, use

x, y = data[:, 1], data[:, 9]
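As a small sketch of both forms, assuming `data` is an ordinary 2-D ndarray (the 4x10 array below is made up for illustration):

```python
import numpy as np

# Hypothetical 4x10 array standing in for the MxN matrix from the question.
data = np.arange(40).reshape(4, 10)

# A list of column indices selects several columns at once (result is 2-D).
X = data[:, [1, 9]]

# A single integer selects one column and drops that axis (result is 1-D).
x, y = data[:, 1], data[:, 9]

print(X.shape)  # (4, 2)
print(x.shape)  # (4,)
```

The original attempt fails because a bare `:` is only valid directly inside the square brackets of an index expression, not inside a nested list.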

With names (for a structured array, index with a list of field names and no row slice):

data[['Column Name1','Column Name2']]

You can get the names from data.dtype.names…
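Selecting by name only applies to NumPy structured arrays. A minimal sketch, assuming a structured array whose dtype and field names (`a`, `b`, `c`) are invented for this example:

```python
import numpy as np

# Hypothetical structured array; the field names are illustrative only.
data = np.array(
    [(1, 2.0, 3), (4, 5.0, 6)],
    dtype=[("a", "i4"), ("b", "f8"), ("c", "i4")],
)

print(data.dtype.names)  # ('a', 'b', 'c')

# Index with a list of field names; no row slice is involved.
subset = data[["a", "c"]]
print(subset.dtype.names)  # ('a', 'c')
```

A plain 2-D ndarray has no field names (`data.dtype.names` is `None`), so integer column indices are the only option there.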

edited Aug 23, 2020 at 10:39

cs95

answered Dec 5, 2011 at 14:24

Fred Foo

7 Comments

Zelphir Kaltstahl Over a year ago

How to do that with column names?

code-assassin Over a year ago

data[:, ['Column Name1','Column Name2']]

Fractale Over a year ago

Is it a view or a copy? My bottleneck is on this line, and I am looking for a way to optimize it.

PV8 Over a year ago

Could it be that this no longer works?

Burrito Over a year ago

What is this syntax called?

Michael J. Barber · Accepted Answer · 2011-12-05 14:26:02Z

38

Assuming you want to get columns 1 and 9 with that code snippet, it should be:

extractedData = data[:,[1,9]]

answered Dec 5, 2011 at 14:26

Michael J. Barber

queise · Accepted Answer · 2015-06-01 11:14:00Z

18

If you want to extract only some columns:

idx_IN_columns = [1, 9]
extractedData = data[:,idx_IN_columns]

If you want to exclude specific columns:

idx_OUT_columns = [1, 9]
# range replaces the Python 2-only xrange
idx_IN_columns = [i for i in range(data.shape[1]) if i not in idx_OUT_columns]
extractedData = data[:,idx_IN_columns]
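Excluding columns can also be done without a list comprehension. The following is a sketch of two alternatives, `np.delete` and a boolean mask, using a made-up 4x10 array; it is not part of the original answer:

```python
import numpy as np

data = np.arange(40).reshape(4, 10)  # made-up 4x10 example array
idx_out = [1, 9]

# Option 1: np.delete returns a new array without the given columns.
kept_a = np.delete(data, idx_out, axis=1)

# Option 2: a boolean mask over the column axis.
mask = np.ones(data.shape[1], dtype=bool)
mask[idx_out] = False
kept_b = data[:, mask]

print(kept_a.shape)  # (4, 8)
```

Both options produce a copy of the remaining columns in their original order.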

answered Jun 1, 2015 at 11:14

queise

yanhh · Accepted Answer · 2018-07-28 01:25:55Z

Just:

>>> m = np.matrix(np.random.random((5, 5)))
>>> m
matrix([[0.91074101, 0.65999332, 0.69774588, 0.007355  , 0.33025395],
        [0.11078742, 0.67463754, 0.43158254, 0.95367876, 0.85926405],
        [0.98665185, 0.86431513, 0.12153138, 0.73006437, 0.13404811],
        [0.24602225, 0.66139215, 0.08400288, 0.56769924, 0.47974697],
        [0.25345299, 0.76385882, 0.11002419, 0.2509888 , 0.06312359]])
>>> m[:,[1, 2]]
matrix([[0.65999332, 0.69774588],
        [0.67463754, 0.43158254],
        [0.86431513, 0.12153138],
        [0.66139215, 0.08400288],
        [0.76385882, 0.11002419]])

The columns need not be in order:

>>> m[:,[2, 1, 3]]
matrix([[0.69774588, 0.65999332, 0.007355  ],
        [0.43158254, 0.67463754, 0.95367876],
        [0.12153138, 0.86431513, 0.73006437],
        [0.08400288, 0.66139215, 0.56769924],
        [0.11002419, 0.76385882, 0.2509888 ]])

Daksh · Accepted Answer · 2018-03-05 16:20:51Z

11

One thing I would like to point out: if you extract a single column with an integer index (e.g. data[:, 1]), the result is not an Mx1 matrix as you might expect but a 1-D array of shape (M,) containing the elements of that column.

To convert it to an Mx1 array, call reshape(M, 1) (or reshape(-1, 1)) on the result.
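The shapes involved can be checked directly. This is a small sketch on a made-up 3x4 ndarray (so M = 3), showing the reshape alongside two ways of keeping the column axis in the first place:

```python
import numpy as np

data = np.arange(12).reshape(3, 4)  # made-up 3x4 ndarray (M = 3)

col_1d = data[:, 1]               # integer index drops the column axis
col_2d_a = data[:, [1]]           # a one-element list keeps it
col_2d_b = data[:, 1:2]           # so does a length-one slice
col_2d_c = col_1d.reshape(-1, 1)  # or reshape afterwards

print(col_1d.shape)    # (3,)
print(col_2d_a.shape)  # (3, 1)
```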

edited Mar 5, 2018 at 16:20

answered Oct 21, 2017 at 14:53

Daksh

2 Comments

Jan Kukacka Over a year ago

Also you can achieve this by using a slice, for example data[:, 8:9]. This takes the column at index 8 but does not remove the extra dimension.

StefanMK Over a year ago

data[:,8] will also pick the column at index 8; it returns an Mx1 matrix only when data is an np.matrix, while a plain ndarray gives a 1-D array of shape (M,)

Jan Kukacka · Accepted Answer · 2018-02-05 16:25:05Z

3

One more thing to watch out for when selecting columns from an N-D array using a list like this:

data[:,:,[1,9]]

If you also remove a dimension (by selecting only one row with an integer, for example), the axes of the result are reordered: when an integer index and a list index are separated by a slice, NumPy places the dimension produced by the list first. So:

print(data.shape)            # gives (10, 20, 30)
selection = data[1,:,[1,9]]
print(selection.shape)       # gives (2, 20) instead of (20, 2)!
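A sketch of the behaviour and one possible workaround, indexing in two steps so that the integer and the list never appear in the same index expression (the array contents are arbitrary):

```python
import numpy as np

data = np.arange(10 * 20 * 30).reshape(10, 20, 30)

# Integer and list indices separated by a slice: the list's axis comes first.
selection = data[1, :, [1, 9]]
print(selection.shape)  # (2, 20)

# Indexing in two steps keeps the expected axis order.
two_step = data[1][:, [1, 9]]
print(two_step.shape)   # (20, 2)
```

The two results hold the same values; one is simply the transpose of the other.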

answered Feb 5, 2018 at 16:25

Jan Kukacka

galoget · Accepted Answer · 2020-08-15 09:32:41Z

3

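If data is a pandas DataFrame rather than a NumPy array, you can select columns by label with .loc (the older .ix indexer has been removed from pandas):

extracted_data = data.loc[:,['Column1','Column2']]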

edited Aug 15, 2020 at 9:32

galoget

answered Sep 4, 2017 at 9:34

Rahul

1 Comment

Rucha Bhatt Joshi Over a year ago

A good answer will always have an explanation of what was done and why it was done in such a manner, not only for the OP but for future visitors to SO. Please add some description to make others understand.

kory · Accepted Answer · 2023-04-18 12:44:47Z

1

Here is another example that may be useful when you need both specific columns and ranges of columns. It takes a few seconds to run on millions of rows, and you can add more columns by concatenating additional lists (e.g., columns = ... + [1] + [5], etc.):

columns = [0] + list(range(4, 62-3))
print(columns)
selected_data = train_data[:,columns]
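The same mix of single indices and ranges can also be written with `np.r_`, which concatenates scalars and slices into one index array. This is an alternative sketch, not the answer's own method, and the 5x62 `train_data` array is a hypothetical stand-in:

```python
import numpy as np

train_data = np.zeros((5, 62))  # hypothetical stand-in for the answer's array

# np.r_ concatenates scalars and slices into one index array.
columns = np.r_[0, 4:62 - 3]
selected_data = train_data[:, columns]

print(selected_data.shape)  # (5, 56)
```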

answered Apr 18, 2023 at 12:44

kory

PV8 · Accepted Answer · 2019-07-11 09:09:01Z

0

I think the solution here no longer works with newer library versions. If data is a pandas DataFrame, one way to do it is with the DataFrame.to_numpy method:

extracted_data = data[['Column Name1','Column Name2']].to_numpy()

which gives you the desired outcome.

You can find the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html#pandas.DataFrame.to_numpy

answered Jul 11, 2019 at 9:09

PV8

1 Comment

TMrtSmith Over a year ago

the question starts with a numpy array, not a dataframe

cookiemonster · Accepted Answer · 2022-06-28 09:46:11Z

0

I could not edit the chosen answer, so I'm adding an answer to clarify that indexing with an integer seems to return a view (not a copy), while indexing with a list returns a copy:

>>> x = np.zeros(shape=[2, 3])
>>> y = x[:, [0, 1]]
>>> z1, z2 = x[:, 0], x[:, 1]

>>> y[0, 0] = 1
>>> print(y)
[[1. 0.]
 [0. 0.]]
>>> print(x)
[[0. 0. 0.]
 [0. 0. 0.]]

>>> z1[0] = 2
>>> print(z1)
[2. 0.]
>>> print(x)
[[2. 0. 0.]
 [0. 0. 0.]]
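The same distinction can be checked without mutating anything by using `np.shares_memory`. A short sketch, in which integers and slices count as basic indexing and a list counts as advanced indexing:

```python
import numpy as np

x = np.zeros((2, 3))

by_int = x[:, 0]        # basic indexing
by_slice = x[:, 0:2]    # basic indexing
by_list = x[:, [0, 1]]  # advanced (fancy) indexing

print(np.shares_memory(x, by_int))    # True
print(np.shares_memory(x, by_slice))  # True
print(np.shares_memory(x, by_list))   # False
```

So when the wanted columns are contiguous, a slice such as `x[:, 0:2]` avoids the copy that a list index makes.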

answered Jun 28, 2022 at 9:46

cookiemonster

Pranav Mahajan · Accepted Answer · 2017-08-02 17:18:14Z

-1

You can also stack the individually selected columns: extractedData = np.column_stack((data[:,1], data[:,9]))

answered Aug 2, 2017 at 17:18

Pranav Mahajan
