Numpy 2D array in Python 3.4

Question

I have this code:

import pandas as pd
data = pd.read_csv("test.csv", sep=",")

The data array looks like this:

[screenshot of the data array; image not available]

The problem is that I can't split it by columns, like this:

week     = data[:,1]

It should put the second column into week, but it doesn't:

TypeError: unhashable type: 'slice'

How should I do this to make it work?

I am also wondering what this code does exactly. (I don't really understand the np.newaxis part.)

week     = data['1'][:, np.newaxis]

Result: [screenshot of the result; image not available]

re: np.newaxis: print the .shape attribute of data['1'] and data['1'][:, np.newaxis]
– ev-br, Commented Mar 30, 2015 at 13:14

TheBlackCat · Accepted Answer · 2015-03-30 14:33:39Z

There are a few issues here.

First, read_csv uses a comma as a separator by default, so you don't need to specify that.

Second, the pandas csv reader by default uses the first row to get column headings. That doesn't appear to be what you want, so you need to use the header=None argument.

Third, it looks like your first column is the row number. You can use index_col=0 to use that column as the index.

Fourth, for pandas, the first index is the column, not the row. Further, using the standard data[ind] notation is indexing by column name, rather than column number. And you can't use a comma to index both row and column at the same time (you need to use data.loc[row, col] to do that).

So in your case, all you need to do to get the second column of the file is data[1]. With header=None the columns are labeled 0, 1, 2, and so on, and using the first column as the row index (index_col=0) does not renumber the remaining columns, so the week column is data[1] either way. This returns a pandas Series, which is the 1D equivalent of a 2D DataFrame.
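To make the indexing rules above concrete, here is a minimal sketch. It assumes a hypothetical in-memory stand-in for test.csv (csv_text, built from the first three rows of the table shown in this answer); the variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: row number, then four data columns,
# with no header row.
csv_text = "1,10,2,100,12\n2,15,5,150,15\n3,25,7,240,20\n"

data = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

# Plain [] indexing selects a column by its label (here the integer 1).
week = data[1]

# .loc indexes by row label and column label together.
cell = data.loc[2, 3]  # row labeled 2, column labeled 3

# .iloc indexes by position; position 0 is the first remaining data column.
week_by_position = data.iloc[:, 0]

# NumPy-style indexing on the DataFrame itself is rejected. The exact
# exception type depends on the pandas version, so catch broadly here.
try:
    data[:, 1]
    numpy_style_failed = False
except Exception:
    numpy_style_failed = True

print(week.tolist(), cell, numpy_style_failed)
```

If positional access is what you are after, data.iloc[:, 0] is the closest equivalent of the NumPy-style column slice from the question.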

So the whole thing should look like this:

import pandas as pd
data = pd.read_csv('test.csv', header=None, index_col=0)
week = data[1]

data looks like this:

      1   2     3   4
0                    
1    10   2   100  12
2    15   5   150  15
3    25   7   240  20
4    22  12   350  20
5    51  13   552  20
6   134  20   880  36
7   150  22   900  38
8   200  29  1020  44
9   212  31  1100  46
10  199  23  1089  45
11  220  32  1145  60

The '0' line is not a row of data. It is the name of the index (the label of the column the index was read from), which pandas prints on its own line.
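One way to check where that '0' comes from is to inspect the index directly. This is a small sketch using a hypothetical in-memory stand-in for test.csv (csv_text, the first three rows of the table above):

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv (no header row).
csv_text = "1,10,2,100,12\n2,15,5,150,15\n3,25,7,240,20\n"
data = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

# The index is named after the column it was read from (label 0) ...
index_name = data.index.name

# ... while the row labels are just the values from that column.
row_labels = data.index.tolist()

print(index_name, row_labels, data.shape)
```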

week looks like this:

0
1      10
2      15
3      25
4      22
5      51
6     134
7     150
8     200
9     212
10    199
11    220
Name: 1, dtype: int64

However, you can give columns (and rows) meaningful names in pandas, and then access them by those names. I don't know the column names, so I just made some up:

import pandas as pd
data = pd.read_csv('test.csv', header=None, index_col=0, names=['week', 'spam', 'eggs', 'grail'])
week = data['week']

In this case, data looks like this:

    week  spam  eggs  grail
1     10     2   100     12
2     15     5   150     15
3     25     7   240     20
4     33    12   350     20
5     51    13   552     20
6    134    20   880     36
7    150    22   900     38
8    200    29  1020     44
9    212    31  1100     46
10   199    23  1089     45
11   220    32  1145     50

And week looks like this:

1      10
2      15
3      25
4      33
5      51
6     134
7     150
8     200
9     212
10    199
11    220
Name: week, dtype: int64

For np.newaxis, what that does is add one dimension to the array. So if you have a 1D array (a vector), using np.newaxis on it turns it into a 2D array. It would turn a 2D array into a 3D array, 3D into 4D, and so on. Where you put it (such as [:,np.newaxis] vs. [np.newaxis,:]) determines where the new dimension is added. So np.arange(10)[np.newaxis,:] (or just np.arange(10)[np.newaxis]) gives you a shape (1,10) 2D array, while np.arange(10)[:,np.newaxis] gives you a shape (10,1) 2D array.
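As the comment on the question suggests, printing .shape is the quickest way to see this. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np

vector = np.arange(10)           # 1D array, shape (10,)

row = vector[np.newaxis, :]      # new axis in front
also_row = vector[np.newaxis]    # the trailing ':' can be omitted
column = vector[:, np.newaxis]   # new axis at the end

print(vector.shape, row.shape, also_row.shape, column.shape)
# (10,) (1, 10) (1, 10) (10, 1)
```

np.newaxis is just an alias for None, so vector[:, None] does the same thing.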

In your case, the line gets the column with the name 1, which is a 1D pandas Series, then adds a new dimension to it. However, rather than turning it back into a DataFrame, pandas converts it into a 1D numpy array, then adds one dimension to make it a 2D numpy array.

This, however, is dangerous long-term. There is no guarantee that this sort of silent conversion won't be changed at some point (and newer pandas releases do in fact reject this kind of indexing on a Series). To convert a pandas object to a numpy one, you should use an explicit conversion with the values attribute, so in your case data.values or data['1'].values.

However, you don't really need a numpy array. A Series is fine. If you really want a 2D object, you can convert a Series into a DataFrame by using something like data['1'].to_frame().
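Here is a short sketch of the explicit route, under a few assumptions: csv_text is a hypothetical in-memory stand-in for test.csv, and because it is read with header=None the column label is the integer 1 rather than the string '1' used in the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv (no header row).
csv_text = "1,10,2,100,12\n2,15,5,150,15\n3,25,7,240,20\n"
data = pd.read_csv(io.StringIO(csv_text), header=None, index_col=0)

week = data[1]                             # 1D pandas Series

# Convert to numpy explicitly first, then add the new axis.
week_values = week.values                  # 1D numpy array, shape (3,)
week_column = week_values[:, np.newaxis]   # 2D numpy array, shape (3, 1)

# Or stay in pandas and get a one-column DataFrame instead.
week_frame = week.to_frame()               # DataFrame, shape (3, 1)

print(week_values.shape, week_column.shape, week_frame.shape)
# (3,) (3, 1) (3, 1)
```

The to_frame() version keeps the row labels and the column name, which the plain numpy array loses.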
