Using numpy to extract data from CSV file

Question

I'm working with numpy and trying to find which platform sold the most copies in the NA region.

I have a CSV file holding a lot of data looking like this:

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33
5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1,31.37
6,Tetris,GB,1989,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
7,New Super Mario Bros.,DS,2006,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
8,Wii Play,Wii,2006,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
9,New Super Mario Bros. Wii,Wii,2009,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
10,Duck Hunt,NES,1984,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
11,Nintendogs,DS,2005,Simulation,Nintendo,9.07,11,1.93,2.75,24.76

I would like to print the platform with the most sales and the amount sold in the NA region. How can I do this?

I hard-coded all the different platforms as masks, like maskNES = (data[:,2] == 'NES'), then assigned each result to a variable, like pfNES = data[maskNES][:,6].sum(), and finally compared all the platforms to find the one with the highest value. It just seems like an idiotic way of doing it, especially if I were to have thousands of different platforms. Oh, and I read the CSV data into a matrix called 'data'.
– Rainoa, Commented Mar 4, 2017 at 23:44

Stephen Rauch · Accepted Answer · 2017-03-05 00:02:29Z


With pandas this is fairly straightforward.

Code:

import pandas as pd

# read csv data into a dataframe (data is the file-like object from Test Data below)
df = pd.read_csv(data, skipinitialspace=True)

# total NA sales per platform
platform_roll_up = df.groupby('Platform')['NA_Sales'].sum()

# find the platform (index label) with the highest total
idx_max = platform_roll_up.idxmax()

# show platform and total sales for the max
print(idx_max, platform_roll_up[idx_max])

Results:

Wii 101.71
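If a ranking of every platform is wanted rather than only the winner, the same groupby result can be sorted. The following is a minimal, self-contained sketch under the assumption that the CSV has exactly the columns shown in the question; the names csv_text and totals are illustrative and not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; csv_text is an illustrative name.
csv_text = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33
5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1,31.37
6,Tetris,GB,1989,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
7,New Super Mario Bros.,DS,2006,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
8,Wii Play,Wii,2006,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
9,New Super Mario Bros. Wii,Wii,2009,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
10,Duck Hunt,NES,1984,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
11,Nintendogs,DS,2005,Simulation,Nintendo,9.07,11,1.93,2.75,24.76
"""

df = pd.read_csv(StringIO(csv_text))

# Sum NA sales per platform, then order from highest to lowest total.
totals = df.groupby("Platform")["NA_Sales"].sum().sort_values(ascending=False)

# The first entry of the sorted Series is the best-selling platform.
top_platform = totals.index[0]
top_sales = totals.iloc[0]

print(totals)
print(top_platform, round(top_sales, 2))
```

Sorting costs little on a handful of groups and keeps the runner-up platforms visible, which idxmax alone does not.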

Test Data:

from io import StringIO

data = StringIO(u"""
    Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
    1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
    2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
    3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
    4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33
    5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1,31.37
    6,Tetris,GB,1989,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
    7,New Super Mario Bros.,DS,2006,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
    8,Wii Play,Wii,2006,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
    9,New Super Mario Bros. Wii,Wii,2009,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
    10,Duck Hunt,NES,1984,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
    11,Nintendogs,DS,2005,Simulation,Nintendo,9.07,11,1.93,2.75,24.76
""")

edited Mar 5, 2017 at 0:02

answered Mar 4, 2017 at 23:49

Stephen Rauch♦


2 Comments

Rainoa Over a year ago

Thanks for the quick answer! I'm trying to find a solution that works for numpy.ndarray, which doesn't have an iloc attribute. Should I stay away from ndarray in this case? Also, I'm trying to find the total NA_Sales value of all products on a given platform, instead of finding the highest single value. By the way, I am new to Python.

Rainoa Over a year ago

Thanks a ton! Greatly appreciate the answer; your edited version was exactly what I was looking for.

hpaulj · Accepted Answer · 2017-03-05 00:40:21Z


Loading this with genfromtxt is straightforward:

In [280]: data=np.genfromtxt('stack42602390.csv',delimiter=',',names=True, dtype=None)

In [281]: data
Out[281]: 
array([ ( 1, b'Wii Sports', b'Wii', 2006, b'Sports', b'Nintendo',  41.49,  29.02,   3.77,  8.46,  82.74),
       ( 2, b'Super Mario Bros.', b'NES', 1985, b'Platform', b'Nintendo',  29.08,   3.58,   6.81,  0.77,  40.24),
       ( 3, b'Mario Kart Wii', b'Wii', 2008, b'Racing', b'Nintendo',  15.85,  12.88,   3.79,  3.31,  35.82),
....
       (11, b'Nintendogs', b'DS', 2005, b'Simulation', b'Nintendo',   9.07,  11.  ,   1.93,  2.75,  24.76)], 
      dtype=[('Rank', '<i4'), ('Name', 'S25'), ('Platform', 'S3'), ('Year', '<i4'), ('Genre', 'S12'), ('Publisher', 'S8'), ('NA_Sales', '<f8'), ('EU_Sales', '<f8'), ('JP_Sales', '<f8'), ('Other_Sales', '<f8'), ('Global_Sales', '<f8')])

The b'string' notation is just how Python 3 displays bytestrings, which are the default string format from genfromtxt here. The b prefix won't show in Python 2.

The result is a structured array, with different field names and types. It is not a 2d array with rows and columns.

The NA_Sales data:

In [282]: data['NA_Sales']
Out[282]: 
array([ 41.49,  29.08,  15.85,  15.75,  11.27,  23.2 ,  11.38,  14.03,
        14.59,  26.93,   9.07])

And the maximum of these:

In [283]: np.argmax(data['NA_Sales'])
Out[283]: 0

and the corresponding record:

In [284]: data[0]
Out[284]: (1, b'Wii Sports', b'Wii', 2006, b'Sports', b'Nintendo',  41.49,  29.02,  3.77,  8.46,  82.74)

To make the most use of this array you'll have to read up on structured arrays.
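The steps above locate the single best-selling title, while the question asks for the platform with the highest total. One possible way to get per-platform totals from the structured array, without hard-coding a mask per platform, is to combine np.unique with np.bincount. This is a sketch, not part of the original answer: it assumes a NumPy version that supports the encoding argument of genfromtxt (so that strings load as str rather than bytes), and the names csv_text, platforms, inverse and totals are illustrative.

```python
from io import StringIO

import numpy as np

# Sample rows copied from the question; csv_text is an illustrative name.
csv_text = """Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009,Sports,Nintendo,15.75,11.01,3.28,2.96,33
5,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,Nintendo,11.27,8.89,10.22,1,31.37
6,Tetris,GB,1989,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26
7,New Super Mario Bros.,DS,2006,Platform,Nintendo,11.38,9.23,6.5,2.9,30.01
8,Wii Play,Wii,2006,Misc,Nintendo,14.03,9.2,2.93,2.85,29.02
9,New Super Mario Bros. Wii,Wii,2009,Platform,Nintendo,14.59,7.06,4.7,2.26,28.62
10,Duck Hunt,NES,1984,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31
11,Nintendogs,DS,2005,Simulation,Nintendo,9.07,11,1.93,2.75,24.76
"""

# Assumption: encoding="utf-8" is available, so text fields are str, not bytes.
data = np.genfromtxt(StringIO(csv_text), delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

# platforms holds the sorted distinct platform names; inverse maps every
# record to the position of its platform within that array.
platforms, inverse = np.unique(data["Platform"], return_inverse=True)

# bincount with weights adds up NA_Sales separately for each platform index.
totals = np.bincount(inverse, weights=data["NA_Sales"])

# Position of the largest total, used to look up the platform name.
best = np.argmax(totals)
print(platforms[best], round(totals[best], 2))
```

This replaces the one-mask-per-platform approach with a single grouping pass, so it scales to any number of distinct platforms.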

answered Mar 5, 2017 at 0:40

hpaulj


3 Comments

Rainoa Over a year ago

Tried this solution but ran into a problem: further down my CSV file there are commas inside the titles, and I couldn't pass quotechar='"' to np.genfromtxt.

hpaulj Over a year ago

The csv package handles quotes, but the numpy readers don't. genfromtxt accepts input from anything that feeds it lines, so you can preprocess the lines, cleaning them up so they can be parsed with simple delimiters. That's been discussed in many previous SO questions.

hpaulj Over a year ago

A recent example of genfromtxt with a filter input: stackoverflow.com/a/42593389/901925
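The preprocessing idea from the comments can be sketched as follows: let the csv module handle the quoted fields, rejoin each row with a delimiter that does not occur in the data, and pass the resulting lines to genfromtxt. This is only an illustration under stated assumptions. The helper tab_separated_lines and the row titled "Hypothetical Game, The" (with its sales figure) are invented for the example, a tab is assumed never to appear inside a field, and the encoding argument is assumed to be supported by the installed NumPy.

```python
import csv
from io import StringIO

import numpy as np

# The second data row is made up to show a quoted title containing a comma.
csv_text = '''Rank,Name,Platform,NA_Sales
1,Wii Sports,Wii,41.49
99,"Hypothetical Game, The",NES,1.5
3,Mario Kart Wii,Wii,15.85
'''


def tab_separated_lines(fileobj):
    """Yield each CSV row rejoined with tabs (hypothetical helper).

    csv.reader resolves the quoting, so commas inside quoted fields
    no longer collide with the delimiter that genfromtxt splits on.
    """
    for row in csv.reader(fileobj):
        yield "\t".join(row)


# genfromtxt accepts any iterable of lines, including a generator.
data = np.genfromtxt(tab_separated_lines(StringIO(csv_text)),
                     delimiter="\t", names=True, dtype=None,
                     encoding="utf-8")

print(data["Name"])
print(data["NA_Sales"].sum())
```

With a real file, the StringIO object would be replaced by an open file handle; opening it with newline='' is what the csv module documentation recommends.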
