How do I iterate through columns using a for loop and the csv library in Python?

Question

I am an extremely novice Python user trying to sum columns of data in a .csv file. I found other answers that really helped me to get started (here and here, for example).

However, my problem is that I want to loop over my file to get sums for all columns.

My formatted data look like this:

    z   y   x   w   v   u
a   0   8   7   6   0   5
b   0   0   5   4   0   3
c   0   2   3   4   0   3
d   0   6   7   8   0   9

Or like this in .csv format:

,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9

For right now, I am just trying to get the iteration to work. I will worry about the summing later. Here's my code:

import csv
data = file("test.csv", "r")
headerrow = data.next()
headerrow = headerrow.strip().split(",")
end = len(headerrow)
for i in range (1, end):
    for row in csv.reader(data):
        print row[i]

Here is what I get:

>>> 
0
0
0
0
>>>

So, it prints the values at index 1 for each row, but does not continue through the other indices.

What obvious thing am I missing here?

Update:

Following the very helpful suggestions and explanations, I now have this:

import csv
with open("test.csv") as data:
    headerrow = next(data)
    delim = "," if "," == headerrow[0] else " "
    headerrow = filter(None, headerrow.rstrip().split(delim))
    reader = csv.reader(data, delimiter=delim, skipinitialspace=True)
    zipped = zip(*reader)
    print zipped
    strings = next(zipped)
    print ([sum(map(int,col)) for col in zipped])

This returns an error:

Traceback (most recent call last):
  File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9,     in <module>
    strings = next(zipped)
TypeError: list object is not an iterator

I do not understand this...? Sorry!

Use only the csv module, don't pre-parse anything. You're mutating the file object's state. — Reut Sharabani
– Reut Sharabani, Commented Mar 27, 2015 at 23:17

Padraic Cunningham · Accepted Answer · 2015-03-28 00:03:13Z

6

import csv
with  open('in.csv')as f:
    head = next(f)
    # decide delimiter by what is in header 
    delim = "," if "," ==  head[0] else " "
    # need to filter empty strings 
    head = filter(None, head.rstrip().split(delim))
    # skipinitialspace must be set as you have two spaces delimited
    reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
    # transpose rows
    zipped = zip(*reader)
    # skip first column
    strings = next(zipped)
    # sum each column
    print([sum(map(int,col)) for col in zipped])

[0, 16, 22, 22, 0, 20]

To create a dict matching headers to colunm sums you can so something like:

print(dict(zip(list(head), (sum(map(int,col)) for col in zipped))))

which outputs:

{'u': 20, 'w': 22, 'x': 22, 'z': 0, 'y': 16, 'v': 0}

I used python3 for all of the above, if you are using python2 replace with:

zip -> itertools.izip
filter -> itertools.izip
map -> itertools.imap

Python 2 code:

import csv
from itertools import izip, imap, ifilter
with  open('in.csv')as f:
    head = next(f)
    # decide delimiter by what is in header
    delim = "," if "," ==  head[0] else " "
    # need to filter empty strings
    head = ifilter(None, head.rstrip().split(delim))
    # skipinitialspace must be set as you have two spaces delimited
    reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
    # transpose rows
    zipped = izip(*reader)
    # skip first column
    strings = next(zipped)
    # sum each column
    print([sum(imap(int,col)) for col in zipped])

Output:

[0, 16, 22, 22, 0, 20]

If you are doing a lot of this kind of work then pandas especially pandas.read_csv may be useful, below is a very basic example, some pandas guru may hopefully add to it:

import  pandas as pd

df = pd.read_csv("in.csv")
print(df.sum())
Unnamed: 0    abcd
z                0
y               16
x               22
w               22
v                0
u               20
dtype: object

edited Mar 28, 2015 at 0:03

answered Mar 27, 2015 at 23:16

Padraic Cunningham

181k30 gold badges264 silver badges327 bronze badges

Sign up to request clarification or add additional context in comments.

5 Comments

17th Lvl Botanist Over a year ago

This is very helpful, especially in tandem with the comments above from jkdc. However, I get the following error

Traceback (most recent call last):   File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9, in <module>     strings = next(zipped) TypeError: list object is not an iterator

Padraic Cunningham Over a year ago

@LaurasianDisjunction. I was just editing. I am using python3, to replicate the code above using python2 use the itertools as in the edit to my answer

Padraic Cunningham Over a year ago

@LaurasianDisjunction, added the equivalent python2 code

17th Lvl Botanist Over a year ago

Oh, hallelujah, happy Friday, praises be to the angels on high, and will you marry me?! I can now go generate some data and get on with my life. This is exactly what I wanted!

Padraic Cunningham Over a year ago

@LaurasianDisjunction.lol no worries, thanks for the proposal but I have made a vow to stop marrying people from the internet ;)

JuniorCompressor · Accepted Answer · 2015-03-27 23:42:34Z

3

You can use numpy:

import csv
import numpy as np
with open("test.csv") as f:
    r = csv.reader(f, delimiter=",")
    # For space format: r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    # Thanks to Padraic Cunningham ^^
    next(r) # Skip header row
    sums = sum((np.array(map(int, row[1:])) for row in r))

Result:

>>> sums
array([ 0, 16, 22, 22,  0, 20])

edited Mar 27, 2015 at 23:42

answered Mar 27, 2015 at 23:29

JuniorCompressor

20.1k4 gold badges33 silver badges58 bronze badges

10 Comments

Padraic Cunningham Over a year ago

This will fail for the first example

JuniorCompressor Over a year ago

@PadraicCunningham Why should be so pedantic? I don't think delimiter detection is the issue :)

MrAlexBailey Over a year ago

How's he supposed to utilize numpy when he clearly states he's just trying to get an iteration to work.

Padraic Cunningham Over a year ago

I thought that was actually a big part of the question, having code that will work using whichever is passed, the first example won't work using csv out of the box. skipinitialspace is required

JuniorCompressor Over a year ago

@Jkdc the answers can be used from others too.

|

MrAlexBailey · Accepted Answer · 2015-03-27 23:29:55Z

This may clarify some of what exactly is going on... it looks like you're slightly over complicating things. This is very simple Python and not intended to be a direct or final solution to your problem, but more so help understand what is happening.

import csv 

sumthree = 0

with open('test.csv', 'rb') as f:    # Open the file (always use binary 'rb' mode for CSV files)
    header = next(f)        # Extract the header line from the file
    csvr = csv.reader(f)    # Create a CSV object with the rest of the file
    for row in csvr:
        print row           # Now loop over the file and print each row

        sumthree += int(row[2])

    print sumthree

At this point each row is going to print as a list, e.g. ['a','0','8','7','6','0','5']

So every iteration through that loop, we're moving down row by row. row[0] is going to be the first column, row[1] will be the 2nd, etc. If you want to sum the 3rd column of the file you could use sumthree += int(row[2]). At the end of this we print sumthree and see the total of all the numbers in the 3rd column.

Collectives™ on Stack Overflow

How do I iterate through columns using a for loop and the csv library in Python?

3 Answers 3

5 Comments

10 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

5 Comments

10 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related