3

I am an extremely novice Python user trying to sum columns of data in a .csv file. I found other answers that really helped me to get started (here and here, for example).

However, my problem is that I want to loop over my file to get sums for all columns.

My formatted data look like this:

    z   y   x   w   v   u
a   0   8   7   6   0   5
b   0   0   5   4   0   3
c   0   2   3   4   0   3
d   0   6   7   8   0   9

Or like this in .csv format:

,z,y,x,w,v,u
a,0,8,7,6,0,5
b,0,0,5,4,0,3
c,0,2,3,4,0,3
d,0,6,7,8,0,9

For right now, I am just trying to get the iteration to work. I will worry about the summing later. Here's my code:

import csv
data = file("test.csv", "r")
headerrow = data.next()
headerrow = headerrow.strip().split(",")
end = len(headerrow)
for i in range (1, end):
    for row in csv.reader(data):
        print row[i]

Here is what I get:

>>> 
0
0
0
0
>>> 

So, it prints the values at index 1 for each row, but does not continue through the other indices.

What obvious thing am I missing here?

Update:

Following the very helpful suggestions and explanations, I now have this:

import csv
with open("test.csv") as data:
    headerrow = next(data)
    delim = "," if "," == headerrow[0] else " "
    headerrow = filter(None, headerrow.rstrip().split(delim))
    reader = csv.reader(data, delimiter=delim, skipinitialspace=True)
    zipped = zip(*reader)
    print zipped
    strings = next(zipped)
    print ([sum(map(int,col)) for col in zipped])

This returns an error:

Traceback (most recent call last):
  File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9,     in <module>
    strings = next(zipped)
TypeError: list object is not an iterator

I do not understand this...? Sorry!

1
  • Use only the csv module, don't pre-parse anything. You're mutating the file object's state. Commented Mar 27, 2015 at 23:17

3 Answers 3

6
import csv
with  open('in.csv')as f:
    head = next(f)
    # decide delimiter by what is in header 
    delim = "," if "," ==  head[0] else " "
    # need to filter empty strings 
    head = filter(None, head.rstrip().split(delim))
    # skipinitialspace must be set as you have two spaces delimited
    reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
    # transpose rows
    zipped = zip(*reader)
    # skip first column
    strings = next(zipped)
    # sum each column
    print([sum(map(int,col)) for col in zipped])

[0, 16, 22, 22, 0, 20]

To create a dict matching headers to colunm sums you can so something like:

print(dict(zip(list(head), (sum(map(int,col)) for col in zipped))))

which outputs:

{'u': 20, 'w': 22, 'x': 22, 'z': 0, 'y': 16, 'v': 0}

I used python3 for all of the above, if you are using python2 replace with:

zip -> itertools.izip
filter -> itertools.izip
map -> itertools.imap

Python 2 code:

import csv
from itertools import izip, imap, ifilter
with  open('in.csv')as f:
    head = next(f)
    # decide delimiter by what is in header
    delim = "," if "," ==  head[0] else " "
    # need to filter empty strings
    head = ifilter(None, head.rstrip().split(delim))
    # skipinitialspace must be set as you have two spaces delimited
    reader = csv.reader(f,delimiter=delim, skipinitialspace=True)
    # transpose rows
    zipped = izip(*reader)
    # skip first column
    strings = next(zipped)
    # sum each column
    print([sum(imap(int,col)) for col in zipped])

Output:

[0, 16, 22, 22, 0, 20]

If you are doing a lot of this kind of work then pandas especially pandas.read_csv may be useful, below is a very basic example, some pandas guru may hopefully add to it:

import  pandas as pd

df = pd.read_csv("in.csv")
print(df.sum())
Unnamed: 0    abcd
z                0
y               16
x               22
w               22
v                0
u               20
dtype: object
Sign up to request clarification or add additional context in comments.

5 Comments

This is very helpful, especially in tandem with the comments above from jkdc. However, I get the following error Traceback (most recent call last): File "C:\Users\the hexarch\Desktop\remove_total_absences_test.py", line 9, in <module> strings = next(zipped) TypeError: list object is not an iterator
@LaurasianDisjunction. I was just editing. I am using python3, to replicate the code above using python2 use the itertools as in the edit to my answer
@LaurasianDisjunction, added the equivalent python2 code
Oh, hallelujah, happy Friday, praises be to the angels on high, and will you marry me?! I can now go generate some data and get on with my life. This is exactly what I wanted!
@LaurasianDisjunction.lol no worries, thanks for the proposal but I have made a vow to stop marrying people from the internet ;)
3

You can use numpy:

import csv
import numpy as np
with open("test.csv") as f:
    r = csv.reader(f, delimiter=",")
    # For space format: r = csv.reader(f, delimiter=" ", skipinitialspace=True)
    # Thanks to Padraic Cunningham ^^
    next(r) # Skip header row
    sums = sum((np.array(map(int, row[1:])) for row in r))

Result:

>>> sums
array([ 0, 16, 22, 22,  0, 20])

10 Comments

This will fail for the first example
@PadraicCunningham Why should be so pedantic? I don't think delimiter detection is the issue :)
How's he supposed to utilize numpy when he clearly states he's just trying to get an iteration to work.
I thought that was actually a big part of the question, having code that will work using whichever is passed, the first example won't work using csv out of the box. skipinitialspace is required
@Jkdc the answers can be used from others too.
|
2

This may clarify some of what exactly is going on... it looks like you're slightly over complicating things. This is very simple Python and not intended to be a direct or final solution to your problem, but more so help understand what is happening.

import csv 

sumthree = 0

with open('test.csv', 'rb') as f:    # Open the file (always use binary 'rb' mode for CSV files)
    header = next(f)        # Extract the header line from the file
    csvr = csv.reader(f)    # Create a CSV object with the rest of the file
    for row in csvr:
        print row           # Now loop over the file and print each row

        sumthree += int(row[2])

    print sumthree

At this point each row is going to print as a list, e.g. ['a','0','8','7','6','0','5']

So every iteration through that loop, we're moving down row by row. row[0] is going to be the first column, row[1] will be the 2nd, etc. If you want to sum the 3rd column of the file you could use sumthree += int(row[2]). At the end of this we print sumthree and see the total of all the numbers in the 3rd column.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.