2

Suppose I have two csv files.

file1.csv

event_id, polarity
   1124,   0.3763
  36794,   0.638
 dhejjd,   0.3627

file2.csv

event_id, tallies
   61824,   0.3
   36794,   0.8
   dhejjd,   0.9
   dthdnb,   0.66

I want to access the polarity and tallies for each event_id. How can I read these files into two arrays so that for each event_id I can get the polarity and tallies, and then perform my calculations with these two values? I tried the code below, but it failed with an error:

    for event_id, polarity in file1reader:
ValueError: need more than 1 value to unpack

My code:

import csv

file1reader = csv.reader(open("file1.csv"), delimiter=",")
file2reader = csv.reader(open("file2.csv"), delimiter=",")

header1 = file1reader.next() #header
header2 = file2reader.next() #header

for event_id, polarity in file1reader:
    #if event_id and polarity in file1reader:
      for event_id, tallies in file2reader:
        #if event_id in file2reader:
          if file1reader.event_id == file2reader.event_id:
            print event_id, polarity, tallies   
            break   
file1reader.close()
file2reader.close() 
6
  • What did not work out? Be more specific. Are you getting any error? Commented May 29, 2015 at 9:09
  • Does your csv actually look like that? Commented May 29, 2015 at 9:18
  • yes. for event_id, polarity in file1reader: ValueError: need more than 1 value to unpack Commented May 29, 2015 at 9:22
  • @MEH, does it have spaces? Commented May 29, 2015 at 9:33
  • @MEH, can you post the actual format of your csv files? What you have in your question is a mess. Commented May 29, 2015 at 9:46

5 Answers

3

Use pandas data frames instead of numpy arrays

import pandas as pd
df = pd.read_csv("file1.csv", index_col="event_id", skipinitialspace=True)
df2 = pd.read_csv("file2.csv", index_col="event_id", skipinitialspace=True)
df = df.merge(df2, how='outer', left_index=True, right_index=True)

P.S. Corrected the code so that it runs. The 'outer' join means that if only 'polarity' or 'tallies' exist for a given 'event_id', then missing values are coded as NaNs. The output is

          polarity  tallies
event_id                   
1124        0.3763      NaN
36794       0.6380     0.80
61824          NaN     0.30
dhejjd      0.3627     0.90
dthdnb         NaN     0.66

If you need only rows where both are present, use how='inner'

P.P.S To work with this data frame further you can, for example, replace NaNs with some value, let us say 0:

df.fillna(0, inplace=True)

You can select elements by label

df.loc["dhejjd","polarity"]
df.loc[:,"tallies"]

or by integer position

df.iloc[0:3,:]

If you have never used pandas, it takes some time to learn and get used to, but it is worth every second.


4 Comments

No one explains their downvote. If our answer is really bad, how can we find out what mistake we made?
Not my downvote, but this would not even run, so I am not overly surprised it was downvoted. You also don't link to pandas, which is not built in, or explain what the code is doing.
@Padraic Cunningham Thank you for the constructive criticism. I corrected and extended my answer accordingly
@lanenok it is a fellow stack overflowers feeling
1

You don't need nested loops over both csv.reader objects. You can first use itertools.chain to concatenate the two readers, then use a dictionary (with its setdefault method) that stores each event_id as a key and collects the values found for it (polarity from the first file, tallies from the second) in a list.

import csv
from itertools import chain

d = {}
with open('a1.txt', 'rb') as csvfile1, open('ex.txt', 'rb') as csvfile2:
    spamreader1 = csv.reader(csvfile1, delimiter=',')
    spamreader2 = csv.reader(csvfile2, delimiter=',')
    # skip the header row of each file
    spamreader1.next()
    spamreader2.next()
    # iterate over the rows of both files as one sequence
    sp = chain(spamreader1, spamreader2)
    for i, j in sp:
        d.setdefault(i, []).append(j)
    print d

result :

{'36794': ['0.638', '0.8'], 
 '61824': ['0.3'], 
 '1124': ['0.3763'], 
 'dthdnb': ['0.66'], 
 'dhejjd': ['0.3627', '0.9']}
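
The code above is written for Python 2 (reader.next(), print statement, 'rb' mode). As a rough sketch of the same chain-plus-setdefault idea under Python 3: the in-memory strings FILE1 and FILE2 and the helper group_values are hypothetical stand-ins for the real files, and skipinitialspace=True is an assumption added here to cope with the padding shown in the question.

```python
import csv
import io
from itertools import chain

# Hypothetical in-memory stand-ins for file1.csv and file2.csv.
FILE1 = "event_id, polarity\n   1124,   0.3763\n  36794,   0.638\n dhejjd,   0.3627\n"
FILE2 = ("event_id, tallies\n   61824,   0.3\n   36794,   0.8\n"
         "   dhejjd,   0.9\n   dthdnb,   0.66\n")


def group_values(*texts):
    """Collect the second column of every input under its event_id."""
    readers = []
    for text in texts:
        # skipinitialspace drops the padding in front of each field
        reader = csv.reader(io.StringIO(text), skipinitialspace=True)
        next(reader)  # skip the header row
        readers.append(reader)
    grouped = {}
    # chain() walks the rows of all readers as one sequence
    for event_id, value in chain(*readers):
        grouped.setdefault(event_id, []).append(value)
    return grouped


grouped = group_values(FILE1, FILE2)
print(grouped)
```

Values stay strings here, as in the answer; convert them with float() before doing arithmetic.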


0

A csv reader can only be iterated once. When the inner loop runs through file2 the first time, it reaches the end of the file (StopIteration) and the reader stays there, so on later iterations of the outer loop there is nothing left to read. To read it multiple times you would have to reopen the file each time, which is wasteful. Assuming all of the data fits into memory, you could just read it into dicts:

import csv

file1 = {}

file2 = {}

with open('file1.csv', 'r') as input1:

    reader = csv.reader(input1)
    reader.next()

    for row in reader:
        file1[row[0]] = row[1]

with open('file2.csv', 'r') as input2:

    reader = csv.reader(input2)
    reader.next()

    for row in reader:
        file2[row[0]] = row[1]


# And now we can directly compare without looping through file 2 every time

for key in file1:
    # try/except is more pythonic.
    try:
        print key, file1[key], file2[key]
    except KeyError:
        pass

This saves processing time as you don't have to loop so much and stops you from having to open and close the file every time you go to the next iteration of file1.
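
To make the exhaustion issue concrete, here is a small Python 3 sketch. The string FILE2 is a hypothetical, space-free stand-in for file2.csv. It shows that a second pass over the same reader yields nothing, and that reading the rows once into a dict avoids the problem.

```python
import csv
import io

# Hypothetical in-memory stand-in for file2.csv (no padding, for brevity).
FILE2 = "event_id,tallies\n61824,0.3\n36794,0.8\n"

reader = csv.reader(io.StringIO(FILE2))
next(reader)  # skip the header row

first_pass = list(reader)   # consumes every remaining row
second_pass = list(reader)  # the reader is exhausted, so nothing is left

print(len(first_pass), len(second_pass))

# Reading once into a dict allows repeated lookups without re-reading the file.
tallies = dict(first_pass)
print(tallies.get("36794"))
```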

Note: I originally used DictReader in this example, on the assumption that you had multiple columns, which I believe was wrong. With two columns you can just use list indexing.

If your files had more columns, with the same names but in varying order, you could use csv.DictReader instead. In that case the code is as follows:

import csv

file1 = {}

file2 = {}

with open('file1.csv', 'r') as input1:

    reader = csv.DictReader(input1)
    # Don't use next so we can use the headers as keys

    for row in reader:
        file1[row['event_id']] = row['polarity']

with open('file2.csv', 'r') as input2:

    reader = csv.DictReader(input2)
    # Don't use next so we can use the headers as keys

    for row in reader:
        file2[row['event_id']] = row['tallies']


# And now we can directly compare without looping through file 2 every time

for key in file1:
    # try/except is more pythonic.
    try:
        print key, file1[key], file2[key]
    except KeyError:
        pass

8 Comments

I am getting: Traceback (most recent call last): File "combined5.py", line 14, in <module> file1[row['event_id']] = row['polarity'] KeyError: 'polarity'
Check that the column header names match the keys. Note: you don't really need DictReader to do this; you can use list indexing, which I've changed the answer to match.
Look out for capitalisation, extra spaces, etc. They all count. You can print row for an example of what the actual keys are.
My actual keys are event_id and polarity. Same goes for event_id and tallies.
And if you switch file1[row['event_id']] = row['polarity'] to print row.keys(), what do you get for the keys printed out? The error suggests that polarity isn't quite what's in the header row.
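
One possible cause of the KeyError, assuming the header line is written exactly as in the question ("event_id, polarity" with a space after the comma), is that DictReader keeps that space as part of the key. The Python 3 sketch below uses a hypothetical in-memory FILE1 to show the key that is actually produced, and how passing skipinitialspace=True through to the underlying reader would clean it up.

```python
import csv
import io

# Hypothetical in-memory stand-in for file1.csv, padded as in the question.
FILE1 = "event_id, polarity\n   1124,   0.3763\n  36794,   0.638\n"

# With default settings the space after the comma stays in the header name.
plain = csv.DictReader(io.StringIO(FILE1))
print(plain.fieldnames)  # ['event_id', ' polarity']

# skipinitialspace=True strips the padding from headers and values alike.
clean = csv.DictReader(io.StringIO(FILE1), skipinitialspace=True)
print(clean.fieldnames)  # ['event_id', 'polarity']

polarity = {row["event_id"]: float(row["polarity"]) for row in clean}
print(polarity)
```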
0

You can group them using a dict:

import csv
from collections import defaultdict

with open("file1.csv") as f1, open("file2.csv") as f2:
    d = defaultdict(list)
    next(f1),next(f2)
    r1 = csv.reader(f1,skipinitialspace=True)
    r2 = csv.reader(f2,skipinitialspace=True)
    for row in r1:
        d[row[0]].append(float(row[1]))
    for row in r2:
        d[row[0]].append(float(row[1]))

defaultdict(<type 'list'>, {'36794': [0.638, 0.8], '61824': [0.3], '1124': [0.3763], 'dthdnb': [0.66], 'dhejjd': [0.3627, 0.9]})

from operator import mul
for k, v in filter(lambda x: len(x[1])== 2, d.items()):
    print(mul(*v))
0.5104
0.32643

If your file actually contains irregular spacing, the csv module with its default settings will not parse it the way you expect, which, judging by your ValueError, is probably the case.

If your file is a mess, this will work:

with open("file1.csv") as f1, open("file2.csv") as f2:
    d = defaultdict(list)
    next(f1), next(f2)
    for row in f1:
        eve, pol = row.replace(" ","").split(",")
        d[eve].append(float(pol))
    for row in f2:
        eve, tal = row.replace(" ","").split(",")
        d[eve].append(float(tal))

Input:

file1.csv

event_id, polarity
   1124,   0.3763
  36794,   0.638
 dhejjd,   0.3627

file2.csv

event_id, tallies
   61824,   0.3
   36794,   0.8
   dhejjd,   0.9
   dthdnb,   0.66

Output:

defaultdict(<type 'list'>, {'36794': [0.638, 0.8], '61824': [0.3], '1124': [0.3763], 'dthdnb': [0.66], 'dhejjd': [0.3627, 0.9]})
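
The manual split above still raises a ValueError on any line that does not contain exactly one comma, such as a blank line. A slightly more defensive Python 3 sketch is shown below. The sample line lists, the helper add_rows, and the decision to silently skip malformed lines are all assumptions made for illustration, not part of the original answer.

```python
from collections import defaultdict

# Hypothetical sample lines standing in for the two files. The blank line and
# the line without a comma are added here to show the skipping behaviour.
FILE1_LINES = [
    "event_id, polarity",
    "   1124,   0.3763",
    "",
    "  36794,   0.638",
    " dhejjd,   0.3627",
]
FILE2_LINES = [
    "event_id, tallies",
    "   61824,   0.3",
    "   36794,   0.8",
    "this line has no comma",
    "   dhejjd,   0.9",
    "   dthdnb,   0.66",
]


def add_rows(lines, grouped):
    """Append the numeric second column of each well-formed line to grouped."""
    rows = iter(lines)
    next(rows, None)  # skip the header row, if there is one
    for line in rows:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(parts):
            continue  # assumption: ignore blank or malformed lines
        event_id, value = parts
        grouped[event_id].append(float(value))


grouped = defaultdict(list)
add_rows(FILE1_LINES, grouped)
add_rows(FILE2_LINES, grouped)
print(dict(grouped))
```

With real files, the same function can be given an open file object instead of a list, since both are iterables of lines.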


0

I'd suggest storing the data from the two files into a dictionary of dictionaries which can easily be created by using collections.defaultdict.

import csv
from collections import defaultdict
import json  # just for pretty printing resulting data structure

event_data = defaultdict(dict)

filename1 = "file1.csv"
filename2 = "file2.csv"

with open(filename1, "rb") as file1:
    file1reader = csv.reader(file1, delimiter=",", skipinitialspace=True)
    next(file1reader)  # skip over header
    for event_id, polarity in file1reader:
        event_data[event_id]['polarity'] = float(polarity)

with open(filename2, "rb") as file2:
    file2reader = csv.reader(file2, delimiter=",", skipinitialspace=True)
    next(file2reader)  # skip over header
    for event_id, tallies in file2reader:
        event_data[event_id]['tallies'] = float(tallies)

print 'event_data:', json.dumps(event_data, indent=4)
print

# print as table
for event_id in sorted(event_data):
    print 'event_id: {!r:<8} polarity: {:<8} tallies: {:<8}'.format(
        event_id,
        event_data[event_id].get('polarity', None),
        event_data[event_id].get('tallies', None))

Output:

event_data: {
    "36794": {
        "polarity": 0.638, 
        "tallies": 0.8
    }, 
    "61824": {
        "tallies": 0.3
    }, 
    "1124": {
        "polarity": 0.3763
    }, 
    "dthdnb": {
        "tallies": 0.66
    }, 
    "dhejjd": {
        "polarity": 0.3627, 
        "tallies": 0.9
    }
}

event_id: '1124'   polarity: 0.3763   tallies: None    
event_id: '36794'  polarity: 0.638    tallies: 0.8     
event_id: '61824'  polarity: None     tallies: 0.3     
event_id: 'dhejjd' polarity: 0.3627   tallies: 0.9     
event_id: 'dthdnb' polarity: None     tallies: 0.66    
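
Once the data is in this dictionary-of-dictionaries shape, the calculation step the question asks about can be limited to events that have both values. The Python 3 sketch below hard-codes event_data to match the output above. The product is only a placeholder calculation, since the question does not say which computation is needed.

```python
# event_data as produced above, hard-coded so the sketch runs on its own.
event_data = {
    "1124": {"polarity": 0.3763},
    "36794": {"polarity": 0.638, "tallies": 0.8},
    "61824": {"tallies": 0.3},
    "dhejjd": {"polarity": 0.3627, "tallies": 0.9},
    "dthdnb": {"tallies": 0.66},
}

# Keep only the events for which both measurements are present.
complete = {
    event_id: values
    for event_id, values in event_data.items()
    if "polarity" in values and "tallies" in values
}

# Placeholder calculation: multiply the two values for each complete event.
products = {
    event_id: values["polarity"] * values["tallies"]
    for event_id, values in complete.items()
}

for event_id in sorted(products):
    print(event_id, round(products[event_id], 5))
```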

