Get values from corresponding row CSV Python

Question

I have multiple csv files like this:

csv1:

h1,h2,h3
aa,34,bd9
bb,459,jg0

csv2:

h1,h5,h2
aa,rg,87
aa,gru,90
bb,sf,459

For each value in column 0 with header h1, I'd like to get its corresponding h2 values from all the csv files in a folder. A sample output could be

csv1: (aa,34),(bb,459)
csv2: (aa,87,90),(bb,459)

I'm a little clueless on how to go about doing this.

PS- I don't want to use pandas.

PPS- I'm able to do it by hardcoding the value from column 0, but I don't want to do it that way since there are hundreds of rows.

This is a small piece of code I've tried. It prints the values of h2 for 'aa' in different lines. I want them to be printed in the same line.

import csv
with open("test1/sample.csv") as csvfile:
     reader = csv.DictReader(csvfile,  delimiter = ",")
     for row in reader:
         print(row['h1'], row['h2'])

Have you looked at docs.python.org/2/library/csv.html?

– Bruce, Commented Feb 5, 2015 at 17:12

Padraic Cunningham · Accepted Answer · 2015-02-05 18:06:36Z


import glob
import csv
import os
from collections import defaultdict

d = defaultdict(list)
path = "path_to_folder"
# glob inside the folder; a bare "*.csv" would only search the current directory
for fle in glob.glob(os.path.join(path, "*.csv")):
    with open(fle) as f:
        header = next(f).rstrip().split(",")
        # if either does not appear in header the value will be None
        h1 = next((i for i, x in enumerate(header) if x == "h1"), None)
        h2 = next((i for i, x in enumerate(header) if x == "h2"), None)
        # make sure we have both columns before going further
        if h1 is not None and h2 is not None:
            r = csv.reader(f, delimiter=",")
            # save file name as key appending each h1 and h2 value
            name = os.path.basename(fle)
            for row in r:
                d[name].append([row[h1], row[h2]])
print(d)

defaultdict(<class 'list'>, {'csv1.csv': [['aa', '34'], ['bb', '459']], 'csv2.csv': [['aa', '87'], ['aa', '90'], ['bb', '459']]})

This is a quick draft. It presumes every file is comma-delimited and that every row has a value in both the h1 and h2 columns; if so, it finds all pairings and keeps them in file order.

To get a set of unique values we can use a set and set.update:

d = defaultdict(set) # change to set

for fle in glob.glob(os.path.join(path, "*.csv")):
    with open(fle) as f:
        header = next(f).rstrip().split(",")
        h1 = next((i for i, x in enumerate(header) if x == "h1"), None)
        h2 = next((i for i, x in enumerate(header) if x == "h2"), None)
        if h1 is not None and h2 is not None:
            r = csv.reader(f, delimiter=",")
            name = os.path.basename(fle)
            for row in r:
                d[name].update([row[h1], row[h2]]) # set.update

print(d)
defaultdict(<class 'set'>, {'csv1.csv': {'459', '34', 'bb', 'aa'}, 'csv2.csv': {'459', '90', '87', 'bb', 'aa'}})

If you are sure every file has both h1 and h2, you can reduce the code to:

d = defaultdict(set)
path = "path/"
for fle in glob.glob(os.path.join(path, "*.csv")):
    with open(fle) as f:
        r = csv.reader(f, delimiter=",")
        header = next(r)
        # index raises ValueError if the column is missing
        h1 = header.index("h1")
        h2 = header.index("h2")
        name = os.path.basename(fle)
        for row in r:
            d[name].update([row[h1], row[h2]])

Lastly, if you want to keep the order in which the elements are found, we cannot use a set, as sets are unordered, so we need a list and must check whether each element is already in it:

d = defaultdict(list) # back to a list so order is kept
for fle in glob.glob(os.path.join(path, "*.csv")):
    with open(fle) as f:
        r = csv.reader(f, delimiter=",")
        header = next(r)
        h1 = header.index("h1")
        h2 = header.index("h2")
        name = os.path.basename(fle)
        for row in r:
            h_1, h_2 = row[h1], row[h2]
            # append each value only the first time it is seen
            if h_1 not in d[name]:
                d[name].append(h_1)
            if h_2 not in d[name]:
                d[name].append(h_2)
print(d)
defaultdict(<class 'list'>, {'csv2.csv': ['aa', '87', '90', 'bb', '459'], 'csv1.csv': ['aa', '34', 'bb', '459']})
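
The snippets above collect h1 and h2 values per file but do not tie each h2 value to its h1 value. A minimal sketch of one way to get output shaped like the question's sample, (aa,87,90),(bb,459), is to keep one list of h2 values per h1 value within each file. The function name group_h2_by_h1 and its parameters key_col and value_col are hypothetical names chosen for illustration, and the sketch assumes every row has a value in both columns:

```python
import csv
import glob
import os
from collections import defaultdict


def group_h2_by_h1(folder, key_col="h1", value_col="h2"):
    """Return {file name: {key_col value: [value_col values in file order]}}."""
    result = {}
    for file_path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(file_path, newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames or []
            # Skip files that lack either column.
            if key_col not in fieldnames or value_col not in fieldnames:
                continue
            groups = defaultdict(list)
            for row in reader:
                groups[row[key_col]].append(row[value_col])
        result[os.path.basename(file_path)] = dict(groups)
    return result


if __name__ == "__main__":
    # "path_to_folder" is a placeholder, as in the snippets above.
    for name, groups in group_h2_by_h1("path_to_folder").items():
        line = ",".join(
            "(" + ",".join([key] + values) + ")" for key, values in groups.items()
        )
        print(f"{name}: {line}")
```

With the two sample files from the question saved as csv1.csv and csv2.csv, this should print csv1.csv: (aa,34),(bb,459) and csv2.csv: (aa,87,90),(bb,459). It relies on dicts keeping insertion order, which is guaranteed from Python 3.7.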

edited Feb 5, 2015 at 18:06

answered Feb 5, 2015 at 17:24


5 Comments

abn Over a year ago

But is there a way I can merge ['aa', '87'], ['aa', '90'] into ['aa', '87', '90']?

Padraic Cunningham Over a year ago

@dan, no worries, I threw it together pretty quickly, so no doubt it can be optimised.

Padraic Cunningham Over a year ago

@dan yes indeed, we can use extend or use sets. I will edit after I grab some dinner.

abn Over a year ago

Sure. I'm sorry if I stopped you. Thank you.

Padraic Cunningham Over a year ago

No prob, if you are sure there are always going to be h1 and h2's you can use index to get the column index for each and remove the if check
