
I have a CSV file with N key columns and one column of expressions. Each expression contains placeholders referring to between 1 and N of the key columns, and I want each placeholder replaced with that row's value(s) in the corresponding key column. Hopefully the example below clarifies what I mean.

The key columns below are A, B and C.

[Table: example input]

Desired output:

20_A
20_B
30_A
30_B
40_C_4
40_C_5

My solution:

import pandas as pd

csv_path = "path/to/file"
keys = ['Age','Type','Delay']
df = pd.read_csv(csv_path)
for index, row in df.iterrows():

    key1_list = row[keys[0]].split(",")
    key2_list = row[keys[1]].split(",")
    key3_list = row[keys[2]].split(",")

    expression = row['Expression']

    # Iterate over all combinations of key column values and export a chart for each one
    for KEY1 in key1_list:
        for KEY2 in key2_list:
            for KEY3 in key3_list:
                string = expression
                string = string.replace("<" + keys[0] + ">", KEY1)
                string = string.replace("<" + keys[1] + ">", KEY2)
                string = string.replace("<" + keys[2] + ">", KEY3)
                print(string)

However, I would like to generalize my code so that it works for any number of key columns and only the keys list at the beginning needs updating. That would require nesting loops to depth len(keys), but I can't figure out how to do that with flat code. I looked at itertools but couldn't find what I need. I think recursion might work, but I would prefer to avoid it.


1 Answer


Recursion could of course solve the problem, but take another look at itertools before going down that path. What you want is the Cartesian product of each row's key values, which generates all possible combinations.
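To see that itertools.product replaces a fixed stack of nested loops, the sketch below builds the same tuples both ways. The three value lists are hypothetical. Because product accepts any number of iterables, the nesting depth becomes len(value_lists) instead of being hard-coded in the source.

```python
import itertools

# Hypothetical per-row value lists for three key columns
key1_list = ["20", "30"]
key2_list = ["A", "B"]
key3_list = ["4", "5"]

# Three hard-coded nested loops, as in the question
nested = []
for k1 in key1_list:
    for k2 in key2_list:
        for k3 in key3_list:
            nested.append((k1, k2, k3))

# itertools.product takes any number of iterables, so adding a key column
# only means adding one more list to value_lists.
value_lists = [key1_list, key2_list, key3_list]
flat = list(itertools.product(*value_lists))

print(nested == flat)  # True: same tuples, same order (last list varies fastest)
```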

One way of achieving this is as follows:

import pandas as pd
import itertools

csv_path = "path/to/file"
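# dtype=str keeps every cell a string (a column of single numbers would otherwise be
# parsed as int and have no .split); keep_default_na=False turns empty cells into "".
df = pd.read_csv(csv_path, dtype=str, keep_default_na=False)

# Find the available keys from the data frame instead of entering them manually:
keys = list(df.keys()[:-1]) # Exclude "Expression" (assumed to be the last column), as it is not a key.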
for index, row in df.iterrows():

    # Add list of keys to a list of lists
    # (The order needs to be preserved, therefore avoiding dict)
    key_list = []
    for key in keys:
        # Values within a cell are separated by ','. This works in a CSV file
        # as long as the cell is quoted, e.g. "20,30".
        key_list.append(row[key].split(','))

    expression = row['Expression']

    # All key combinations are then generated with 'itertools.product'
    combos = itertools.product(*key_list)

    # Each combo is then handled separately
    for combo in combos:
        string = expression
        # Replace each key in order
        # Must be done sequentially since depth is not known/variable
        for key, value in zip(keys, combo):
            string = string.replace('<' + key + '>', value)
        print(string)

Hopefully this code is understandable and does what you want it to. Otherwise please let me know and I'll try to clarify further.
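To check the approach end to end, here is a self-contained sketch that wraps the loop above in a function and runs it on a small in-memory CSV. It is a variation, not the original code: it uses the standard-library csv.DictReader so it runs without pandas, and the column names A, B, C and all cell values are hypothetical, chosen so the output matches the desired output in the question. The function only needs row[key] lookups, so rows from df.iterrows() should work too, provided read_csv is called with dtype=str and keep_default_na=False so that every cell is a string.

```python
import csv
import io
import itertools

# Hypothetical input: column names and cell values are assumptions chosen
# so that the result matches the desired output in the question.
CSV_TEXT = """A,B,C,Expression
"20,30","A,B",,<A>_<B>
40,C,"4,5",<A>_<B>_<C>
"""


def expand_expressions(rows, keys, expr_col="Expression"):
    """Fill the <key> placeholders of each row's expression with every
    combination of that row's comma-separated key values."""
    results = []
    for row in rows:
        # One list of candidate values per key column, in the order of `keys`
        value_lists = [row[key].split(",") for key in keys]
        for combo in itertools.product(*value_lists):
            text = row[expr_col]
            # Pair each key name with its value for this combination
            for key, value in zip(keys, combo):
                text = text.replace(f"<{key}>", value)
            results.append(text)
    return results


reader = csv.DictReader(io.StringIO(CSV_TEXT))
# Every column except the expression column is treated as a key
keys = [name for name in reader.fieldnames if name != "Expression"]
for line in expand_expressions(reader, keys):
    print(line)
```

An empty cell (column C in the first row) splits to `[""]`, so it contributes exactly one combination and leaves expressions without that placeholder unchanged.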


1 Comment

Thank you very much! I revisited itertools.product after I posted, and your answer explains how to use it very well. I was using enumerate on combos and looking up the key name by index in the keys list, but zip seems more Pythonic. Also, good point about commas as value separators possibly causing issues in a CSV file. It worked for me, so I guess the fields are escaped and pandas handles that when reading the file.
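Both points in this comment can be checked with the standard library. Python's csv module follows the same double-quote convention that pandas.read_csv uses by default, so a quoted field such as "20,30" comes back as one cell and can then be split on commas. The tiny CSV below is hypothetical. The second part shows that zip and an enumerate-plus-index lookup produce the same pairs.

```python
import csv
import io

# Hypothetical CSV: the "Age" cell holds two values, so it is quoted.
text = 'Age,Type,Expression\n"20,30",A,<Age>_<Type>\n'
header, row = list(csv.reader(io.StringIO(text)))

# The quoted field is read back as a single cell with its commas intact
print(row[0])             # 20,30
print(row[0].split(","))  # ['20', '30']

# zip pairs key names with values directly; enumerate needs an index lookup
keys = ["Age", "Type"]
combo = ("20", "A")
by_zip = list(zip(keys, combo))
by_enumerate = [(keys[i], value) for i, value in enumerate(combo)]
print(by_zip == by_enumerate)  # True
```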
