I have to create a pandas DataFrame from a CSV file using a pipeline. The source CSV file may contain any number of columns whose header contains the string 'SLA'. Sample data below: [screenshot of sample data]

While building the pandas pipeline, I have to extract and store only the string before the first delimiter ('|') for all the SLA columns. For example, for ID=1 the SLA1 column in the CSV contains the value '24h|0h|13h', and I have to store only '24h' in the DataFrame (and similarly for the other SLA columns).

My code is as follows:

import pandas as pd


def get_sla_cols(df):
    return [col for col in df.columns if 'SLA' in col]


def split(df, cols, split_str):
    for col in cols:
        df[col] = df[col].str.split(split_str, expand=True, n=1)[0]
    return df


csv_path = r"C:\Users\daryl\Downloads\svc.csv"
svc_df = (pd.read_csv(csv_path)
          .pipe(split, lambda x: x.pipe(get_sla_cols), '|'))
 

I'm getting the error below: [screenshot of error message]

But if I run:

print(pd.read_csv(csv_path).pipe(lambda x: x.pipe(get_sla_cols)))

I get the output below, as expected:

[screenshot of output]

Since lambda x: x.pipe(get_sla_cols) produces the list of column names, why does split(df, cols, split_str) raise an error saying it cannot iterate over cols in the for loop? (See the error screenshot.)

Note: If I replace lambda x: x.pipe(get_sla_cols) with a hardcoded list, say ['SLA1', 'SLA2', 'SLA3', 'SLA4', 'SLA5'], the split() function raises no error and works as expected.

  • This is because you passed a function as the cols parameter. The for col in cols: loop needs an iterable such as a list, and a function is not iterable. If you pass a list (['SLA1', 'SLA2', 'SLA3', 'SLA4', 'SLA5']) instead, the loop of course works. Commented Jun 25, 2024 at 13:49
  • @PandaKim - I want to dynamically create the list of SLA columns inside the pipeline. If it is possible can you kindly help? Commented Jun 25, 2024 at 15:11
  • Don't post a screenshot of a spreadsheet program. CSV is text. If you have a CSV, provide an example as text. Commented Jun 25, 2024 at 16:42
  • Also, don't post screenshots of error messages. Post error messages as formatted text in the question itself. Commented Jun 25, 2024 at 16:43
  • Always put code, data and the full error message as text (not a screenshot, not a link) in the question (not in a comment). It will be more readable and easier to use in an answer (simpler to select and copy), and more people will see it, so more people can help you. Commented Jun 25, 2024 at 21:45
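
The point of the first comment can be checked with a small sketch. It assumes made-up CSV text in place of the file from the question, and show_cols_type is a hypothetical helper that only reports what pipe() forwarded as cols. pipe() hands extra arguments to the function unchanged, so the lambda is never called:

```python
import io

import pandas as pd


def get_sla_cols(df):
    return [col for col in df.columns if 'SLA' in col]


def show_cols_type(df, cols, split_str):
    # Hypothetical helper: report the type of whatever pipe() passed as `cols`.
    return type(cols).__name__


# Made-up sample data standing in for svc.csv (an assumption, not the real file).
csv_text = "ID,SLA1,SLA2\n1,24h|0h|13h,4h|1h|2h\n2,8h|2h|1h,12h|3h|0h\n"

df = pd.read_csv(io.StringIO(csv_text))
received = df.pipe(show_cols_type, lambda x: x.pipe(get_sla_cols), '|')
print(received)  # function

# Looping over a function object fails, which is what happens inside split().
try:
    for col in (lambda x: x.pipe(get_sla_cols)):
        pass
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'function' object is not iterable
```

This is presumably the same TypeError shown in the question's screenshot, although the screenshot itself is not available to confirm it.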

1 Answer

This should work then:

svc_df = (pd.read_csv(csv_path)
          .pipe(lambda df: split(df, get_sla_cols(df), '|')))

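Use a lambda for the whole pipe step. pipe() passes extra arguments to split unchanged, so a lambda given as cols is never called. Wrapping the whole call in a lambda means get_sla_cols(df) is evaluated on the DataFrame and split receives an actual list of column names.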
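
A self-contained sketch of this fix follows. It assumes made-up CSV text read through io.StringIO instead of the file from the question, and split_sla is a hypothetical named wrapper shown as an alternative to the lambda:

```python
import io

import pandas as pd


def get_sla_cols(df):
    return [col for col in df.columns if 'SLA' in col]


def split(df, cols, split_str):
    for col in cols:
        # Keep only the text before the first occurrence of split_str.
        df[col] = df[col].str.split(split_str, expand=True, n=1)[0]
    return df


def split_sla(df, split_str):
    # Hypothetical wrapper: resolve the SLA columns from the df it receives.
    return split(df, get_sla_cols(df), split_str)


# Made-up sample data standing in for svc.csv (an assumption, not the real file).
csv_text = (
    "ID,SLA1,SLA2,Name\n"
    "1,24h|0h|13h,4h|1h|2h,a\n"
    "2,8h|2h|1h,12h|3h|0h,b\n"
)

# Fix from the answer: the lambda wraps the whole step.
svc_df = (pd.read_csv(io.StringIO(csv_text))
          .pipe(lambda df: split(df, get_sla_cols(df), '|')))

# Alternative: a named function, so pipe() needs no lambda at all.
svc_df_named = pd.read_csv(io.StringIO(csv_text)).pipe(split_sla, '|')

print(svc_df)
```

Both variants compute the column list from the DataFrame flowing through the pipeline, so nothing is hardcoded. The named wrapper may be easier to reuse if the same step appears in more than one pipeline.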
