I have a pandas column that holds data like this:

**Title **: New_ind

**Body **: Detection_error

*respo_URL **: www.github.com

**respo_status **: {color}

import pandas as pd

data = {'sl no': [661, 662],
        'key': ['3484', '3483'],
        'id': [13592349, 13592490],
        'Sum': ['[E-1]', '[E-1]'],
        'Desc': [
              "**Title **: New_ind\n\n**Body **: Detection_error\n\n*respo_URL **: www.github.com\n\n**respo_status **: {yellow}","**Title **: New_ind2\n\n**Body **: import_error\n\n*respo_URL **: \n\n**respo_status **: {green}"]}

df = pd.DataFrame(data)

I need to generate new columns where Title, Body, respo_URL, etc. are the column names and everything after the : is the value in the corresponding cell. Note that the items in the column are strings, not dictionaries.

  • @Timus They are actually not dictionaries, which is why I am having a problem. Commented Jan 30, 2023 at 16:18
  • @Timus I have added two rows of the data here. Please let me know if this is sufficient. Thanks. Commented Jan 31, 2023 at 4:11

1 Answer

There are various ways to do that with regex but I found this with str-methods to be the clearest:

desc_df = df["Desc"].str.split("\n\n", expand=True)
for col in desc_df.columns:
    desc_df[col] = desc_df[col].str.split(":", n=1).str[1].str.strip()
colnames = "Title", "Body", "respo_URL", "respo_status"
desc_df = desc_df.rename(columns=dict(enumerate(colnames)))
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)
  • First split column Desc at \n\n and expand the result into a dataframe desc_df.
  • Then split each new column at :, take the right side, and strip whitespace.
  • Finally change the column names and concat the initial dataframe without the Desc column and desc_df.
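
One detail of the second step is worth a small sketch: splitting at every colon would truncate a value that itself contains a colon, for example a URL with a scheme, whereas `n=1` limits the split to the first colon. The `https://` value below is made up for illustration and does not appear in the question's data.

```python
import pandas as pd

# Hypothetical values: the https URL is invented to show a value containing a colon.
s = pd.Series(["*respo_URL **: https://github.com", "**Title **: New_ind"])

# Splitting at every colon and taking piece 1 keeps only the text
# between the first and the second colon.
every_colon = s.str.split(":").str[1].str.strip()

# n=1 splits once, at the first colon, so the rest of the value stays intact.
first_colon = s.str.split(":", n=1).str[1].str.strip()

print(every_colon.tolist())
print(first_colon.tolist())
```

For the sample rows in the question both variants give the same result, since none of the values contains a colon.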

Result for the sample:

   sl no   key        id    Sum     Title             Body       respo_URL  respo_status
0    661  3484  13592349  [E-1]   New_ind  Detection_error  www.github.com      {yellow}
1    662  3483  13592490  [E-1]  New_ind2     import_error                       {green}

The following regex version worked for the sample, but I think it is not as robust as the other one:

pattern = "\n\n".join(
    rf"\*+{col} \*+: (?P<{col}>[^\n]*)"
    for col in ("Title", "Body", "respo_URL", "respo_status")
)
desc_df = df["Desc"].str.extract(pattern)
df = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)
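
If the fields might appear in a different order or some of them might be missing, one possible alternative is to parse each `Desc` string line by line into a dictionary and build the new columns from those dictionaries. This is only a sketch under that assumption: the helper `parse_desc` and the pattern `LINE` are hypothetical names introduced here, and the sample frame is a trimmed copy of the question's data.

```python
import re
import pandas as pd

# Trimmed copy of the question's sample (only the id and Desc columns).
df = pd.DataFrame({
    "id": [13592349, 13592490],
    "Desc": [
        "**Title **: New_ind\n\n**Body **: Detection_error\n\n"
        "*respo_URL **: www.github.com\n\n**respo_status **: {yellow}",
        "**Title **: New_ind2\n\n**Body **: import_error\n\n"
        "*respo_URL **: \n\n**respo_status **: {green}",
    ],
})

# One "key: value" line, e.g. "**Title **: New_ind" or "*respo_URL **: ...".
# Group 1 is the field name, group 2 is everything after the first colon.
LINE = re.compile(r"^\*+(\w+) \*+:[ \t]*(.*)$", flags=re.MULTILINE)

def parse_desc(text):
    """Return a {field: value} dict for one Desc string (hypothetical helper)."""
    return {key: value.strip() for key, value in LINE.findall(text)}

# Build one column per field name found; a field missing from a row becomes NaN.
desc_df = pd.DataFrame(df["Desc"].map(parse_desc).tolist(), index=df.index)
out = pd.concat([df.drop(columns="Desc"), desc_df], axis=1)
print(out)
```

Because the column names come from the text itself, no fixed list of names is needed, at the cost of trusting whatever field names appear in the data.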