0

I'm completely new to regex and i'm facing this challenge that is taking me hours to solve. I have the following dataframe with a string column "Dimensions":

df
                 Dimensions
0                           "Width:2 cm"
1      "Diameter: 1.2 cm, Height: 10 cm"
2        "Diameter: 3.4cm, Volume: 10cm"

I'm looking for a way to "split" the dimensions column into multiple columns :

desired_df

             Width      Diameter      Height    Volume
0              2          null         null       null
1             null         1.2          10        null
2             null         3.4         null        10

Note that the columns order don't matter. If you have any hints that could help me I would appreciate it, thanks !

1
  • 1
    Show your own effort to solve the problem. There is regex101.com to play with regular expressions (set to Python flavor). Commented Nov 15, 2022 at 19:43

1 Answer 1

4

Try:

x = df["Dimensions"].str.extractall(r'([^\s"]+)\s*:\s*(\d+\.?\d*)').droplevel(1)
x = x.pivot(columns=0, values=1)
x.columns.name = None
print(x)

Prints:

  Diameter Height Volume Width
0      NaN    NaN    NaN     2
1      1.2     10    NaN   NaN
2      3.4    NaN     10   NaN
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.