I have a pandas dataframe in which some cells contain multiple values separated by ';'. I'm trying to split those values and create a new row for each one, as in the example below:
> In: df
> Out:

| Year | State | Ingredient | Species |
|------|-------|------------|---------|
| 1998 | CA | egg; pork | sp1;sp2 |
The result I am trying to achieve looks like this:
> In: df
> Out:

| Year | State | Ingredient | Species |
|------|-------|------------|---------|
| 1998 | CA | egg | sp1 |
| 1998 | CA | egg | sp1 |
| 1998 | CA | pork | sp2 |
| 1998 | CA | pork | sp2 |
I have found a method to split the dataframe like this, but it only works for the first column I split. This is the code I used:
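```python
import pandas as pd
# Split 'Species' on ';' and stack it so each value gets its own row,
# keeping the original row label as the index
sp = df['Species'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
i = sp.index.get_level_values(0)
df1 = df.loc[i].copy()  # repeat each original row once per split value
df1['Species'] = sp.values
```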
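For reference, here is a self-contained reproduction of that first split. It assumes the one-row input table from above, built by hand with the default index 0 (the frame construction is an assumption, not the original data source). The last line prints the index of `df1`, which shows that both new rows keep the original label.

```python
import pandas as pd

# Assumed sample frame: the one-row input table, with default index 0
df = pd.DataFrame({'Year': [1998], 'State': ['CA'],
                   'Ingredient': ['egg; pork'], 'Species': ['sp1;sp2']})

# First split, as in the question
sp = df['Species'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
i = sp.index.get_level_values(0)
df1 = df.loc[i].copy()
df1['Species'] = sp.values

print(df1)
print(df1.index.tolist())  # [0, 0]: both rows keep the original label 0
```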
When I execute this on the 'Species' column first, using the original dataframe (df), it works.
However, when I run the same code again on df1 to split the 'Ingredient' column, I get an error saying that the length of values does not match the length of index. This is the code for the second split:
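```python
# Same steps as before, now applied to 'Ingredient' in df1
fd = df1['Ingredient'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
j = fd.index.get_level_values(0)
df2 = df1.loc[j].copy()
df2['Ingredient'] = fd.values  # this line raises the ValueError
```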
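A minimal sketch that reproduces the error, again assuming the one-row input table. After the first split, `df1` carries the label 0 twice, so `j` holds four labels that are all 0, and `df1.loc[j]` returns both matching rows for each of them: 8 rows for only 4 split values.

```python
import pandas as pd

# Assumed sample frame: the one-row input table
df = pd.DataFrame({'Year': [1998], 'State': ['CA'],
                   'Ingredient': ['egg; pork'], 'Species': ['sp1;sp2']})

# First split: df1 now has two rows, both labelled 0
sp = df['Species'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
df1 = df.loc[sp.index.get_level_values(0)].copy()
df1['Species'] = sp.values

# Second split on df1
fd = df1['Ingredient'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
j = fd.index.get_level_values(0)  # four labels, all equal to 0
df2 = df1.loc[j].copy()           # each label 0 matches both rows of df1
print(len(fd), len(df2))          # 4 values, but 8 rows

error_message = None
try:
    df2['Ingredient'] = fd.values
except ValueError as err:
    error_message = str(err)
    print(error_message)
```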
After many attempts to find the cause of the error, I noticed that when I run this code on df1 to create df2, the line `df2 = df1.loc[j].copy()` doubles the number of rows, so df2 ends up with more rows than there are split values. If I replace df1 with df (the original dataframe), the error does not appear and the code works.
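One possible fix, sketched under the same one-row assumption: reset the index of `df1` before the second split so that every row has a unique label again, and `df1.loc[j]` selects exactly one row per split value. The `.str.strip()` call is an addition that is not in the original code; it removes the space after ';' in 'egg; pork'.

```python
import pandas as pd

# Assumed sample frame: the one-row input table
df = pd.DataFrame({'Year': [1998], 'State': ['CA'],
                   'Ingredient': ['egg; pork'], 'Species': ['sp1;sp2']})

# First split, as in the question
sp = df['Species'].str.split(';', expand=True).stack().reset_index(level=1, drop=True)
df1 = df.loc[sp.index.get_level_values(0)].copy()
df1['Species'] = sp.values

# Give df1 unique labels (0, 1) so each label selects exactly one row
df1 = df1.reset_index(drop=True)

# Second split; .str.strip() trims the space left over from 'egg; pork'
fd = (df1['Ingredient'].str.split(';', expand=True).stack()
      .str.strip().reset_index(level=1, drop=True))
j = fd.index.get_level_values(0)  # [0, 0, 1, 1]
df2 = df1.loc[j].copy()
df2['Ingredient'] = fd.values
df2 = df2.reset_index(drop=True)
print(df2)
```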
Is there a solution to fix this? Or is there any other way of splitting it?
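As another way of splitting, here is a sketch that uses `DataFrame.explode` instead of `stack`, assuming pandas 1.1 or later (for the `ignore_index` argument) and the same one-row sample frame. Each cell is turned into a list with `str.split`, and the columns are exploded one at a time, which yields every Ingredient/Species combination, the same result the two-step approach above aims for.

```python
import pandas as pd

# Assumed sample frame: the one-row input table
df = pd.DataFrame({'Year': [1998], 'State': ['CA'],
                   'Ingredient': ['egg; pork'], 'Species': ['sp1;sp2']})

out = df.copy()
for col in ['Species', 'Ingredient']:
    out[col] = out[col].str.split(';')         # 'sp1;sp2' -> ['sp1', 'sp2']
    out = out.explode(col, ignore_index=True)  # one row per list element, fresh 0..n-1 index
    out[col] = out[col].str.strip()            # trim the space in 'egg; pork'

print(out)
```

Because `ignore_index=True` rebuilds the index after each explode, duplicate labels never build up between steps. If the goal is instead to pair each ingredient with the species at the same position, pandas 1.3 and later can explode several list columns at once with `explode(['Ingredient', 'Species'])`, provided the lists in each row have the same length.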
Thank you.
P.S. This is my first time posting on Stack Overflow, and I'm also new to Python. Sorry if the formatting is bad.