
My question is more about the methodology/syntax described in a previous post, which covers different approaches to the same goal: splitting string values into lists and assigning each list item to a new column. Here's the post: Pandas DataFrame, how do i split a column into two

df:

                          GDP
Date                        
Mar 31, 2017  19.03 trillion
Dec 31, 2016  18.87 trillion
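To reproduce the examples, here is a minimal sketch that rebuilds df. It assumes the Date index holds plain strings; the question does not say whether it is a DatetimeIndex.

```python
import pandas as pd

# Rebuild the question's frame: one string column "GDP",
# indexed by a "Date" index holding the quarter-end dates as strings.
df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)
print(df)
```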

script 1 + output:

>>> df['GDP'], df['Units'] = df['GDP'].str.split(' ', n=1).str
>>> print(df)

                GDP     Units
Date                         
Mar 31, 2017  19.03  trillion
Dec 31, 2016  18.87  trillion

script 2 + output:

>>> df[['GDP', 'Units']] = df['GDP'].str.split(' ', n=1, expand=True)
>>> print(df)

                GDP     Units
Date                         
Mar 31, 2017  19.03  trillion
Dec 31, 2016  18.87  trillion

script 3 + output:

>>> df['GDP'], df['Units'] = df['GDP'].str.split(' ', n=1, expand=True)
>>> print(df)

              GDP  Units
Date                    
Mar 31, 2017    0      1
Dec 31, 2016    0      1

Can anyone explain what is going on? Why does script 3 produce these values in the output?

Answer:

Let's start by looking at this:

df['GDP'].str.split(' ', n=1)

Date
Mar 31, 2017    [19.03, trillion]
Dec 31, 2016    [18.87, trillion]
Name: GDP, dtype: object

It produces a Series of lists. However, the pd.Series.str string accessor lets us access the first, second, ... elements of these embedded lists using ordinary Python list indexing.

df['GDP'].str.split(' ', n=1).str[0]

Date
Mar 31, 2017    19.03
Dec 31, 2016    18.87
Name: GDP, dtype: object

Or

df['GDP'].str.split(' ', n=1).str[1]

Date
Mar 31, 2017    trillion
Dec 31, 2016    trillion
Name: GDP, dtype: object

So, since split(' ', n=1) gives us two-element lists, we can treat the object returned by a further .str as an iterable and unpack it:

a, b = df['GDP'].str.split(' ', n=1).str

a

Date
Mar 31, 2017    19.03
Dec 31, 2016    18.87
Name: GDP, dtype: object

And

b

Date
Mar 31, 2017    trillion
Dec 31, 2016    trillion
Name: GDP, dtype: object

OK, so we can shortcut the creation of two new columns by leveraging this iterable unpacking:

df['GDP'], df['Units'] = df['GDP'].str.split(' ', n=1).str
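This one-liner relies on the .str accessor being iterable. That behaviour was later deprecated and has been removed in recent pandas releases, which also require n to be passed to str.split by keyword. Here is a minimal sketch of an equivalent that indexes the accessor instead of iterating over it, assuming a current pandas version:

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)

# Split once on the first space; n is passed by keyword.
parts = df["GDP"].str.split(" ", n=1)

# Index each list element explicitly instead of unpacking the accessor.
# Both right-hand Series are built before either column is assigned.
df["GDP"], df["Units"] = parts.str[0], parts.str[1]
print(df)
```

The result has the same two columns as script 1, without depending on iteration over the accessor.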

Alternatively, we can pass expand=True to expand the lists into the columns of a new DataFrame:

df['GDP'].str.split(' ', n=1, expand=True)

                  0         1
Date                         
Mar 31, 2017  19.03  trillion
Dec 31, 2016  18.87  trillion

Now we can assign that DataFrame to new columns of another DataFrame, like so:

df[['GDP', 'Units']] = df['GDP'].str.split(' ', n=1, expand=True)
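A minimal sketch of script 2 end to end. As far as I know, when the right-hand side is a DataFrame, current pandas pairs the listed column names with its columns by position, so the labels 0 and 1 do not need to match 'GDP' and 'Units'; rows still align on the index.

```python
import pandas as pd

df = pd.DataFrame(
    {"GDP": ["19.03 trillion", "18.87 trillion"]},
    index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
)

# expand=True returns a DataFrame whose columns are labelled 0 and 1.
parts = df["GDP"].str.split(" ", n=1, expand=True)

# Column 0 goes to "GDP" and column 1 to "Units" (positional pairing).
df[["GDP", "Units"]] = parts
print(df)
```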

However, when we do

df['GDP'], df['Units'] = df['GDP'].str.split(' ', n=1, expand=True)

the DataFrame returned by df['GDP'].str.split(' ', n=1, expand=True) gets unpacked, and iterating over a DataFrame yields its column labels, not its column values. As the output just above shows, those labels are 0 and 1. So in this case the scalar 0 is assigned to df['GDP'] and the scalar 1 to df['Units'], and each scalar is broadcast to every row.
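A minimal sketch that reproduces script 3 and shows one way to unpack the column data instead of the labels. make_df is a hypothetical helper that rebuilds the question's frame.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: rebuilds the example frame from the question.
    return pd.DataFrame(
        {"GDP": ["19.03 trillion", "18.87 trillion"]},
        index=pd.Index(["Mar 31, 2017", "Dec 31, 2016"], name="Date"),
    )


df = make_df()
parts = df["GDP"].str.split(" ", n=1, expand=True)

# Iterating over a DataFrame yields its column labels, not the column data.
labels = list(parts)
print(labels)  # [0, 1]

# Script 3 therefore unpacks to the scalars 0 and 1; each scalar is then
# broadcast to every row of the column it is assigned to.
df["GDP"], df["Units"] = parts
print(df)

# To unpack the column data instead, select the columns explicitly.
fixed = make_df()
fixed["GDP"], fixed["Units"] = parts[0], parts[1]
print(fixed)
```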
