Here is my starting df:
import numpy as np
import pandas as pd
df = pd.DataFrame(['alpha', 'beta'], columns = ['text'])
df
    text
0  alpha
1   beta
Here is the end result I want:
    text        first        second        third
0  alpha  alpha-first  alpha-second  alpha-third
1   beta   beta-first   beta-second   beta-third
I have written the custom function parse(), which works fine on its own:
def parse(text):
    # Return the three derived strings for one input value
    return [text + '-first', text + '-second', text + '-third']
Now I try to apply parse() to the initial df, which is where errors arise:
1) If I try the following:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']) # Create empty columns
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: Must have equal len keys and value when setting with an ndarray
2) Slightly different version:
df = df.reindex(columns = list(df.columns) + ['first', 'second', 'third']).astype(object) # Create empty columns of "object" type
df[['first', 'second', 'third']] = df.text.apply(parse)
I get:
ValueError: shape mismatch: value array of shape (2,) could not be broadcast to indexing result of shape (3,2)
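For reference, inspecting the return value of df.text.apply(parse) on its own suggests it is a single Series whose elements are lists, not three columns, which seems to match the (2,) shape in the second error. A minimal check, assuming the simplified parse() joins with a hyphen as in the desired output (the name result is only for illustration):

```python
import pandas as pd

def parse(text):
    # Simplified stand-in for the real parse(); returns a list of length 3
    return [text + '-first', text + '-second', text + '-third']

df = pd.DataFrame(['alpha', 'beta'], columns=['text'])

# Series.apply with a list-returning function keeps one list per row
result = df['text'].apply(parse)

print(type(result).__name__)  # Series
print(result.shape)           # (2,)
print(result.iloc[0])         # ['alpha-first', 'alpha-second', 'alpha-third']
```

So the right-hand side has two elements (one list per row), while the left-hand side selects three columns.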
Where am I going wrong?
EDIT:
I should clarify that parse() is a much more complicated function in the real-world problem I'm trying to solve: it takes a paragraph, finds 3 specific types of strings in it, and returns those strings as a list of length 3. In the code above, I made up a simple definition of parse() as a substitute, to avoid getting bogged down in details unrelated to the two errors I'm getting.
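One approach that appears to produce the layout I'm after is to build a separate DataFrame from the lists and join it back. This is only a sketch, and it assumes parse() always returns a list of exactly three items. The names new_cols, expanded and out are illustrative, and parse() here is the simplified stand-in with hyphens as in the desired output:

```python
import pandas as pd

def parse(text):
    # Simplified stand-in for the real parse(); assumed to return exactly 3 items
    return [text + '-first', text + '-second', text + '-third']

df = pd.DataFrame(['alpha', 'beta'], columns=['text'])
new_cols = ['first', 'second', 'third']

# Turn the Series of lists into a 2-D table: one row per input, one column per item
expanded = pd.DataFrame(
    df['text'].apply(parse).tolist(),
    index=df.index,      # keep the original row labels so the join aligns
    columns=new_cols,
)

# Attach the new columns next to the original ones
out = df.join(expanded)
print(out)
```

With this route there is no need to create the empty columns with reindex first, since join adds them.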