
This question may well be a duplicate, but I haven't found an answer yet. I'm trying to apply a function to each row of a pandas DataFrame and get a DataFrame back. The following example is reproducible:

from datetime import datetime
import pandas as pd

df = pd.DataFrame({'ID': ["1","2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})

My function:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    return(pd.DataFrame({'r1': [result1],'r2': [result2]}))

Why can't I run the following code:

df.apply(act_value_calc, 1)

and what needs to change to make it run? I would like to get back a DataFrame or a list containing result1 and result2.

3 Answers


You can declare a global variable inside the function and build a DataFrame from it. Give it a name other than df, otherwise the assignment overwrites the input DataFrame. Note that the global only holds the result of the most recent call:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    global result_df  # declare the global variable (not df, which is the input)
    result_df = pd.DataFrame({'r1': [result1],'r2': [result2]})

You can make it easier for yourself by returning a pandas.Series instead of a pandas.DataFrame. apply then turns each returned Series into one row of the resulting DataFrame:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    return(pd.Series({'r1': result1,'r2': result2}))

print(df.apply(act_value_calc, 1))
    r1      r2
0   40.11   39.89
1   85.23   84.77
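
If the results should sit next to the original columns, one possible follow-up is to join the frame returned by apply back onto df. The sketch below repeats the setup so it runs on its own; the name `combined` is only an illustration and not part of the original answer.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})

def act_value_calc(x):
    # Days from Start to 31 December of the same year.
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    # Days covered by the whole period.
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    # Returning a Series makes apply build one output column per key.
    return pd.Series({'r1': result1, 'r2': result2})

# `combined` (hypothetical name) has the input columns plus r1 and r2,
# aligned on the shared row index.
combined = df.join(df.apply(act_value_calc, axis=1))
print(combined)
```

Because join aligns on the index, this relies on apply preserving the row index of df, which it does for axis=1.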

1 Comment

Ahh yeah that's solid too.

apply calls the function once per row or once per column, depending on the axis argument, and combines the return values (you evidently know this already, since you pass an axis argument of 1).

Returning a DataFrame from apply is problematic, because apply is designed to combine a scalar or a Series from each call, not a nested DataFrame. What you probably want to do is create a new column with the values returned by the function you are applying.

Something like:

def act_value_calc1(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    return result1

def act_value_calc2(x):
    # Relies on the result1 column created by the first apply below.
    result2 = round( (x.Value + x.CreditNote) - x.result1, 2)
    return result2

df['result1'] = df.apply(act_value_calc1, axis=1)
df['result2'] = df.apply(act_value_calc2, axis=1)
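
The two passes above could also be collapsed into one. The following is a minimal sketch, assuming a pandas version that supports the result_type argument of DataFrame.apply (0.23 or later): the function returns both values as a tuple, and result_type='expand' spreads them over two columns. The combined function name `act_value_calc` is reused here for illustration.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    # Return both values at once so the deltas are computed only once per row.
    return result1, result2

# result_type='expand' turns each returned tuple into columns 0 and 1,
# which are then assigned, in order, to the two new column names.
df[['result1', 'result2']] = df.apply(act_value_calc, axis=1, result_type='expand')
print(df)
```

This keeps the idea of adding new columns to df while avoiding the dependency of the second function on a column written by the first.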
