
I have two dataframes, both indexed by a date column called month. The first, df1, has eight rows. The column I care about is df1['num_percent'] and it looks like this:

2015-02-01    0.071549
2015-03-01    0.070368
2015-04-01    0.069291
2015-05-01    0.068394
2015-06-01    0.067452
2015-07-01    0.066302
2015-08-01    0.065543
2015-09-01    0.064591
Name: num_percent, dtype: float64

The second dataframe has 100,000 rows. The column I care about is df2['total_quantity'] and a sample of it looks like this:

2014-11-01    324199
2014-12-01    378443
2015-01-01    367379
2015-02-01    336863
2015-03-01    380268
2015-04-01    386292
2015-05-01    373213
2015-06-01    403343
2015-07-01    414310
2015-08-01    403684
2015-09-01    420922
Name: total_quantity, dtype: int64

I want to add a new column to df2 which is the value of df2['total_quantity'] multiplied by the corresponding value for the month in df1.

How can I do this?

If I try:

df2['percent'] = df2['total_quantity'] * df1['num_percent']

I get ValueError: cannot reindex from a duplicate axis.

UPDATE: Here's some data and code to replicate the problem:

import pandas as pd

data = {'month': ['2014-01-01', '2014-02-01', '2014-03-01'],
        'num_percent': [0.4, 0.5, 0.6]}
df1 = pd.DataFrame(data)
df1['month'] = pd.to_datetime(df1['month'])
df1 = df1.set_index('month')

data = {'month': ['2014-01-01', '2014-02-01', '2014-03-01', '2014-01-01'],
        'org': ['00K', '00K', '00K', '00L'],
        'total_quantity': [1000, 1000, 2000, 1000]}
df2 = pd.DataFrame(data)
df2['month'] = pd.to_datetime(df2['month'])
df2 = df2.set_index('month')

# Both of these produce ValueError: cannot reindex... 
df2['percent'] = df1['num_percent'] * df2['total_quantity']
df2.loc[df2.index.isin(df1.index), 'percent'] = df2['total_quantity'] * df1['num_percent']
  • Can you post code and data to reproduce your error? This should just work, as it will produce NaN where the indices don't align. Commented Jan 18, 2016 at 11:25
  • @EdChum sorry about that, have added some. Commented Jan 18, 2016 at 11:44
  • So what is the desired output here? You have duplicate values in df2.index, hence the error. Where the index values are duplicated, are the row values duplicated as well? Commented Jan 18, 2016 at 11:46
  • I've posted a method: basically, you can join the dfs and then multiply the columns. Commented Jan 18, 2016 at 11:48
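
A quick way to confirm the duplicate index values mentioned in the comments is to inspect df2.index directly. This is only a sketch that rebuilds df2 from the sample data in the question; the variable names index_is_unique and repeated_months are illustrative and not part of the original post.

```python
import pandas as pd

# Rebuild df2 from the sample data in the question.
df2 = pd.DataFrame(
    {"org": ["00K", "00K", "00K", "00L"],
     "total_quantity": [1000, 1000, 2000, 1000]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01", "2014-01-01"]),
)
df2.index.name = "month"

# False when at least one month appears more than once in the index.
index_is_unique = df2.index.is_unique

# The months that occur more than once (each listed a single time).
repeated_months = df2.index[df2.index.duplicated()].unique()

print(index_is_unique)
print(list(repeated_months))
```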

1 Answer


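If you join the two dataframes on their shared month index first, every df2 row picks up the matching num_percent from df1, and you can then multiply the two columns: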

In [24]:
df3 = df1.join(df2)
df3['percent'] = df3['num_percent'] * df3['total_quantity']
df3

Out[24]:
            num_percent  org  total_quantity  percent
month                                                
2014-01-01          0.4  00K            1000      400
2014-01-01          0.4  00L            1000      400
2014-02-01          0.5  00K            1000      500
2014-03-01          0.6  00K            2000     1200
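
If the goal is to keep df2 as it is and only add the new column, one possible alternative (not the method from the answer above) is to look up each row's month in df1 with reindex and multiply the raw values. This is a sketch that assumes df1's index is unique, as it is in the sample data; months missing from df1 would produce NaN. The name rate is illustrative.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame(
    {"num_percent": [0.4, 0.5, 0.6]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01"]),
)
df2 = pd.DataFrame(
    {"org": ["00K", "00K", "00K", "00L"],
     "total_quantity": [1000, 1000, 2000, 1000]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01", "2014-03-01", "2014-01-01"]),
)
df1.index.name = "month"
df2.index.name = "month"

# One num_percent value per df2 row, in df2's row order.
# Assumption: df1's index is unique, so repeated months in df2 are allowed.
rate = df1["num_percent"].reindex(df2.index)

# Multiply positionally on the underlying arrays, so no index alignment is needed.
df2["percent"] = df2["total_quantity"].to_numpy() * rate.to_numpy()

print(df2)
```

Unlike the join, this leaves df2's row order and columns untouched and only appends percent.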