2

I am trying to merge two MultiIndex'ed dataframes. My code is below. The issue, as you can see in the output, is that the "DATE" index is repeated, whereas I'd like all the values (OPEN_INT, PX_LAST) to be on the same date index... any ideas? I've tried both append, and concat but both give me similar results.

            if df.empty:
                df = bbg_historicaldata(t, f, startDate, endDate)
                datesArray = list(df.index)
                tArray = [t for i in range(len(datesArray))]
                arrays = [tArray, datesArray]
                tuples = list(zip(*arrays))
                index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])                    
                df = pd.DataFrame({f : df[f].values}, index=index)

            else:
                temp = bbg_historicaldata(t,f,startDate,endDate)
                datesArray = list(temp.index)
                tArray = [t for i in range(len(datesArray))]
                arrays = [tArray, datesArray]
                tuples = list(zip(*arrays))
                index = pd.MultiIndex.from_tuples(tuples, names=['TICKER', 'DATE'])


                temp = pd.DataFrame({f : temp[f].values}, index=index)

                #df = df.append(temp, ignore_index = True)
                df = pd.concat([df, temp]).sortlevel()

And the results:

                        OPEN_INT  PX_LAST
TICKER      DATE                          
EDH8 COMDTY 2017-02-01        NaN   98.365
            2017-02-01  1008044.0      NaN
            2017-02-02        NaN   98.370
            2017-02-02  1009994.0      NaN
            2017-02-03        NaN   98.360
            2017-02-03  1019181.0      NaN
            2017-02-06        NaN   98.405
            2017-02-06  1023863.0      NaN
            2017-02-07        NaN   98.410
            2017-02-07  1024609.0      NaN
            2017-02-08        NaN   98.435
            2017-02-08  1046258.0      NaN
            2017-02-09        NaN   98.395

Essentially want to get it so there are no NaNs!

EDIT: Adding "axis = 1" to the concat resulted in the following (my fault for not including the additional output in the firstplace)

                        PX_LAST   OPEN_INT  PX_LAST  OPEN_INT  PX_LAST  \
TICKER      DATE                                                         
EDH8 COMDTY 2017-02-01   98.365  1008044.0      NaN       NaN      NaN   
            2017-02-02   98.370  1009994.0      NaN       NaN      NaN   
            2017-02-03   98.360  1019181.0      NaN       NaN      NaN   
            2017-02-06   98.405  1023863.0      NaN       NaN      NaN   
            2017-02-07   98.410  1024609.0      NaN       NaN      NaN   
            2017-02-08   98.435  1046258.0      NaN       NaN      NaN   
            2017-02-09   98.395  1050291.0      NaN       NaN      NaN   
EDM8 COMDTY 2017-02-01      NaN        NaN   98.245  726739.0      NaN   
            2017-02-02      NaN        NaN   98.250  715081.0      NaN   
            2017-02-03      NaN        NaN   98.235  723936.0      NaN   
            2017-02-06      NaN        NaN   98.285  729324.0      NaN   
            2017-02-07      NaN        NaN   98.295  728673.0      NaN   
            2017-02-08      NaN        NaN   98.325  728520.0      NaN   
            2017-02-09      NaN        NaN   98.280  741840.0      NaN   
EDU8 COMDTY 2017-02-01      NaN        NaN      NaN       NaN   98.130   
            2017-02-02      NaN        NaN      NaN       NaN   98.135   
            2017-02-03      NaN        NaN      NaN       NaN   98.120   
            2017-02-06      NaN        NaN      NaN       NaN   98.180   
            2017-02-07      NaN        NaN      NaN       NaN   98.190   
            2017-02-08      NaN        NaN      NaN       NaN   98.225   
            2017-02-09      NaN        NaN      NaN       NaN   98.175  

Thanks!

2 Answers 2

1

It is not clear what the format of input is.

I assumed OPEN_INT looks like this:

import datetime
import pandas as pd


open_int = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 1008044.0),
        (datetime.date(2017, 2, 2), 1009994.0),
        (datetime.date(2017, 2, 3), 1019181.0),
        (datetime.date(2017, 2, 6), 1023863.0),
        (datetime.date(2017, 2, 7), 1024609.0),
        (datetime.date(2017, 2, 8), 1046258.0),
    ],
    columns=['DATE', 'OPEN_INT']
)
open_int['TICKER'] = 'EDH8 COMDTY'
open_int.set_index(['TICKER', 'DATE'], inplace=True)

print(open_int)
#                          OPEN_INT
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0
#             2017-02-02  1009994.0
#             2017-02-03  1019181.0
#             2017-02-06  1023863.0
#             2017-02-07  1024609.0
#             2017-02-08  1046258.0

And PX_LAST looks like this:

px_last = pd.DataFrame(
    [
        (datetime.date(2017, 2, 1), 98.365),
        (datetime.date(2017, 2, 2), 98.370),
        (datetime.date(2017, 2, 3), 98.360),
        (datetime.date(2017, 2, 6), 98.405),
        (datetime.date(2017, 2, 7), 98.410),
        (datetime.date(2017, 2, 8), 98.435),
        (datetime.date(2017, 2, 9), 98.395),

    ],
    columns=['DATE', 'PX_LAST']
)
px_last['TICKER'] = 'EDH8 COMDTY'
px_last.set_index(['TICKER', 'DATE'], inplace=True)

print(px_last)
#                         PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01   98.365
#             2017-02-02   98.370
#             2017-02-03   98.360
#             2017-02-06   98.405
#             2017-02-07   98.410
#             2017-02-08   98.435
#             2017-02-09   98.395

Then you concat them and get what you want:

df = pd.concat([open_int, px_last], axis=1)
print(df)
#                          OPEN_INT  PX_LAST
# TICKER      DATE
# EDH8 COMDTY 2017-02-01  1008044.0   98.365
#             2017-02-02  1009994.0   98.370
#             2017-02-03  1019181.0   98.360
#             2017-02-06  1023863.0   98.405
#             2017-02-07  1024609.0   98.410
#             2017-02-08  1046258.0   98.435
#             2017-02-09        NaN   98.395
Sign up to request clarification or add additional context in comments.

1 Comment

hi - thanks for your response. This caused another issue unfort. Edit above
1

You need to concatenate along the other axis:

pd.concat([df, temp], axis=1)

By default, Pandas concatenates rows and aligns columns, which lead to the result you saw.

1 Comment

hi - thanks for your response. This caused another issue unfort. Edit above

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.