I have a dict containing several pandas DataFrames (identified by keys). Any suggestions on how to serialize it efficiently (and load it back cleanly)? Here is the structure (pprint output). Each dict['method_x_']['meas_x_'] is a pandas DataFrame. The goal is to save the DataFrames for later plotting with some specific plotting options.

{'method1':
  {'meas1':
          config1   config2
   0      0.193647  0.204673
   1      0.251833  0.284560
   2      0.227573  0.220327,
   'meas2':
          config1   config2
   0      0.172787  0.147287
   1      0.061560  0.094000
   2      0.045133  0.034760},
 'method2':
  {'meas1':
          config1   config2
   0      0.193647  0.204673
   1      0.251833  0.284560
   2      0.227573  0.220327,
   'meas2':
          config1   config2
   0      0.172787  0.147287
   1      0.061560  0.094000
   2      0.045133  0.034760}}

2 Answers

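Use pickle.dump() and pickle.load() (or pickle.dumps() and pickle.loads() for in-memory bytes). They work on a dict of DataFrames. Pandas DataFrames also have their own method for serializing a single DataFrame: originally df.save("filename"), which later pandas versions replaced with df.to_pickle("filename") and pd.read_pickle("filename")...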
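
As a rough illustration of the pickle approach, here is a minimal sketch that round-trips a nested dict shaped like the one in the question. The variable names, the sample values, and the file name all_df.pkl are assumptions made for the example, and the file is written to the system temp directory only to keep the sketch self-contained.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical data mirroring the question's structure:
# {method: {measurement: DataFrame}}
all_df = {
    "method1": {
        "meas1": pd.DataFrame({"config1": [0.193647, 0.251833, 0.227573],
                               "config2": [0.204673, 0.284560, 0.220327]}),
        "meas2": pd.DataFrame({"config1": [0.172787, 0.061560, 0.045133],
                               "config2": [0.147287, 0.094000, 0.034760]}),
    },
    "method2": {
        "meas1": pd.DataFrame({"config1": [0.193647, 0.251833, 0.227573],
                               "config2": [0.204673, 0.284560, 0.220327]}),
    },
}

# Assumed file location; any writable path works.
path = os.path.join(tempfile.gettempdir(), "all_df.pkl")

# Context managers make sure the file is flushed and closed
# before anything tries to read it back.
with open(path, "wb") as fh:
    pickle.dump(all_df, fh, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as fh:
    restored = pickle.load(fh)
```

The whole nested dict goes into one file, so the keys and the nesting come back unchanged. Pickle files should only be loaded from sources you trust, and they may not load across very different pandas versions.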

In my particular use case, I tried a simple pickle.dump(all_df, open("all_df.p","wb")).

It loaded properly with all_df = pickle.load(open("all_df.p","rb")), but after I restarted my Jupyter environment I got an UnpicklingError: invalid load key, '\xef'.

One of the methods described here states that we can use HDF5 (PyTables) to do the job. From their docs:

HDFStore is a dict-like object which reads and writes pandas

But it seems to be picky about the version of tables that you use. I got mine to work after a pip install --upgrade tables and a runtime restart.

If you need an overall idea of how to use it:

import pandas as pd

# all_df is a dict mapping string keys to DataFrames
with pd.HDFStore('df_store.h5') as df_store:
    for key, df in all_df.items():
        df_store[key] = df

You should now have a df_store.h5 file that you can convert back into a dict using the reverse process:

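new_all_df = dict()
with pd.HDFStore('df_store.h5') as df_store:
    for key in df_store.keys():
        # HDFStore.keys() returns names with a leading '/', e.g. '/meas1';
        # strip it so the original dict keys are restored
        new_all_df[key.lstrip('/')] = df_store[key]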
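
The loops above assume a flat dict, whereas the dict in the question is nested two levels deep. One possible way to adapt the idea, sketched here under that assumption, is to join the two key levels into a hierarchical HDF5 path such as method1/meas1 and split it again on load. The helper names flatten, unflatten, save_nested and load_nested are hypothetical, and the two HDFStore helpers need PyTables installed.

```python
import pandas as pd


def flatten(nested):
    """Turn {'method1': {'meas1': df}} into {'method1/meas1': df}."""
    return {
        f"{method}/{meas}": df
        for method, inner in nested.items()
        for meas, df in inner.items()
    }


def unflatten(flat):
    """Rebuild the two-level dict from 'method/meas' style keys."""
    nested = {}
    for key, df in flat.items():
        # Keys read back from an HDFStore start with '/', so strip it first.
        method, meas = key.strip("/").split("/")
        nested.setdefault(method, {})[meas] = df
    return nested


def save_nested(nested, path):
    """Write every DataFrame under its 'method/meas' path (needs PyTables)."""
    with pd.HDFStore(path, mode="w") as store:
        for key, df in flatten(nested).items():
            store[key] = df


def load_nested(path):
    """Read all DataFrames back and restore the nesting (needs PyTables)."""
    with pd.HDFStore(path, mode="r") as store:
        return unflatten({key: store[key] for key in store.keys()})
```

Calling save_nested(all_df, 'df_store.h5') followed by load_nested('df_store.h5') should give back a dict with the same two key levels. The sketch assumes that neither the method names nor the measurement names contain a slash.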

1 Comment

Thanks for that. The h5py.org documentation is a nightmare but perhaps that's simply because hdf5 is a nightmare.
