I have a Pandas DataFrame with the following structure, which contains both numbers and numpy arrays of fixed shape:

import pandas as pd
import numpy as np

df = pd.DataFrame({"num": (23, 42), "list": (np.arange(3), np.arange(1, 4))})

Assuming I have a large amount of this data (more than 1 GB) that I would like to store and retrieve quickly, how should I go about storing it? If I use HDF5, the NumPy arrays get pickled, which hurts retrieval speed. Is there some way to tell HDF5 how to store NumPy arrays? Alternatively, should I not be using HDF5 at all?

A GitHub thread seems to suggest two options:

  1. Create a function that gets the desired NumPy array, which is stored in some other format [1]
  2. Create a class to inform HDF5 [2]

Both of these solutions seem oddly specific for how common I imagine this problem to be. Are there more general approaches? Am I just using the wrong tool?

  • Do all the arrays have the same shape? Commented Oct 1, 2016 at 2:34
  • Yes. I will add that information to my question. Commented Oct 1, 2016 at 2:35
  • Then you can convert the arrays to columns. Commented Oct 1, 2016 at 2:40
  • But that results in the redundant storage of the num field. Is it silly to try to avoid that? Commented Oct 1, 2016 at 2:45
  • It isn't silly to avoid that if it is a problem. The way this is arranged, you won't have extra rows of num, because the arrays expand within the same row. @HYRY is right: expand the arrays, then store, and collapse them back after retrieval. Commented Oct 1, 2016 at 7:38

1 Answer

I mean something like this:

df_x = pd.concat([df.num, pd.DataFrame(np.vstack(df.list))], 
                 keys=["key", "arr"], axis=1)

The resulting DataFrame:

  key arr      
  num   0  1  2
0  23   0  1  2
1  42   1  2  3

Convert back with:

pd.concat([df_x.key, pd.Series(tuple(df_x.arr.values), name='list')], axis=1)

   num       list
0   23  [0, 1, 2]
1   42  [1, 2, 3]
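
If you want the expand/collapse round trip as a reusable pair of helpers, a minimal sketch might look like the following. The function names `expand_arrays` and `collapse_arrays` and the flat column naming (`list_0`, `list_1`, ...) are my own choices for illustration, not pandas API. The sketch assumes every array in the column is 1-D with the same length, and that no other column name starts with the same prefix.

```python
import numpy as np
import pandas as pd


def expand_arrays(df, col):
    """Replace a column of equal-length 1-D arrays with one scalar column per element."""
    stacked = np.vstack(list(df[col]))  # shape: (n_rows, array_length)
    names = [f"{col}_{i}" for i in range(stacked.shape[1])]
    wide = pd.DataFrame(stacked, columns=names, index=df.index)
    return pd.concat([df.drop(columns=col), wide], axis=1)


def collapse_arrays(flat, col):
    """Inverse of expand_arrays: gather the '<col>_<i>' columns back into one array column."""
    # Assumption: only the expanded columns start with '<col>_'.
    names = [c for c in flat.columns if str(c).startswith(f"{col}_")]
    arrays = pd.Series(list(flat[names].to_numpy()), index=flat.index, name=col)
    return pd.concat([flat.drop(columns=names), arrays], axis=1)


df = pd.DataFrame({"num": (23, 42), "list": (np.arange(3), np.arange(1, 4))})

flat = expand_arrays(df, "list")      # columns: num, list_0, list_1, list_2
back = collapse_arrays(flat, "list")  # columns: num, list
```

Since `flat` holds only scalar columns, it can be written with `flat.to_hdf(...)` (which needs PyTables installed) without any object column to pickle, and collapsed again after reading it back.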