Input and output numpy arrays to h5py

Question

I have Python code whose output is a large matrix (its dimensions were given as an image in the original post), whose entries are all of type float. If I save it with the extension .dat, the file size is on the order of 500 MB. I read that using h5py reduces the file size considerably. So, let's say I have a 2D numpy array named A. How do I save it to an h5py file? Also, how do I read the same file back into a numpy array in a different script, as I need to manipulate the array there?

@jorgeca: for that I just do np.savetxt("output.dat",A,'%10.8e')
– lovespeed, Commented Jan 5, 2014 at 1:22
Thanks (the extension alone doesn't mean much, it could be stored as binary, ascii...). Unless you need the extra features of hdf5, I'd just use np.save('output.dat', A) which will save it in a binary format (much faster, much less space used).
– jorgeca, Commented Jan 5, 2014 at 1:52
@jorgeca but will another python script be able to read it as a 2D array when I call it as A = np.loadtxt('output.dat',unpack=True)
– lovespeed, Commented Jan 5, 2014 at 1:57
so h5py doesn't create files smaller than those np.save would? is h5py faster than np.save for arrays of the size given in the question?
– abcd, Commented Apr 13, 2015 at 23:48
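On the question in the comments about reading the file back: a file written with np.save is binary, so it is read with np.load rather than np.loadtxt. A minimal sketch, where the array size and the scratch directory are made up for illustration:

```python
import os
import tempfile

import numpy as np

A = np.random.random(size=(1000, 50))   # stand-in for the real matrix

out_dir = tempfile.mkdtemp()            # scratch directory for this sketch
path = os.path.join(out_dir, "output.npy")

np.save(path, A)     # binary .npy file; dtype and shape are stored with the data
B = np.load(path)    # np.load (not np.loadtxt) reads it back as a 2D array

print(B.shape, B.dtype)
print(np.array_equal(A, B))
```

Note that np.save appends .npy when the filename does not already end with it, so np.save('output.dat', A) would write output.dat.npy.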

gkcn · Accepted Answer · 2016-01-26 17:00:13Z

156

h5py provides a model of datasets and groups. Datasets are basically arrays, and groups can be thought of as directories. Each is named. You should look at the documentation for the API and examples:

http://docs.h5py.org/en/latest/quick.html

A simple example where you are creating all of the data upfront and just want to save it to an hdf5 file would look something like:

In [1]: import numpy as np
In [2]: import h5py
In [3]: a = np.random.random(size=(100,20))
In [4]: h5f = h5py.File('data.h5', 'w')
In [5]: h5f.create_dataset('dataset_1', data=a)
Out[5]: <HDF5 dataset "dataset_1": shape (100, 20), type "<f8">

In [6]: h5f.close()

You can then load that data back in using:

In [10]: h5f = h5py.File('data.h5','r')
In [11]: b = h5f['dataset_1'][:]
In [12]: h5f.close()

In [13]: np.allclose(a,b)
Out[13]: True

Definitely check out the docs:

http://docs.h5py.org
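Since the question is about file size: create_dataset also accepts a compression argument. The sketch below is only an illustration, and the file names, the scratch directory and the stand-in data are assumptions. The data is deliberately repetitive so that gzip has something to work with; random floats usually compress very little.

```python
import os
import tempfile

import h5py
import numpy as np

# Highly compressible stand-in data: mostly zeros
a = np.zeros((1000, 200))
a[::10] = 1.0

out_dir = tempfile.mkdtemp()   # scratch directory for this sketch
plain_path = os.path.join(out_dir, "plain.h5")
gzip_path = os.path.join(out_dir, "gzip.h5")

# Same dataset written without and with gzip compression
with h5py.File(plain_path, "w") as h5f:
    h5f.create_dataset("dataset_1", data=a)

with h5py.File(gzip_path, "w") as h5f:
    h5f.create_dataset("dataset_1", data=a,
                       compression="gzip", compression_opts=4)

# Reading is unchanged; decompression happens transparently
with h5py.File(gzip_path, "r") as h5f:
    b = h5f["dataset_1"][:]

print(os.path.getsize(plain_path), os.path.getsize(gzip_path))
```

How much space this saves depends entirely on the data, so it is worth measuring on the real array before committing to it.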

Writing to an hdf5 file requires either h5py or pytables (each provides a different Python API on top of the hdf5 file specification). You should also take a look at the simple binary formats numpy provides natively, such as np.save, np.savez etc:

http://docs.scipy.org/doc/numpy/reference/routines.io.html
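As a rough illustration of the numpy-native route with several arrays in one file, here is a sketch using np.savez_compressed. The array names A and extra and the file name are made up for this example:

```python
import os
import tempfile

import numpy as np

A = np.random.random(size=(100, 20))
extra = np.arange(10)          # hypothetical second array

out_dir = tempfile.mkdtemp()   # scratch directory for this sketch
path = os.path.join(out_dir, "arrays.npz")

# Keyword names become the keys used to look the arrays up later
np.savez_compressed(path, A=A, extra=extra)

with np.load(path) as archive:
    names = sorted(archive.files)
    A_back = archive["A"]
    extra_back = archive["extra"]

print(names)
```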

edited Jan 26, 2016 at 17:00 by gkcn

answered Jan 5, 2014 at 20:27 by JoshAdel


3 Comments

NoDataDumpNoContribution Over a year ago

Btw. if you don't know the name of the dataset beforehand while reading you have to parse the hdf file similar to here.
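One possible way to discover the names, sketched under the assumption of a small made-up file layout: keys() lists the top level, and visititems walks nested groups. The helper collect and the names group_a and nested are hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.h5")

# Build a small file with one top-level dataset and one nested dataset
with h5py.File(path, "w") as hf:
    hf.create_dataset("dataset_1", data=np.arange(6).reshape(2, 3))
    grp = hf.create_group("group_a")
    grp.create_dataset("nested", data=np.ones(4))

found = []

def collect(name, obj):
    # Called once per object in the file; keep only datasets
    if isinstance(obj, h5py.Dataset):
        found.append(name)

with h5py.File(path, "r") as hf:
    top_level = list(hf.keys())   # names directly under the root group
    hf.visititems(collect)        # walks groups recursively

print(top_level)
print(found)
```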

Irtaza Over a year ago

@JoshAdel what if I want to add a column to the dataset? My dataset is a multidimensional np.array indexed as [img_id,rows,columns,channels], and I have saved it using the method described in your answer. I access all the points in the dataset using h5f['dataset_1'][img_id]. What I want is a way to add another column, say 'mycolumn', corresponding to every img_id in the dataset. How should I add another column so that I can do h5f['mycolumn'][img_id]?
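One way this might be done, as a sketch rather than a definitive answer: store the extra values as a second dataset in the same file, with the same first-axis length, by reopening the file in append mode ('a'). The shapes and the values in labels are invented for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "data.h5")

images = np.random.random(size=(5, 4, 4, 3))  # [img_id, rows, columns, channels]
labels = np.arange(5)                         # hypothetical per-image values

with h5py.File(path, "w") as h5f:
    h5f.create_dataset("dataset_1", data=images)

# 'a' opens the existing file for reading and writing without truncating it
with h5py.File(path, "a") as h5f:
    h5f.create_dataset("mycolumn", data=labels)

with h5py.File(path, "r") as h5f:
    img = h5f["dataset_1"][2]   # one image
    val = h5f["mycolumn"][2]    # the matching entry for the same img_id

print(img.shape, val)
```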

Martin Thoma Over a year ago

If I write matrices like this, then I cannot see them with HDFView 2.11 - I can open the file, I can see that the dataset data.h5 exists, but I cannot view it with HDFView. I can read the contents with h5py, but not inspect it with HDFView. Any idea why?

Community · Accepted Answer · 2017-05-23 12:26:20Z

139

A cleaner way to handle file open/close and avoid memory leaks:

Prep:

import numpy as np
import h5py

data_to_write = np.random.random(size=(100,20)) # or some such

Write:

with h5py.File('name-of-file.h5', 'w') as hf:
    hf.create_dataset("name-of-dataset",  data=data_to_write)

Read:

with h5py.File('name-of-file.h5', 'r') as hf:
    data = hf['name-of-dataset'][:]

edited May 23, 2017 at 12:26 by Community Bot

answered Jan 26, 2017 at 20:47 by Lavi Avigdor

5 Comments

daviddesancho Over a year ago

No need to close file?

Leonid Over a year ago

@DrDeSancho no, the with statement

Andre Holzner Over a year ago

Especially useful when running in interactive mode (otherwise one risks getting an exception from h5py about an already open file when rerunning the same code without having closed the file properly on the first attempt).

moo Over a year ago

Python's with statement works through a context manager. It will make sure the file is closed after it has been used. More information is available in the official documentation: docs.python.org/3/library/contextlib.html
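A small check of that behaviour, as a sketch: an h5py File object is truthy while open and falsy once closed, so its state can be inspected inside and after the with block. The file name and the variable names open_inside and open_after are made up.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "name-of-file.h5")

with h5py.File(path, "w") as hf:
    hf.create_dataset("name-of-dataset", data=np.zeros(3))
    open_inside = bool(hf)   # file handle is valid inside the block

open_after = bool(hf)        # handle is closed once the block exits

print(open_inside, open_after)
```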

MrCrHaM Over a year ago

To read scalar values, use hf['name-of-scalar'][()] , or you will get a ValueError: Illegal slicing argument for scalar dataspace.
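A minimal sketch of the scalar case, with a made-up value and file name:

```python
import os
import tempfile

import h5py

path = os.path.join(tempfile.mkdtemp(), "name-of-file.h5")

with h5py.File(path, "w") as hf:
    hf.create_dataset("name-of-scalar", data=3.5)

with h5py.File(path, "r") as hf:
    shape = hf["name-of-scalar"].shape   # empty tuple for a scalar dataset
    value = hf["name-of-scalar"][()]     # [()] reads it; [:] would raise

print(shape, value)
```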
