I received this .h5 file from a friend and need to use the data in it for some work. All the data is numerical. This is the first time I have worked with this kind of file. I found many questions and answers here about reading these files, but I couldn't find a way to get to the lower levels of the groups (folders) the file contains. The file contains two main folders, X and Y. X contains a folder named 0, which contains two folders named A and B. Y contains ten folders named 1-10. The data I want to read is in A, B, 1, 2, ..., 10. For instance, I start with

f = h5py.File(filename, 'r')
f.keys()

Now f.keys() returns [u'X', u'Y'], the two main folders.

Then I try to read X and Y using read_direct but I get the error

AttributeError: 'Group' object has no attribute 'read_direct'

I then try to create objects for X and Y as follows

obj1 = f['X']

obj2 = f['Y']

Then if I use commands like

obj1.shape
obj1.dtype 

I get an error

AttributeError: 'Group' object has no attribute 'shape'

I can see that these commands don't work because I am using them on X and Y, which are folders that contain no data, only other folders.

So my question is: how do I get down to the folders named A, B, and 1-10 to read the data?

I couldn't find a way to do that, even in the documentation: http://docs.h5py.org/en/latest/quick.html

  • Groups are like Python dictionaries. You have to keep indexing down through the groups until you reach a dataset. A dataset has a .shape and can be loaded as a numpy array: x = f['x']['foo']['bar'][...] Commented Jul 26, 2018 at 23:12

1 Answer

You need to traverse down your HDF5 hierarchy until you reach a dataset. Groups do not have a shape or type, datasets do.
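When the hierarchy is already known, indexing straight down to the dataset is enough. The following is a minimal sketch, assuming a layout like the one described in the question (X/0/A and Y/1); the file is built in memory and the dataset contents are made up purely for illustration.

```python
import h5py
import numpy as np

# Build a tiny in-memory file that mimics the layout from the question.
# The contents are placeholders, not the asker's real data.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('X/0/A', data=np.arange(6).reshape(2, 3))
    f.create_dataset('Y/1', data=np.ones(4))

    group = f['X']               # a Group: has keys, but no .shape or .dtype
    dset = f['X']['0']['A']      # a Dataset; f['X/0/A'] is equivalent

    is_group = isinstance(group, h5py.Group)
    shape_a = dset.shape         # shape is available without loading the data
    data_a = dset[...]           # read the whole dataset into a numpy array
```

With a real file, the `with` line would become `h5py.File(filename, 'r')` and the two `create_dataset` calls would be dropped.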

Assuming you do not know your hierarchy structure in advance, you can use a recursive algorithm to yield, via an iterator, full paths to all available datasets in the form group1/group2/.../dataset. Below is an example.

import h5py

def traverse_datasets(hdf_file):

    def h5py_dataset_iterator(g, prefix=''):
        for key in g.keys():
            item = g[key]
            path = f'{prefix}/{key}'
            if isinstance(item, h5py.Dataset): # test for dataset
                yield (path, item)
            elif isinstance(item, h5py.Group): # test for group (go down)
                yield from h5py_dataset_iterator(item, path)

    for path, _ in h5py_dataset_iterator(hdf_file):
        yield path

You can, for example, iterate all dataset paths and output attributes which interest you:

with h5py.File(filename, 'r') as f:
    for dset in traverse_datasets(f):
        print('Path:', dset)
        print('Shape:', f[dset].shape)
        print('Data type:', f[dset].dtype)

Remember that, by default, datasets in HDF5 are not read entirely into memory. You can read one into memory as a numpy array via arr = f[dset][:], where dset is the full path.
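As an alternative to a hand-written recursion, h5py groups also provide visititems, which calls a function on every object below a group. Here is a hedged sketch of collecting dataset paths that way and then loading each one; list_datasets is a hypothetical helper name and the in-memory demo file is invented for illustration.

```python
import h5py
import numpy as np

def list_datasets(hdf_file):
    """Return the paths of all datasets below an open h5py file or group."""
    paths = []

    def visitor(name, obj):
        # visititems passes the path relative to the starting group
        if isinstance(obj, h5py.Dataset):
            paths.append(name)
        # returning None tells visititems to continue the traversal

    hdf_file.visititems(visitor)
    return paths

# Demo on an in-memory file with made-up contents.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('X/0/A', data=np.zeros((2, 3)))
    f.create_dataset('X/0/B', data=np.zeros(5))
    f.create_dataset('Y/1', data=np.ones(4))

    found = sorted(list_datasets(f))
    # Load every dataset into memory, keyed by its path.
    arrays = {path: f[path][...] for path in found}
```

Note that the paths reported by visititems have no leading slash, unlike those produced by traverse_datasets above; both forms work when indexing the file object.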


5 Comments

Thank you so much. I tried the comment before your answer and that worked: obj1 = f['X']['A']. I was wondering how to read the second folder, which has 10 subfolders, because that number is going to change in the future. So I find your answer very helpful. I still get an invalid syntax error at the line path = f'{prefix}/{key}'. Now I want to save the data in subfolders 1-10 using a loop instead of 10 different commands, especially since, as I said, the number could change in the future.
You get a syntax error because f-strings only work in Python 3.6+. You can use path = '{0}/{1}'.format(prefix, key) instead.
I got an error at from in yield from h5py_dataset_iterator(item, path). When I removed it, the code ran with no error but didn't print the data types or any of the other attributes in the print commands. More importantly, how do I read the ten datasets in the group Y, ideally with a loop, since the number of datasets in that group can vary and is not always 10? Sorry, I feel like I'm bugging you with this question.
What version of Python are you using? yield from was introduced in v3.3.
Mine is 2.7.12. It seems too old compared to 3.3.
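For older interpreters, here is a rough sketch of the same traversal written without f-strings or yield from, plus a loop that reads however many datasets a group such as Y happens to contain. Simply deleting from does not work, because the inner generator object itself is then yielded instead of its items; an explicit for loop is needed. The names iter_datasets and read_group are hypothetical, and the demo file is built in memory with invented contents.

```python
import h5py
import numpy as np

def iter_datasets(group, prefix=''):
    """Yield (path, dataset) pairs without f-strings or `yield from`."""
    for key in group.keys():
        item = group[key]
        path = '{0}/{1}'.format(prefix, key)
        if isinstance(item, h5py.Dataset):
            yield path, item
        elif isinstance(item, h5py.Group):
            # explicit loop replaces `yield from`
            for pair in iter_datasets(item, path):
                yield pair

def read_group(group):
    """Read every dataset directly under `group` into a dict of arrays."""
    data = {}
    for name, item in group.items():
        if isinstance(item, h5py.Dataset):
            data[name] = item[...]
    return data

# Demo: Y holds three datasets here, but the loop does not care how many.
with h5py.File('demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('X/0/A', data=np.zeros((2, 3)))
    for i in range(1, 4):
        f.create_dataset('Y/{0}'.format(i), data=np.full(2, i))

    all_paths = sorted(path for path, _ in iter_datasets(f))
    y_data = read_group(f['Y'])
```

Here y_data maps each name under Y to a numpy array, so adding or removing datasets in that group requires no change to the code.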
