1

I have several million records that I want to store, retrieve, and delete fairly frequently. Each record has a "key", but the "value" does not translate easily to a dictionary: it is an arbitrary Python object returned by a module method that I didn't write. (I understand that many hierarchical data structures, such as JSON, map naturally onto dictionaries, and I'm not sure JSON is the preferred storage format in any case.)

I am thinking of pickling each entry to a separate file. Is there a better way?

2 Answers

3

Use the shelve module.

You can use it like a dictionary, much as you would a parsed JSON object, but the values are stored with pickle, so any picklable Python object works. Keys must be strings.

From the official Python docs:

import shelve

d = shelve.open(filename) # open -- file may get suffix added by low-level
                          # library

d[key] = data   # store data at key (overwrites old data if
                # using an existing key)
data = d[key]   # retrieve a COPY of data at key (raise KeyError if no
                # such key)
del d[key]      # delete data stored at key (raises KeyError
                # if no such key)
flag = key in d         # true if the key exists
klist = list(d.keys())  # a list of all existing keys (slow!)

# as d was opened WITHOUT writeback=True, beware:
d['xx'] = [0, 1, 2]     # this works as expected, but...
d['xx'].append(3)       # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]!

# having opened d without writeback=True, you need to code carefully:
temp = d['xx']      # extracts the copy
temp.append(5)      # mutates the copy
d['xx'] = temp      # stores the copy right back, to persist it

# or, d=shelve.open(filename,writeback=True) would let you just code
# d['xx'].append(5) and have it work as expected, BUT it would also
# consume more memory and make the d.close() operation slower.

d.close()       # close it
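
For the use case in the question (many keyed records whose values are not dictionaries), a minimal self-contained sketch might look like the following. It uses `fractions.Fraction` purely as a stand-in for "an arbitrary picklable object"; the `demo` function, the `records` base name, and the record count are hypothetical choices for illustration. It also lists the files the shelf creates, since the number and suffixes depend on which dbm backend your Python uses.

```python
import os
import shelve
import tempfile
from fractions import Fraction


def demo(num_records=1000):
    """Store, fetch and delete records in one shelf; return a summary."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records")  # hypothetical base name

        # shelve keys must be strings; values may be anything picklable.
        with shelve.open(path) as db:
            for i in range(num_records):
                # Fraction is only a stand-in for an arbitrary object.
                db[str(i)] = Fraction(i, 7)

            fetched = db["42"]      # an unpickled copy of the stored object
            del db["0"]             # remove a single record
            remaining = len(db)

        # The dbm backend decides how many files exist and their suffixes.
        files = sorted(os.listdir(tmp))
        return fetched, remaining, files


if __name__ == "__main__":
    fetched, remaining, files = demo()
    print(fetched, remaining, files)
```

Whatever the backend, the records end up in a handful of files rather than one file per record.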

1 Comment

So it would pickle everything to a single file?
1

I would evaluate a key/value database such as Berkeley DB, Kyoto Cabinet, or similar. This gives you all the fancy features plus better handling of disk space. On a filesystem with a block size of 4096 B, one million files occupy at least ~4 GB no matter how small your objects are (this is a lower bound; objects larger than 4096 B take more).
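
The arithmetic behind that estimate can be sketched as below. This is a rough lower bound under stated assumptions: every non-empty file occupies a whole number of blocks, and inode and directory overhead is ignored. The function name `min_disk_usage` is made up for this example, and some filesystems store very small files differently, so treat the numbers as illustrative.

```python
import math


def min_disk_usage(num_files, object_size, block_size=4096):
    """Lower bound on bytes used when every object is its own file.

    Assumes each file occupies a whole number of blocks and ignores
    inode/directory overhead, so real usage is typically higher.
    """
    blocks_per_file = max(1, math.ceil(object_size / block_size))
    return num_files * blocks_per_file * block_size


if __name__ == "__main__":
    # One million files of various object sizes (in bytes).
    for size in (100, 4096, 5000):
        total = min_disk_usage(1_000_000, size)
        print(f"{size:>5} B objects -> {total / 1e9:.3f} GB")
```

With 100 B objects the payload is only about 0.1 GB, yet the one-file-per-record layout still needs roughly 4.1 GB under these assumptions.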
