
I am wondering whether the pickle module makes any guarantees about the output of pickle.dump[s], and if so, which ones.

Specifically, I am pickling list[T] where T can be bool, int, decimal, time, date, datetime, timedelta or str, and each list is homogeneous (one type per list). Is it guaranteed that lists that compare equal in Python also produce the same pickled result, at least for the types given?

I couldn't find any guarantees online, but from some basic testing I couldn't find a case where the data would be different.

So TL;DR:

Given two list[T] where T is bool | int | decimal | time | date | datetime | timedelta | str that are equal in Python, will the resulting pickle.dumps() outputs also be equal?

  • Why do you care about it? Pickle is just a serialization method. Commented May 10 at 20:00
  • This may hold in practice, but pickle makes no such guarantee. Commented May 12 at 7:25
  • @balderman If there were such guarantees, the pickled data could be stored in my database with a unique constraint. I have already redesigned my table anyway. Commented May 12 at 10:03

1 Answer

There are cases where values of different types are equal, such as 1 == 1.0 and 0 == False. Lists containing these values compare equal, but their pickle dumps differ, because pickle records the type so that it can restore each value correctly.

>>> import pickle
>>> l1 = [0]      # T is int
>>> l2 = [False]  # T is bool
>>> l1 == l2
True
>>> pickle.dumps(l1) == pickle.dumps(l2)
False
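As a minimal sketch of why the bytes have to differ, the round trip below checks that each list comes back with its original element type. The variable names are only illustrative, and the behaviour shown is what CPython does with the default protocol rather than a documented guarantee about the byte layout.

```python
import pickle

l1 = [0]      # T is int
l2 = [False]  # T is bool

# The lists compare equal because 0 == False in Python.
same_value = l1 == l2

# The serialized bytes differ, since pickle encodes int and bool differently.
same_bytes = pickle.dumps(l1) == pickle.dumps(l2)

# Round-tripping restores the original element types.
restored_types = (
    type(pickle.loads(pickle.dumps(l1))[0]),
    type(pickle.loads(pickle.dumps(l2))[0]),
)

print(same_value, same_bytes, restored_types)
```

If pickle produced identical bytes for both lists, it could not tell on loading whether to rebuild an int or a bool.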

Comments

The inverse is also true. If you have [Decimal('NaN')], the list is not equal to itself, but pickle will produce the same output consistently.
And whilst we're at it, with Decimal you can get two lists of the same type that are equal but produce different pickles, e.g. [Decimal('0')] vs [Decimal('-0')] vs [Decimal('0.0')]
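The Decimal zeros mentioned above can be checked with a short sketch like the following. The names are illustrative, and the result reflects how CPython's decimal module pickles values (via their string form), which is an implementation detail rather than a promise.

```python
import pickle
from decimal import Decimal

# Three homogeneous lists of Decimal that all compare equal.
zeros = [[Decimal("0")], [Decimal("-0")], [Decimal("0.0")]]
all_equal = zeros[0] == zeros[1] and zeros[1] == zeros[2]

# Count how many distinct byte strings the three lists pickle to.
distinct_pickles = len({pickle.dumps(z) for z in zeros})

print(all_equal, distinct_pickles)
```

So even with a single element type per list, equality of the lists does not imply equality of the pickled bytes.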
@Dunes That list (with NaN) is equal to itself.
@KellyBundy I think they meant [Decimal('NaN')] != [Decimal('NaN')]. The issue is that list equality has a shortcut -- if the memory address is the same, it doesn't recurse into the elements and returns True.
@Barmar It does compare the elements. Try a = [0] * 10**5 and then for _ in a: a == a. Takes me 16 seconds. Wouldn't take that long with that shortcut.
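To make the NaN discussion concrete, here is a small sketch (names are illustrative). As far as I understand it, list comparison in CPython still walks the elements, but each element pair is treated as equal when both sides are the same object, so the outcome depends on whether the NaN object is shared or constructed twice.

```python
import pickle
from decimal import Decimal

nan = Decimal("NaN")
a = [nan]

# A quiet NaN is not equal to itself when compared directly.
element_equal = nan == nan

# A list holding that object is equal to itself, because the element
# comparison treats identical objects as equal.
self_equal = a == a

# Two different lists sharing the same NaN object also compare equal.
shared_equal = [nan] == [nan]

# Two lists with separately constructed NaN objects do not compare equal...
fresh_equal = [Decimal("NaN")] == [Decimal("NaN")]

# ...yet their pickles are identical.
fresh_same_pickle = pickle.dumps([Decimal("NaN")]) == pickle.dumps([Decimal("NaN")])

print(element_equal, self_equal, shared_equal, fresh_equal, fresh_same_pickle)
```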