I have a simple pydantic model with nested data structures.
I want to be able to simply save and load instances of this model as .json file.
All models inherit from a Base class with simple configuration.
import pydantic

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'  # forbid use of extra kwargs
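With this config, any keyword that is not a declared field raises a validation error instead of being silently ignored, which is exactly what surfaces later when loading the JSON. A quick illustration (the unexpected_field name is made up):
# Undeclared keywords are rejected because of extra = 'forbid'.
try:
    Base(unexpected_field=1)
except pydantic.ValidationError as exc:
    print(exc)  # "extra fields not permitted (type=value_error.extra)"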
There are some simple data models with inheritance
class Thing(Base):
    thing_id: int

class SubThing(Thing):
    name: str
And a Container class, which holds a Thing
class Container(Base):
    thing: Thing
I can create a Container instance and save it as .json
# make instance of container
c = Container(
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)
json_string = c.json(indent=2)
print(json_string)
"""
{
"thing": {
"thing_id": 1,
"name": "my_thing"
}
}
"""
But the JSON string does not specify that the thing field was constructed from a SubThing. As such, when I try to load this string into a new Container instance, I get an error:
c = Container.parse_raw(json_string)
"""
Traceback (most recent call last):
File "...", line 36, in <module>
c = Container.parse_raw(json_string)
File "pydantic/main.py", line 601, in pydantic.main.BaseModel.parse_raw
File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Container
thing -> name
extra fields not permitted (type=value_error.extra)
"""
Is there a simple way to save the Container instance while retaining information about the thing class type such that I can reconstruct the initial Container instance reliably? I would like to avoid pickling the object if possible.
One possible solution is to serialize manually, for example using
def serialize(attr_name, attr_value, dictionary=None):
    if dictionary is None:
        dictionary = {}
    if not isinstance(attr_value, pydantic.BaseModel):
        dictionary[attr_name] = attr_value
    else:
        sub_dictionary = {}
        for (sub_name, sub_value) in attr_value:
            serialize(sub_name, sub_value, dictionary=sub_dictionary)
        dictionary[attr_name] = {type(attr_value).__name__: sub_dictionary}
    return dictionary
c1 = Container(
    thing=SubThing(
        thing_id=1,
        name='my_thing')
)

from pprint import pprint as print
print(serialize('Container', c1))
{'Container': {'Container': {'thing': {'SubThing': {'name': 'my_thing',
                                                    'thing_id': 1}}}}}
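To actually reconstruct the instance from such a dictionary, a matching deserialization helper is needed as well. A minimal sketch of what that could look like, assuming a hand-maintained registry that maps the stored class names back to the model classes (MODEL_CLASSES and deserialize are hypothetical names, not part of the code above):
# Hypothetical registry mapping the stored class names back to the model classes.
MODEL_CLASSES = {cls.__name__: cls for cls in (Container, Thing, SubThing)}

def deserialize(value):
    # Reverse of serialize(): rebuild model instances from {'ClassName': {...}} wrappers,
    # leaving plain values untouched.
    if isinstance(value, dict) and len(value) == 1:
        (key, payload), = value.items()
        if key in MODEL_CLASSES and isinstance(payload, dict):
            kwargs = {name: deserialize(sub) for name, sub in payload.items()}
            return MODEL_CLASSES[key](**kwargs)
    return value

data = serialize('Container', c1)
c2 = deserialize(data['Container'])
assert isinstance(c2.thing, SubThing)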
But this gets rid of most of the benefits of leveraging the package for serialization.
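One alternative that keeps pydantic in charge of both validation and (de)serialization is a discriminated (tagged) union: each model carries a Literal tag field, and the container field is declared as a Union with a discriminator. This is only a sketch of that idea, assuming pydantic v1.9+ (which added discriminated unions); the kind field name and the tag values are made up for illustration:
from typing import Literal, Union

import pydantic
from pydantic import Field

class Base(pydantic.BaseModel):
    class Config:
        extra = 'forbid'

class Thing(Base):
    kind: Literal['thing'] = 'thing'  # tag stored in the JSON
    thing_id: int

class SubThing(Thing):
    kind: Literal['sub_thing'] = 'sub_thing'
    name: str

class Container(Base):
    # The discriminator tells pydantic which union member to validate against,
    # based on the value of 'kind' in the incoming data.
    thing: Union[SubThing, Thing] = Field(..., discriminator='kind')

c = Container(thing=SubThing(thing_id=1, name='my_thing'))
json_string = c.json(indent=2)          # 'kind' is included in the output
c2 = Container.parse_raw(json_string)   # reconstructed as a SubThing
assert isinstance(c2.thing, SubThing)
The tag travels with the JSON, so parse_raw can pick the right subclass without any custom serialization code.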