13

I want to dynamically generate a Pydantic model at runtime. I can do this by calling create_model. For example,

from pydantic import create_model

create_model("MyModel", i=(int,...), s=(str...))

does the same thing as

from pydantic import BaseModel

class MyModel(BaseModel):
    i: int
    s: str

I want to serialize these Pydantic schemas as JSON. It's easy to write code to parse JSON into create_model arguments, and it would make sense to use the output of BaseModel.schema_json() since that already defines a serialization format. That makes me think that there should already be some sort of BaseModel.from_json_schema classmethod that could dynamically create a model like so

from pydantic import BaseModel

class MyModel(BaseModel):
    i: int
    s: str

my_model = BaseModel.from_json_schema(MyModel.schema_json())
my_model(i=5, s="s") # returns MyModel(i=5, s="s")

I can't find any such function in the documentation. Am I overlooking something, or do I have to write my own JSON schema deserialization code?


5 Answers

10

This was discussed some time ago, and Samuel Colvin said he didn't want to pursue this as a feature for Pydantic.

If you are fine with code generation instead of actual runtime creation of models, you can use the datamodel-code-generator.

To be honest, I struggle to see the use case for generating complex models at runtime, seeing as their main purpose is validation, which implies that you think about the correct schema before running your program. But that is just my view.

For simple models I guess you can throw together your own logic for this fairly quickly.
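For illustration, a minimal sketch of that idea (the helper name model_from_flat_schema and the type table are just for this example), assuming a flat schema with only primitive property types; nested objects, enums and constraints would need more work:

from pydantic import create_model

SIMPLE_TYPES = {"string": str, "integer": int, "number": float, "boolean": bool}

def model_from_flat_schema(schema: dict):
    # Map each property to a (type, default) tuple; required fields get `...` (no default).
    required = set(schema.get("required", []))
    fields = {
        name: (SIMPLE_TYPES.get(props.get("type"), str), ... if name in required else None)
        for name, props in schema.get("properties", {}).items()
    }
    return create_model(schema.get("title", "DynamicModel"), **fields)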

If you do need something more sophisticated, the aforementioned library does offer some extensibility. You should be able to import and inherit from some of their classes like the JsonSchemaParser. Maybe that will get you somewhere.
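For example, depending on the library version, the parser can also be driven programmatically to get the generated source as a string (a rough sketch; the exact import path and signature may differ between versions):

from datamodel_code_generator.parser.jsonschema import JsonSchemaParser

schema_text = '{"title": "Thing", "type": "object", "properties": {"name": {"type": "string"}}}'

parser = JsonSchemaParser(schema_text)
print(parser.parse())  # generated Pydantic model source as a string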

Ultimately I think this becomes non-trivial very quickly, which is why Pydantic's maintainer didn't want to deal with it and why there is a whole separate project for this.


5 Comments

I'm trying to create an ETL application that does data ingestion and transformation in a configurable manner. Users write their own configurations. An aspect of the data transformation is type validation and coercion, for which Pydantic seems a good choice. I could just have users write this part as Pydantic classes in Python source code, but I don't want to because (1) writing configuration as source code gets tricky fast and (2) there are going to be other configurable aspects that require JSON.
The github discussion is really helpful. I'll probably just write my own schema deserialization, since if this grows into a full-blown ETL application I'll end up using someone else's implementation of that anyway.
I have a use case: the architect writes the schema in a repository, and the dev generates the classes, validation and so on on the fly...
Another use case - in a microservices architecture where endpoints, and potentially even the dev teams that write them, communicate mostly via HTTP: if each endpoint discloses its IO schema, then the connecting parts can validate the inputs and outputs without having import access to the Pydantic model source code, while still getting IDE completions.
One VERY GOOD use case for this is multi-language scenarios, for example where you might be using Python in one part and JavaScript elsewhere.
6

Updated @Alon's answer to handle nested models:

from typing import Any, Type, Optional
from enum import Enum

from pydantic import BaseModel, Field, create_model


def json_schema_to_base_model(schema: dict[str, Any]) -> Type[BaseModel]:
    type_mapping: dict[str, type] = {
        "string": str,
        "integer": int,
        "number": float,
        "boolean": bool,
        "array": list,
        "object": dict,
    }

    properties = schema.get("properties", {})
    required_fields = schema.get("required", [])
    model_fields = {}

    def process_field(field_name: str, field_props: dict[str, Any]) -> tuple:
        """Recursively processes a field and returns its type and Field instance."""
        json_type = field_props.get("type", "string")
        enum_values = field_props.get("enum")

        # Handle Enums
        if enum_values:
            enum_name: str = f"{field_name.capitalize()}Enum"
            field_type = Enum(enum_name, {v: v for v in enum_values})
        # Handle Nested Objects
        elif json_type == "object" and "properties" in field_props:
            field_type = json_schema_to_base_model(
                field_props
            )  # Recursively create submodel
        # Handle Arrays with Nested Objects
        elif json_type == "array" and "items" in field_props:
            item_props = field_props["items"]
            if item_props.get("type") == "object":
                item_type: type[BaseModel] = json_schema_to_base_model(item_props)
            else:
                item_type: type = type_mapping.get(item_props.get("type"), Any)
            field_type = list[item_type]
        else:
            field_type = type_mapping.get(json_type, Any)

        # Handle default values and optionality
        default_value = field_props.get("default", ...)
        nullable = field_props.get("nullable", False)
        description = field_props.get("title", "")

        if nullable:
            field_type = Optional[field_type]

        if field_name not in required_fields:
            default_value = field_props.get("default", None)

        return field_type, Field(default_value, description=description)

    # Process each field
    for field_name, field_props in properties.items():
        model_fields[field_name] = process_field(field_name, field_props)

    return create_model(schema.get("title", "DynamicModel"), **model_fields)

Example Schema

schema = {
    "title": "User",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "is_active": {"type": "boolean"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zipcode": {"type": "integer"},
            },
        },
        "roles": {
            "type": "array",
            "items": {
                "type": "string",
                "enum": ["admin", "user", "guest"]
            }
        }
    },
    "required": ["name", "age"]
}

Generate the Pydantic model

DynamicModel = json_schema_to_base_model(schema)

Example usage

print(DynamicModel.schema_json(indent=2))
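And a quick check that nested data validates (assumed usage with made-up values; field names follow the example schema above):

user = DynamicModel(
    name="Ada",
    age=36,
    is_active=True,
    address={"street": "Main St", "city": "Springfield", "zipcode": 12345},
    roles=["admin", "user"],
)
print(user)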


2
from typing import Any, Type, Optional
from pydantic import BaseModel, Field, create_model
from enum import Enum

def json_schema_to_base_model(schema: dict[str, Any]) -> Type[BaseModel]:
    type_mapping = {
        "string": str,
        "integer": int,
        "number": float,
        "boolean": bool,
        "array": list,
        "object": dict,
    }

    properties = schema.get("properties", {})
    required_fields = schema.get("required", [])
    model_fields = {}

    for field_name, field_props in properties.items():
        json_type = field_props.get("type", "string")
        enum_values = field_props.get("enum")

        if enum_values:
            enum_name = f"{field_name.capitalize()}Enum"
            field_type = Enum(enum_name, {v: v for v in enum_values})
        else:
            field_type = type_mapping.get(json_type, Any)

        default_value = field_props.get("default", ...)
        nullable = field_props.get("nullable", False)
        description = field_props.get("title", "")

        if nullable:
            field_type = Optional[field_type]

        if field_name not in required_fields:
            default_value = field_props.get("default", None)

        model_fields[field_name] = (field_type, Field(default_value, description=description))

    return create_model(schema.get("title", "DynamicModel"), **model_fields)

Example schema

schema = {
    "properties": {
        "short_description": {
            "title": "Short Description",
            "type": "string"
        },
        "long_description": {
            "title": "Long Description",
            "type": "string"
        },
        "recommendation": {
            "enum": ["SLOW_MOVERS", "BEST_SELLERS"],
            "title": "Recommendation",
            "type": "string"
        }
    },
    "required": ["short_description", "long_description", "recommendation"],
    "title": "Test",
    "type": "object"
}

Generate the Pydantic model

DynamicModel = json_schema_to_base_model(schema)

Example usage

print(DynamicModel.schema_json(indent=2))
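A quick usage check (assumed behaviour: Pydantic validates enum fields by value, so an unknown value raises a ValidationError):

from pydantic import ValidationError

item = DynamicModel(
    short_description="short",
    long_description="long",
    recommendation="BEST_SELLERS",
)
print(item.recommendation)

try:
    DynamicModel(
        short_description="short",
        long_description="long",
        recommendation="NOT_A_VALUE",
    )
except ValidationError as exc:
    print(exc)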


0

You may consider using datamodel-code-generator. I came to know about this library from Reddit. Apart from JSON Schema input and Pydantic output, this library also supports other input and output types.

from datamodel_code_generator import InputFileType, DataModelType, generate
from pathlib import Path
import json

task_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Task",
    "description": "Schema defining a task in a task management system.",
    "type": "object",
    "properties": {
        "id": {
            "type": "integer",
            "description": "Unique identifier for the task."
        },
        "title": {
            "type": "string",
            "description": "Short title describing the task."
        },
        "priority": {
            "type": "string",
            "description": "Priority level of the task.",
            "enum": ["low", "medium", "high"]
        },
        "status": {
            "type": "string",
            "description": "Current status of the task.",
            "enum": ["todo", "in_progress", "done"]
        }
    },
    "required": ["id", "title", "priority", "status"],
    "additionalProperties": False
}

# Mimicking reading a JSON file by dumping a dict; if you already have the schema in a JSON file, read it with json.load(...).
schema_json = json.dumps(task_schema, indent=2)

output_path = Path.cwd() / "pydantic_demo.py" # The library expects a pathlib.Path object to get the output file path.

generate(
    input_=schema_json,
    input_file_type=InputFileType.JsonSchema,
    output=output_path,
    output_model_type=DataModelType.PydanticV2BaseModel, # Or, PydanticBaseModel
    use_field_description=False, # Keeping this False keeps the description as an attribute in Field(); if it were True, the description would become a docstring.
    use_schema_description=True,
)

You will find the generated code at output_path.
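If you also want the model available in the same process, one option is to import the generated file at runtime (a sketch, assuming the generated class is named Task after the schema title and that the enum fields accept their string values):

import importlib.util

spec = importlib.util.spec_from_file_location("pydantic_demo", output_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

task = module.Task(id=1, title="Write docs", priority="high", status="todo")
print(task)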


-1

As mentioned before, it's possible to use datamodel-code-generator to create the classes, but that only generates the code and doesn't create the class in memory. You could modify the library to return the code string, then execute that code and return the result.

Or you could just use DynamicPydantic, a project I made that does all that for you and accepts both JSON Schema and SQLAlchemy models. Here's an example of JSON Schema generation with the library:

from dynamicpydantic import jsonschema_pydantic

json_schema = {
    'title': 'employees_auto',
    'type': 'object',
    'properties': {
        'emp_no': {'type': 'integer'},
        'birth_date': {'type': 'string', 'format': 'date'},
        'first_name': {'type': 'string', 'maxLength': 14},
        'last_name': {'type': 'string', 'maxLength': 16},
        'gender': {'type': 'string', 'maxLength': 1, 'enum': ['M', 'F']},
        'hire_date': {'type': 'string', 'format': 'date'},
    },
    'required': [
        'birth_date',
        'emp_no',
        'first_name',
        'gender',
        'hire_date',
        'last_name',
    ],
}

PydModel = jsonschema_pydantic(json_schema)
print(PydModel.__fields__)

"{'emp_no': ModelField(name='emp_no', type=int, required=True), 
'birth_date': ModelField(name='birth_date', type=date, required=True), 'first_name': ModelField(name='first_name', type=ConstrainedStrValue, required=True), 'last_name': ModelField(name='last_name', type=ConstrainedStrValue, required=True), 'gender': ModelField(name='gender', type=Gender, required=True), 'hire_date': ModelField(name='hire_date', type=date, required=True)}"

