I am using Pydantic in my project to define data models and am facing a challenge with custom serialization and deserialization. I have a model where I want to internally represent an attribute as a dictionary for easier access by keys, but I need to serialize it as a list when outputting to JSON and deserialize it back from a list into a dictionary when reading JSON.
The generated JSON schema should also describe this field as a list rather than a dict.
Here's a simplified version of what I'm trying to achieve:
from typing import Dict, List, Union

from pydantic import BaseModel, BeforeValidator, Field, PlainSerializer
from typing_extensions import Annotated


class MyItem(BaseModel):
    name: str = Field(...)
    data: str = Field(...)


def validate_items(items: Union[Dict[str, MyItem], List[MyItem]]) -> Dict[str, MyItem]:
    # Accept either a list of items or an already-keyed dictionary.
    if isinstance(items, list):
        return {item.name: item for item in items}
    elif isinstance(items, dict):
        return items
    else:
        raise ValueError("Input must be a list or a dictionary")


ItemsDict = Annotated[
    dict[str, MyItem],
    # Serialize the internal dict as a list of its values.
    PlainSerializer(
        lambda items_dict: list(items_dict.values()),
        return_type=list[MyItem],
    ),
    # Convert list input into a dict before the standard dict validation runs.
    BeforeValidator(validate_items),
]


class MyObject(BaseModel):
    items: ItemsDict = Field(...)


# Example instantiation
obj = MyObject(items=[MyItem(name='item1', data='data1'), MyItem(name='item2', data='data2')])

# To generate and print the schema
print(MyObject.schema_json(indent=2))
The output schema still says that items is an object (i.e. a dict) rather than a list.
I attempted to use Pydantic's Annotated type with custom serialization and validation to convert between the list and dict representations, but I'm unsure how to properly define the serialization/deserialization logic so that:
- The internal representation of items is a dictionary for easy access.
- When serializing MyObject to JSON, items is output as a list of MyItem instances.
- When deserializing from JSON, a list of MyItem instances is converted back into a dictionary, keyed by MyItem.name.
Additionally, when I generate the schema with MyObject.schema_json(indent=2), the items field is still shown as an object (dict) rather than a list, which does not reflect the desired external representation.
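To make the target behavior concrete, here is a minimal sketch of the round trip I am after, assuming Pydantic v2. The helper name key_by_name is hypothetical (it is not part of Pydantic). Unlike validate_items above, it also accepts raw dicts, because items parsed from JSON reach a BeforeValidator as plain dicts rather than MyItem instances.

```python
import json
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator, PlainSerializer


class MyItem(BaseModel):
    name: str
    data: str


def key_by_name(value: Any) -> Any:
    # Hypothetical helper: turn a list of items (MyItem instances or raw
    # dicts parsed from JSON) into a mapping keyed by name. Any other input
    # is passed through to the dict validation that runs afterwards.
    if isinstance(value, list):
        keyed = {}
        for item in value:
            if isinstance(item, MyItem):
                keyed[item.name] = item
            elif isinstance(item, dict) and "name" in item:
                keyed[item["name"]] = item
            else:
                raise ValueError("each item must be a MyItem or a mapping with a 'name' key")
        return keyed
    return value


ItemsDict = Annotated[
    dict[str, MyItem],
    BeforeValidator(key_by_name),
    PlainSerializer(lambda items: list(items.values()), return_type=list[MyItem]),
]


class MyObject(BaseModel):
    items: ItemsDict


raw = '{"items": [{"name": "item1", "data": "data1"}, {"name": "item2", "data": "data2"}]}'

# Deserialize: the JSON list becomes a dict keyed by name.
obj = MyObject.model_validate_json(raw)
first_data = obj.items["item1"].data

# Serialize: the internal dict is written back out as a list.
as_json = json.loads(obj.model_dump_json())

# Feeding the serialized form back in should rebuild an equal object.
round_tripped = MyObject.model_validate_json(obj.model_dump_json())
```

With this in place, keyed access such as obj.items["item1"] works internally, while the JSON form stays a list. It does not by itself change what the default schema reports.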
Questions:
- How can I customize the serialization/deserialization in Pydantic to achieve this behavior?
- Is there a way to adjust the JSON schema generation in Pydantic to reflect items as a list for external interfaces, while keeping it as a dict internally?
- Any guidance or examples on how to implement this in Pydantic would be greatly appreciated.
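One direction I am looking at for the schema question, as a rough sketch assuming Pydantic v2: model_json_schema (the v2 counterpart of schema_json) takes a mode argument, and the validation and serialization schemas are generated separately. My understanding is that the return_type on PlainSerializer only feeds the serialization-mode schema, which would explain why the default (validation) schema still shows an object. The helper name key_by_name is again hypothetical.

```python
from typing import Annotated, Any

from pydantic import BaseModel, BeforeValidator, PlainSerializer


class MyItem(BaseModel):
    name: str
    data: str


def key_by_name(value: Any) -> Any:
    # Hypothetical helper: key a list of MyItem instances or raw dicts by name.
    if isinstance(value, list):
        return {(i.name if isinstance(i, MyItem) else i["name"]): i for i in value}
    return value


ItemsDict = Annotated[
    dict[str, MyItem],
    BeforeValidator(key_by_name),
    PlainSerializer(lambda items: list(items.values()), return_type=list[MyItem]),
]


class MyObject(BaseModel):
    items: ItemsDict


# Schema describing accepted input (this is the default mode).
validation_schema = MyObject.model_json_schema(mode="validation")

# Schema describing the serialized output.
serialization_schema = MyObject.model_json_schema(mode="serialization")

validation_type = validation_schema["properties"]["items"]["type"]
serialization_type = serialization_schema["properties"]["items"]["type"]
print(validation_type, serialization_type)
```

If that reading is right, the serialization-mode schema already describes items as an array, and what remains is getting the validation-mode schema to do the same. I have seen a json_schema_input_type argument for BeforeValidator mentioned for newer 2.x releases that might cover this, but I have not verified it.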