LTP#13: pydantic¶
pydantic
is a popular data validation and serialization library powered by Python type hints.
What does that mean and what can Pydantic do for us?
The problem¶
Many of you might have written a code like this before for a simple data structure:
class Sample:
def __init__(self, date, parameter, size):
self.date = date
self.parameter = parameter
self.size = size
Sample("now", 0.5, 2000)
<__main__.Sample at 0x7f1064035210>
Here are some issues with this code:
- Repetitive, tedious boilerplate code
- What are good values for
date
,parameter
andsize
? - How do we control mutability of the data?
- What if we later want to write that data to a file/send over a network?
- Many things are undefinded, e.g. the
repr
shown here
Partial solutions¶
Some of you might have identified this problem and moved to potential solutions.
namedtuple¶
import collections
Sample = collections.namedtuple("Sample", ("date", "parameter", "size"))
Sample(date="now", parameter=0.5, size=2000)
Sample(date='now', parameter=0.5, size=2000)
- Reduces boilerplate
- Has e.g. a nice
repr
- All other problems are unsolved
dataclasses¶
import dataclasses
from datetime import datetime
@dataclasses.dataclass
class Sample:
date: datetime
parameter: float
size: int
Sample(date="now", parameter=0.5, size=2000)
Sample(date='now', parameter=0.5, size=2000)
- Reduces boilerplate drastically
- Gives you control over mutability
- What about data validation and serialization?
pydantic to the rescue!¶
Here is the same example with a pydantic
base class:
from pydantic import BaseModel
class Sample(BaseModel):
date: datetime
parameter: float
size: int
Sample(date="now", parameter=0.5, size=2000)
--------------------------------------------------------------------------- ValidationError Traceback (most recent call last) Cell In[9], line 1 ----> 1 Sample(date="now", parameter=0.5, size=2000) File /opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data) 212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks 213 __tracebackhide__ = True --> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self) 215 if self is not validated_self: 216 warnings.warn( 217 'A custom validator is returning a value other than `self`.\n' 218 "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n" 219 'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.', 220 stacklevel=2, 221 ) ValidationError: 1 validation error for Sample date Input should be a valid datetime or date, input is too short [type=datetime_from_date_parsing, input_value='now', input_type=str] For further information visit https://errors.pydantic.dev/2.10/v/datetime_from_date_parsing
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
Sample(date=datetime.datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)
Validation in pydantic¶
Validation logic in pydantic
is type annotation based. It:
- leverages the type annotations
- looks up the logic it has implemented for those types
- automatically converts to the correct type
If the last bit scares you, there is a strict
mode.
Additional validation logic that exceeds type annotations is available:
from pydantic import PositiveInt, confloat
class Sample(BaseModel):
date: datetime
parameter: confloat(gt=0.0, lt=1.0)
size: PositiveInt
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
Sample(date=datetime.datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)
More validation in pydantic¶
Validation logic can be customized in interesting ways. The following snippet allows to specify the magic string "now"
for the date and it resolves to a timestamp:
from pydantic import field_validator
class Sample(BaseModel):
date: datetime
parameter: confloat(gt=0.0, lt=1.0)
size: PositiveInt
@field_validator("date", mode="before")
def resolve_now(cls, v):
if v == "now":
return datetime.now()
return v
Sample(date="now", parameter=0.5, size=2000)
Sample(date=datetime.datetime(2024, 12, 2, 7, 10, 35, 277321), parameter=0.5, size=2000)
Even more validation in pydantic¶
Sometimes it is better to attach the validation logic to a type though, as it makes it reusable:
from typing_extensions import Annotated
from pydantic.functional_validators import AfterValidator
def check_squares(x: int) -> int:
assert x**0.5 % 1 == 0, f"{x} is not a square number"
return x
SquareNumber = Annotated[int, AfterValidator(check_squares)]
class MyModel(BaseModel):
x: SquareNumber
MyModel(x=4)
MyModel(x=4)
Validating function arguments¶
What if we are not building models, but our interface consists of functions instead?
from pydantic import validate_call
@validate_call(validate_return=True)
def square_root(x: SquareNumber) -> int:
return x**0.5
square_root("4")
2
Note that this contained two implicit conversions:
"4"
->4
2.0
->2
Judge for yourself and your application (!) whether that is a good or a bad thing.
Serialization/Deserialization¶
s = Sample(date="now", parameter=0.5, size=2000)
Assume we have a sample, we can turn it into a dictionary or a JSON string using pydantic
:
s.model_dump()
{'date': datetime.datetime(2024, 12, 2, 7, 10, 35, 320872), 'parameter': 0.5, 'size': 2000}
s.model_dump_json()
'{"date":"2024-12-02T07:10:35.320872","parameter":0.5,"size":2000}'
And we can reconstruct an object from those dumps:
Sample(**s.model_dump())
Sample(date=datetime.datetime(2024, 12, 2, 7, 10, 35, 320872), parameter=0.5, size=2000)
Sample.model_validate_json(s.model_dump_json())
Sample(date=datetime.datetime(2024, 12, 2, 7, 10, 35, 320872), parameter=0.5, size=2000)
Such functionality is of key importance in the design of file formats and transmission protocols.
Summary and References¶
pydantic
gives you a
- simple, yet very powerful toolbox
- allows you to write safer code with less bugs
- saves you from a lot of tedious work
References¶