• SSC Lunch Time Python

LTP#13: pydantic

pydantic is a popular data validation and serialization library powered by Python type hints.

What does that mean, and what can pydantic do for us?

The problem

Many of you might have written code like this before for a simple data structure:

In [1]:
class Sample:
    def __init__(self, date, parameter, size):
        self.date = date
        self.parameter = parameter
        self.size = size
In [2]:
Sample("now", 0.5, 2000)
Out[2]:
<__main__.Sample at 0x7ff2ed01f850>

Here are some issues with this code:

  • Repetitive, tedious boilerplate code
  • What are good values for date, parameter and size?
  • How do we control mutability of the data?
  • What if we later want to write that data to a file/send over a network?
  • Many things are undefined, e.g. the repr shown here

Partial solutions

Some of you might have identified this problem and moved to potential solutions.

namedtuple

In [3]:
import collections

Sample = collections.namedtuple("Sample", ("date", "parameter", "size"))
In [4]:
Sample(date="now", parameter=0.5, size=2000)
Out[4]:
Sample(date='now', parameter=0.5, size=2000)
  • Reduces boilerplate
  • Has e.g. a nice repr
  • All other problems are unsolved

dataclasses

In [5]:
import dataclasses
from datetime import datetime


@dataclasses.dataclass
class Sample:
    date: datetime
    parameter: float
    size: int
In [6]:
Sample(date="now", parameter=0.5, size=2000)
Out[6]:
Sample(date='now', parameter=0.5, size=2000)
  • Reduces boilerplate drastically
  • Gives you control over mutability
  • What about data validation and serialization?
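
The mutability control mentioned above can be sketched with the standard library alone. The example below uses `frozen=True` on a class with the hypothetical name `FrozenSample` (chosen here for illustration, not taken from the original notebook). It also shows that the type hints are still not enforced.

```python
import dataclasses
from datetime import datetime


@dataclasses.dataclass(frozen=True)
class FrozenSample:  # hypothetical name, chosen for this illustration
    date: datetime
    parameter: float
    size: int


sample = FrozenSample(date="now", parameter=0.5, size=2000)

# Type hints are not checked: "now" is stored as a plain string.
date_is_str = isinstance(sample.date, str)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    sample.size = 1
    mutation_blocked = False
except dataclasses.FrozenInstanceError:
    mutation_blocked = True

print(date_is_str, mutation_blocked)
```

So dataclasses address boilerplate and mutability, but the invalid date still slips through unnoticed.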

pydantic to the rescue!

Here is the same example with a pydantic base class:

In [7]:
from pydantic import BaseModel
In [8]:
class Sample(BaseModel):
    date: datetime
    parameter: float
    size: int
In [9]:
Sample(date="now", parameter=0.5, size=2000)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In[9], line 1
----> 1 Sample(date="now", parameter=0.5, size=2000)

File /opt/hostedtoolcache/Python/3.11.11/x64/lib/python3.11/site-packages/pydantic/main.py:214, in BaseModel.__init__(self, **data)
    212 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
    213 __tracebackhide__ = True
--> 214 validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    215 if self is not validated_self:
    216     warnings.warn(
    217         'A custom validator is returning a value other than `self`.\n'
    218         "Returning anything other than `self` from a top level model validator isn't supported when validating via `__init__`.\n"
    219         'See the `model_validator` docs (https://docs.pydantic.dev/latest/concepts/validators/#model-validators) for more details.',
    220         stacklevel=2,
    221     )

ValidationError: 1 validation error for Sample
date
  Input should be a valid datetime or date, input is too short [type=datetime_from_date_parsing, input_value='now', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/datetime_from_date_parsing
In [10]:
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
Out[10]:
Sample(date=datetime.datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)

Validation in pydantic

Validation logic in pydantic is type annotation based. It:

  • leverages the type annotations
  • looks up the logic it has implemented for those types
  • automatically converts the input to the annotated type where possible

If the last bit scares you, there is a strict mode.
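
As a minimal sketch of the difference between the default (lax) mode and strict mode, the example below defines two hypothetical models, `LaxSize` and `StrictSize`, that are not part of the original notebook. It assumes pydantic 2.x, where strict mode can be enabled per model via `ConfigDict(strict=True)`.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LaxSize(BaseModel):  # hypothetical model, default (lax) mode
    size: int


class StrictSize(BaseModel):  # hypothetical model, strict mode enabled
    model_config = ConfigDict(strict=True)

    size: int


# Lax mode: the string "2000" is converted to the integer 2000.
lax = LaxSize(size="2000")

# Strict mode: the same input is rejected instead of converted.
try:
    StrictSize(size="2000")
    strict_rejected = False
except ValidationError:
    strict_rejected = True

print(lax.size, strict_rejected)
```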

Additional validation logic that exceeds type annotations is available:

In [11]:
from pydantic import PositiveInt, confloat
In [12]:
class Sample(BaseModel):
    date: datetime
    parameter: confloat(gt=0.0, lt=1.0)
    size: PositiveInt
In [13]:
Sample(date="2024-04-19 12:00", parameter=0.5, size=2000)
Out[13]:
Sample(date=datetime.datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)
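
To see the constraints reject bad input, here is a small sketch. It expresses the same bounds with `Annotated` and `Field`, an alternative spelling to `confloat` that pydantic 2.x also supports. The names `UnitInterval` and `ConstrainedSample` are hypothetical and used only for this illustration.

```python
from datetime import datetime
from typing import Annotated

from pydantic import BaseModel, Field, PositiveInt, ValidationError

# Hypothetical alias for a float strictly between 0 and 1.
UnitInterval = Annotated[float, Field(gt=0.0, lt=1.0)]


class ConstrainedSample(BaseModel):  # hypothetical name for this illustration
    date: datetime
    parameter: UnitInterval
    size: PositiveInt


try:
    # parameter is above the upper bound and size is not positive.
    ConstrainedSample(date="2024-04-19 12:00", parameter=1.5, size=-1)
    failed_fields = []
except ValidationError as exc:
    # Each error entry carries the location of the offending field.
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(failed_fields)
```

Both violations are reported in a single `ValidationError`, so a caller sees every problem at once rather than one per attempt.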

More validation in pydantic

Validation logic can be customized in interesting ways. The following snippet allows the magic string "now" to be passed as the date, which then resolves to the current timestamp:

In [14]:
from pydantic import field_validator


class Sample(BaseModel):
    date: datetime
    parameter: confloat(gt=0.0, lt=1.0)
    size: PositiveInt

    @field_validator("date", mode="before")
    @classmethod
    def resolve_now(cls, v):
        if v == "now":
            return datetime.now()
        return v
In [15]:
Sample(date="now", parameter=0.5, size=2000)
Out[15]:
Sample(date=datetime.datetime(2025, 1, 2, 8, 31, 13, 567666), parameter=0.5, size=2000)

Even more validation in pydantic

Sometimes, though, it is better to attach the validation logic to a type, which makes it reusable:

In [16]:
from typing_extensions import Annotated
from pydantic.functional_validators import AfterValidator
In [17]:
def check_squares(x: int) -> int:
    assert x**0.5 % 1 == 0, f"{x} is not a square number"
    return x
In [18]:
SquareNumber = Annotated[int, AfterValidator(check_squares)]
In [19]:
class MyModel(BaseModel):
    x: SquareNumber
In [20]:
MyModel(x=4)
Out[20]:
MyModel(x=4)
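
A possible variation on the validator above, offered as a sketch rather than a replacement: `assert` statements are skipped when Python runs with `-O`, and a floating-point square root can lose precision for very large integers. The version below raises `ValueError` and uses `math.isqrt` instead. The names `check_square_exact`, `ExactSquare` and `SquareModel` are hypothetical, and `Annotated` is imported from `typing` (Python 3.9+) rather than `typing_extensions`.

```python
import math
from typing import Annotated

from pydantic import BaseModel, ValidationError
from pydantic.functional_validators import AfterValidator


def check_square_exact(x: int) -> int:  # hypothetical helper name
    # math.isqrt works on exact integers, so large values are safe.
    if x < 0 or math.isqrt(x) ** 2 != x:
        raise ValueError(f"{x} is not a square number")
    return x


ExactSquare = Annotated[int, AfterValidator(check_square_exact)]


class SquareModel(BaseModel):  # hypothetical model name
    x: ExactSquare


# A square number passes validation unchanged.
accepted = SquareModel(x=49).x

# A non-square number is reported as a ValidationError.
try:
    SquareModel(x=50)
    rejected = False
except ValidationError:
    rejected = True

print(accepted, rejected)
```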

Validating function arguments

What if we are not building models, but our interface consists of functions instead?

In [21]:
from pydantic import validate_call
In [22]:
@validate_call(validate_return=True)
def square_root(x: SquareNumber) -> int:
    return x**0.5
In [23]:
square_root("4")
Out[23]:
2

Note that this contained two implicit conversions:

  • "4" -> 4
  • 2.0 -> 2

Judge for yourself and your application (!) whether that is a good or a bad thing.
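
If such implicit conversions are unwanted for a particular argument, one option is to annotate it with a strict type. The sketch below uses pydantic's `StrictInt` on a hypothetical function `double`, which is not part of the original notebook.

```python
from pydantic import StrictInt, ValidationError, validate_call


@validate_call
def double(x: StrictInt) -> int:  # hypothetical function for illustration
    return 2 * x


# A real integer is accepted.
ok = double(4)

# The string "4" is no longer converted; validation fails instead.
try:
    double("4")
    string_rejected = False
except ValidationError:
    string_rejected = True

print(ok, string_rejected)
```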

Serialization/Deserialization

In [24]:
s = Sample(date="now", parameter=0.5, size=2000)

Assuming we have a sample, we can turn it into a dictionary or a JSON string using pydantic:

In [25]:
s.model_dump()
Out[25]:
{'date': datetime.datetime(2025, 1, 2, 8, 31, 13, 610429),
 'parameter': 0.5,
 'size': 2000}
In [26]:
s.model_dump_json()
Out[26]:
'{"date":"2025-01-02T08:31:13.610429","parameter":0.5,"size":2000}'

And we can reconstruct an object from those dumps:

In [27]:
Sample(**s.model_dump())
Out[27]:
Sample(date=datetime.datetime(2025, 1, 2, 8, 31, 13, 610429), parameter=0.5, size=2000)
In [28]:
Sample.model_validate_json(s.model_dump_json())
Out[28]:
Sample(date=datetime.datetime(2025, 1, 2, 8, 31, 13, 610429), parameter=0.5, size=2000)

Such functionality is of key importance in the design of file formats and transmission protocols.
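
As a minimal sketch of the file use case, the example below writes a model to a JSON file and reads it back. The model name `StoredSample`, the file name `sample.json` and the use of a temporary directory are assumptions made for this illustration.

```python
import tempfile
from datetime import datetime
from pathlib import Path

from pydantic import BaseModel


class StoredSample(BaseModel):  # hypothetical name for this illustration
    date: datetime
    parameter: float
    size: int


original = StoredSample(date=datetime(2024, 4, 19, 12, 0), parameter=0.5, size=2000)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json"  # hypothetical file name
    # Serialize to JSON text and store it on disk.
    path.write_text(original.model_dump_json(), encoding="utf-8")
    # Read the text back and validate it into a model instance.
    restored = StoredSample.model_validate_json(path.read_text(encoding="utf-8"))

print(restored == original)
```

Because the data is validated again on load, a file that was edited by hand or produced by another tool is checked against the same rules as data created in Python.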

Summary and References

pydantic

  • gives you a simple, yet very powerful toolbox
  • allows you to write safer code with fewer bugs
  • saves you from a lot of tedious work

References

  • https://docs.pydantic.dev/latest/
  • https://github.com/pydantic/pydantic