From Scala to Python - Python dataclasses
There are two things I missed when I started working with Python after three years of writing Scala code: types and immutability. Fortunately, it turned out that I missed them only because I did not know Python well enough. There is a way of getting some of that functionality in Python without using external libraries!
Spoiler alert. Don’t expect too much. It won’t be like Scala types ;)
Let’s start with types. Using data classes, it is possible to specify the type of a field in a class. The most basic usage looks like this:
```python
from dataclasses import dataclass

@dataclass()
class User:
    name: str
    age: int
```
It looks good, but see what happens when I pass a string as the age field.
```python
>>> u = User('Test', 'aaa')
>>> u
User(name='Test', age='aaa')
```
It works! It should not work! Python does not check the annotations at runtime, so any value is accepted. There is a more powerful way of defining types, NewType, which does not help here either.
```python
from typing import NewType

Name = NewType('Name', str)
Age = NewType('Age', int)

@dataclass
class User:
    name: Name
    age: Age

>>> User('Name', 123)
User(name='Name', age=123)
>>> User(Name('Test'), Name(1555))
User(name='Test', age=1555)
```
I am so disappointed.
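The result above follows from how `NewType` behaves at runtime: calling `Name(...)` or `Age(...)` simply returns the argument unchanged, so no new class is created and nothing is checked. A minimal sketch reusing the names from the example (a static type checker such as mypy would be expected to complain about `Name(1555)`, but the interpreter does not):

```python
from typing import NewType

Name = NewType('Name', str)
Age = NewType('Age', int)

age = Age(123)
wrong = Name(1555)  # no error: NewType does not inspect its argument

print(type(age))    # <class 'int'>
print(type(wrong))  # <class 'int'>
print(age == 123)   # True
```

In other words, the distinction between Name and Age exists only for tools that read the annotations, not for the running program.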
At least we can specify the expected type of a function parameter and the type of the returned value! Can we?
We can, but it makes little sense, because the annotations are not enforced and nothing stops me from misusing the function…
```python
def name_to_age(name: Name) -> Age:
    return Age(123)

>>> name_to_age(Age(50))
123
```
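To see where the annotations end up, here is a small sketch using `typing.get_type_hints` from the standard library. It suggests that the hints are stored as metadata on the function and are never consulted when the function is called:

```python
from typing import NewType, get_type_hints

Name = NewType('Name', str)
Age = NewType('Age', int)

def name_to_age(name: Name) -> Age:
    return Age(123)

# The annotations are available for inspection...
hints = get_type_hints(name_to_age)
print(sorted(hints))  # ['name', 'return']

# ...but the call itself ignores them.
print(name_to_age(Age(50)))  # 123
```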
Fortunately, there is one thing that works as expected. Python elegantly solves the problem of immutability. All we need to do is add a parameter to the dataclass decorator.
```python
@dataclass(frozen=True)
class User:
    name: str
    age: int

>>> u = User('Test', 123)
>>> u.name = 'Another user'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'name'
```
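Coming from Scala, the natural follow-up question is how to get a modified copy of an immutable object, the way `copy` works on a case class. A minimal sketch using `dataclasses.replace` from the standard library (the `older` variable is just an illustrative name):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class User:
    name: str
    age: int

u = User('Test', 123)
older = replace(u, age=124)  # builds a new instance; u is left unchanged

print(u)      # User(name='Test', age=123)
print(older)  # User(name='Test', age=124)

# With frozen=True and the default eq=True, a __hash__ method is generated,
# so instances can be used as set members or dict keys.
print(len({u, User('Test', 123), older}))  # 2
```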
At least something works. What about types? It seems that I can use them only as a part of the documentation. For real data validation, an external library such as pydantic (https://pydantic-docs.helpmanual.io) is needed.
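For simple cases, a rough check can also be written by hand in `__post_init__`, which a dataclass calls at the end of the generated `__init__`. The sketch below is an assumption about how one might do it, not something the dataclasses module does on its own. It only handles annotations that are plain classes such as `str` or `int`, and it would not work with `NewType` or generic types:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class User:
    name: str
    age: int

    def __post_init__(self):
        # Hand-written validation: compare each value with its annotation.
        # Works only when the annotation is a plain class.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )

print(User('Test', 123))  # User(name='Test', age=123)

try:
    User('Test', 'aaa')
except TypeError as e:
    print(e)  # age must be int, got str
```

Anything beyond this, such as nested types, coercion, or helpful error reports, is where a dedicated validation library earns its place.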
- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz