From Scala to Python - Python dataclasses

There are two things I missed when I started working with Python after three years of writing Scala code: types and immutability. Fortunately, it turned out that I missed them only because I did not know Python well enough. There is a way of getting some of that functionality in Python without using external libraries!

Spoiler alert. Don’t expect too much. It won’t be like Scala types ;)

Data classes

Let’s start with types. Using data classes, it is possible to specify the type of a field in a class. The most basic usage looks like this:

from dataclasses import dataclass

@dataclass
class User:
  name: str
  age: int

It looks good, but look at what happens when I pass a string as the age:

>>> u = User('Test', 'aaa')
>>> u
User(name='Test', age='aaa')

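It works! It should not work! Python does not check type annotations at runtime, so the data class accepts any value. There is a more powerful way of defining types, NewType, but it is not enforced either.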

from typing import NewType
Name = NewType('Name', str)
Age = NewType('Age', int)

@dataclass
class User:
  name: Name
  age: Age

>>> User('Name', 123)
User(name='Name', age=123)

>>> User(Name('Test'), Name(1555))
User(name='Test', age=1555)

I am so disappointed.
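If the goal is only to reject a wrong type when the object is created, the standard library may be enough. Here is a minimal sketch, assuming every annotation is a plain class (no generics, no NewType); the name `CheckedUser` is made up for this example. The `__post_init__` hook runs right after the generated `__init__`, so it is a convenient place to compare each value with its annotation.

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedUser:  # hypothetical name, used only in this sketch
    name: str
    age: int

    def __post_init__(self):
        # Called automatically after the generated __init__ has set the fields.
        for field in fields(self):
            value = getattr(self, field.name)
            # field.type holds the annotation; isinstance only works here
            # because the annotations are plain classes (assumption).
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


print(CheckedUser("Test", 42))

try:
    CheckedUser("Test", "aaa")
except TypeError as error:
    print(error)
```

The second call now fails with a TypeError instead of silently storing the string. It is still a runtime check, so nothing is reported before the code actually runs.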

Functions

At least we can specify the expected type of a function parameter and the type of the returned value, can't we?

We can, but it makes little difference, because still nothing stops me from misusing it:

def name_to_age(name: Name) -> Age:
  return Age(123)

>>> name_to_age(Age(50))
123
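The call goes through because NewType does nothing at runtime: `Age(50)` is just the int `50`, and the annotations are never consulted. If a runtime check is really wanted, one option is a small decorator. The sketch below is only an illustration: `enforce_types` and `runtime_type` are hypothetical helpers (not part of the standard library), and they assume every annotation is a plain class or a NewType built on one.

```python
import functools
import inspect
from typing import NewType, get_type_hints

Name = NewType("Name", str)
Age = NewType("Age", int)


def runtime_type(hint):
    # A NewType exposes the type it wraps as __supertype__; unwrap until
    # a real class is reached.
    while hasattr(hint, "__supertype__"):
        hint = hint.__supertype__
    return hint


def enforce_types(func):  # hypothetical helper, not a stdlib function
    hints = get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for param, value in bound.arguments.items():
            if param not in hints:
                continue
            expected = runtime_type(hints[param])
            if not isinstance(value, expected):
                raise TypeError(
                    f"{param} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_types
def name_to_age(name: Name) -> Age:
    return Age(123)


print(name_to_age(Name("Test")))

try:
    name_to_age(Age(50))
except TypeError as error:
    print(error)
```

This catches an int passed where a string is expected, but it cannot tell `Name('Test')` from a plain `'Test'`, because both are ordinary `str` objects at runtime. The distinction between `Name` and `str` exists only for static type checkers.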

Immutability

Fortunately, there is one thing which works as expected. Python elegantly solves the problem of immutability. All we need to do is add a parameter to the decorator.

@dataclass(frozen=True)
class User:
  name: str
  age: int

>>> u = User('Test', 123)
>>> u.name = 'Another user'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'name'
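Coming from Scala, the next question is how to get a modified copy, the way `copy` works on a case class. A short sketch using `dataclasses.replace` from the standard library, with the same `User` class redefined here so the snippet runs on its own:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    name: str
    age: int


u = User("Test", 123)

# replace() builds a new instance with the given fields changed;
# the original object is left untouched.
renamed = replace(u, name="Another user")

print(u)
print(renamed)
```

The original `u` keeps its name, and `renamed` is a separate object that shares the unchanged `age` value.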

At least something works. What about types? It seems that I can use them only as a part of the documentation. For proper data validation, I have to reach for an external library such as pydantic (https://pydantic-docs.helpmanual.io).
