What can a data engineer learn from The Unicorn Project?

Have you ever seen a novel about developers? Reading such a book seems to be a massive waste of time, doesn’t it? After all, the internet is full of stories written by real developers. What would happen if someone put all of those horror stories about terrible projects and incompetent managers into one book?

That is precisely the content of The Unicorn Project by Gene Kim. In that book, we follow Maxine, an engineer who experiences the biggest software engineering problems, all at once.

The Unicorn Project is like “The Martian” by Andy Weir: we know that everything is going to end well, but the author keeps us asking the question, “What else?”

What else can go wrong? It seems that everything that could happen has already happened. Yet we know that something else is coming, something that will knock the protagonists to the ground. It will look like an unbeatable challenge, but somehow they will overcome it.

It is an enjoyable story to read, but it is also something more substantial. This story is an excellent description of an engineer who makes other engineers more productive. We may call such a person a distinguished engineer. I think that we may also call them data engineers. After all, the sole purpose of a data engineering team is making data usage easier for other people.

In this book, the idea of increasing other engineers’ productivity is captured in The Five Ideals. Those ideals describe the working environment in which programmers are the most productive. Maxine learns about those ideals in a somewhat unusual conversation at a bar.

I will not spoil all of them for you, but one of them is “Focus, Flow, and Joy.” The character portrayed by Gene Kim uses these words to describe it:

“It’s all about how our daily work feels. Is our work marked by boredom and waiting for other people to get things done on our behalf? Do we blindly work on small pieces of the whole, only seeing the outcomes of our work during a deployment when everything blows up? (…) Or do we work in small batches, ideally single-piece flow, getting fast and continual feedback on our work?”

Is cooperation with data teams blissful? Does it bring joy? Are we helpful? How often does a data engineering team become the bottleneck?
