Why do data engineers quit?
Data engineers don’t differ much from other developers. We quit for the same reasons. However, some causes seem more common in data engineering teams, mainly because we are the “backend of a backend” engineers.
So why do we quit, and what can you do to prevent data engineers from leaving?
Let’s begin with the technical reasons because not many of them exist. In general, developers don’t quit because of the technology they use unless it degrades their quality of life.
Often, we can’t point to a part of the system, say “this is what I made,” and expect others to understand our contribution. Also, when things break in data engineering, the failures are all over the place due to dependencies between data sources. Imagine fighting a chain reaction of failures every day.
Does your data platform fail three times a week, and do the fixes require a few hours of work? Imagine working in such an environment. You can’t plan anything unless all your plans include small print: “we do that unless everything breaks and we have to fight bugs all day.”
Sometimes the team has a designated person who focuses on solving the problems while everyone else can more or less continue working normally. Of course, the bug fixer duty is not a permanent position. You do it for one day or one week, and then another person takes the responsibility. Does that work? Sort of. It lets the team continue working, but people will soon feel annoyed by the fixer duty if the team doesn’t address the bugs’ cause.
Addressing the root cause is difficult because usually, the developers working on the project have caused the issues. Did they write tests? Do they run those tests before every deployment? Do they care about fixing the problem or accept the constant failures? Do they even care?
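To make the "did they write tests?" question concrete, here is a minimal sketch of the kind of check I mean. The `deduplicate_orders` transformation and its column names are hypothetical, made up only for this illustration. The point is that a small, fast test like this can run before every deployment.

```python
def deduplicate_orders(rows):
    """Keep only the most recent version of each order (hypothetical pipeline step)."""
    latest = {}
    for row in rows:
        current = latest.get(row["order_id"])
        # ISO-8601 date strings compare correctly as plain strings
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    return sorted(latest.values(), key=lambda r: r["order_id"])


def test_deduplicate_orders_keeps_latest_version():
    rows = [
        {"order_id": 1, "updated_at": "2021-11-01", "status": "new"},
        {"order_id": 1, "updated_at": "2021-11-03", "status": "shipped"},
        {"order_id": 2, "updated_at": "2021-11-02", "status": "new"},
    ]
    result = deduplicate_orders(rows)
    # One row per order, and the newest version wins
    assert [r["order_id"] for r in result] == [1, 2]
    assert result[0]["status"] == "shipped"


if __name__ == "__main__":
    # In a real project, a test runner in the CI pipeline would call this
    test_deduplicate_orders_keeps_latest_version()
    print("all checks passed")
```

It won’t catch every production failure, but it makes "we accept the constant failures" a choice instead of a default.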
You may think the bug fixer will be incentivized to solve the issues. That is often not the case. Sometimes the bug fixer waits until their duty ends, passing the buck to another person. For them, the problem doesn’t exist if someone else suffers from it.
It was cutting edge… in 2008
Working with old tech isn’t always a problem. However, if you work only on things not used by anyone else, you degrade your market value. Who cares whether you have five years of experience in using Hadoop? Fine, some companies still care. But they usually run Hadoop in some legacy setup while every new feature uses Spark, Presto, Kafka, etc. What if you got stuck in a place where Hadoop is the only thing in use? Would you ever find a new job?
Of course, the developers learn tons of stuff even when they work on legacy projects. They learn debugging, retrofitting tests, refactoring. They learn communication when they have to figure out why something was built or when they have to persuade people that a feature isn’t possible in the current setup.
We pay lip service to the importance of soft skills. However, during job interviews, programmers will hear questions about Spark, AWS Glue, or whatever the prospective company uses.
Working on current technology at work is crucial because it isn’t easy to learn good data engineering practices at home. You may practice using Spark. You may even set up a cluster instead of running it in local mode. But you won’t see many common problems while learning it in such a way. I don’t know about you, but I don’t have 100 TB of tabular data at home to play with Spark. I don’t even have 1 TB of such data. Do you?
The technical reasons for quitting are important, but you can address the problems and solve them quite quickly if you focus all your efforts on doing it. The engineers may spend months fixing the pipelines or implementing every new feature using newer technology. Nevertheless, improving the software is all we do. We will get it working, eventually.
The organizational reasons to quit are way trickier, and, most likely, you can’t do anything about them.
Who wants to create something useless? Not many people. Who wants to make something useful but never used by other people? Even fewer people want this. Unfortunately, it’s common in data engineering.
It’s usually a reporting pipeline, isn’t it? How often have you worked on a reporting pipeline someone wanted but never used? People want reports. They want them because we must be data-driven. But later, the report gets ignored, decisions are made based on gut feeling, and nobody looks at the reports because the real data doesn’t back the already made decisions. Too harsh? Perhaps. Also, too true, unfortunately.
Same s…software, different day
Does the data engineering team do the same thing over and over again? Not really THE same thing but the same overall category of things: sales reports, clickstream reports, data ingestion, etc. Same thing every day but with different data sources. It gets boring pretty quickly.
At some point, it makes sense to invest time in tooling. Even if the team continues working on the same stuff, their work will get easier if they prepare development tools. If they do a good job, they may even create a self-service tool for everyone who needs a “new” report that is “just like the other one but with dataset X instead of Y.”
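A minimal sketch of what such tooling could look like, assuming the "new" report really is the same aggregation over a different dataset. The `ReportSpec` class, the `build_report` function, and the sample datasets are all hypothetical names invented for this example. A real tool would read from your warehouse instead of in-memory lists.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportSpec:
    """Everything that differs between two 'almost identical' reports."""
    dataset: str   # which dataset to read
    group_by: str  # column used for grouping
    metric: str    # numeric column to sum


def build_report(spec, datasets):
    """Sum the metric per group for the dataset named in the spec."""
    totals = defaultdict(float)
    for row in datasets[spec.dataset]:
        totals[row[spec.group_by]] += row[spec.metric]
    return dict(totals)


# Hypothetical in-memory datasets standing in for real tables
DATASETS = {
    "sales_x": [
        {"country": "PL", "revenue": 10.0},
        {"country": "DE", "revenue": 5.0},
        {"country": "PL", "revenue": 2.5},
    ],
    "sales_y": [
        {"country": "PL", "revenue": 1.0},
        {"country": "FR", "revenue": 4.0},
    ],
}

# "Just like the other one but with dataset X instead of Y"
report_y = build_report(ReportSpec("sales_y", "country", "revenue"), DATASETS)
report_x = build_report(ReportSpec("sales_x", "country", "revenue"), DATASETS)
```

Once the differences between reports fit into a spec like this, requesting a new report becomes a configuration change, and that is the first step toward self-service.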
As said before, we are “the backend of a backend.” All the glory goes to the team implementing the visual representation of the feature unless someone takes care of mentioning the data team’s contribution.
The UI would display a white rectangle if you didn’t build a data pipeline, a data warehouse, or train a machine learning model. But somebody has to say that during the meeting!
Last salary review three years ago
It seems forbidden to say it aloud, so I will do it anyway: People work for money. The challenging projects and fun work environment won’t matter if you are perpetually underpaid.
How much would you have to spend to hire a replacement when a data engineer quits? Do you pay them at least 95% of the amount paid to the new joiners? Nobody will leave the job to get a 5% raise (at least, that won’t be the only reason). Many people won’t quit even for a 20% raise if they are happy with everything else. But if the difference between my salary and my market value gets any bigger, I will start wondering whether there is a place where I will be just as satisfied with the work and paid fairly.
Unbearable working hours
In companies spread across multiple timezones, someone will have to come to work early or stay longer when teams from various locations have a meeting, especially if there is no overlap between the working times.
The problem begins when the teams from one location are always staying longer or coming earlier. Perhaps it wouldn’t be as bad if companies openly said that one office is a place with subcontractors who have to adjust to everyone else’s schedules. However, to add insult to injury, we often hear, “We are one company. We treat everyone the same.” So why do some people always need to adjust their personal lives to accommodate a meeting schedule convenient to others?
No remote work
Seriously? It is 2021, almost 2022. I don’t have to be in an office to do my work.
If I have to commute to the office, the commute time should be included in my work time or paid extra. I’m dead serious about it. I don’t need the office. If someone insists on it anyway, pay me extra for working on-site.
Things you can’t do anything about
Sometimes people want to try something new. A backend developer intends to do frontend for a while. A data analyst wants to switch to a PM role. A product manager wants to do data science. Those people will quit if they can’t do it at their current company. It is ok. They want to learn something new. Help them or get out of their way.
- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz