Theory of constraints in data engineering
I must admit I almost gave up on writing this article. The more I read about the theory of constraints, the more it looked like a concept invented by consultants. Yet, at some point, I realized the ugly truth. Ignoring the theory of constraints was the root cause of many problems in my software projects.
What is the theory of constraints?
The theory is shockingly simple: In every system, there is only one factor limiting the entire system’s throughput.
A simple statement, yet we can derive quite complex consequences from it:
- only one top constraint exists
It may be obvious to everyone who understands the meaning of the word “top.” However, some people have twenty top priorities at once, so perhaps the meaning of “top” is not self-evident.
- improvements to other constraints don’t matter until we deal with the top constraint
If one factor limits the entire system’s output, improving anything else won’t increase the overall performance. We can feel productive and look busy while working very hard on unimportant problems, but we shouldn’t try to improve everything else before we address the root issue.
- we need metrics
We need them not only to find the most significant bottleneck but also to verify whether our work made any difference.
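To make the “only one top constraint” idea concrete, here is a minimal Python sketch. It assumes a strictly sequential pipeline in which every record passes through every stage; the stage names and throughput numbers are made up for illustration. Under that assumption, the system’s throughput equals the lowest stage throughput, so the metrics point directly at the top constraint.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    records_per_hour: float  # measured throughput of this stage (hypothetical metric)


def system_throughput(stages: list[Stage]) -> float:
    """A sequential pipeline cannot move faster than its slowest stage."""
    return min(stage.records_per_hour for stage in stages)


def top_constraint(stages: list[Stage]) -> Stage:
    """The top constraint is the stage with the lowest throughput."""
    return min(stages, key=lambda stage: stage.records_per_hour)


# Made-up measurements for an illustrative three-stage pipeline.
pipeline = [
    Stage("extract", 50_000),
    Stage("transform", 8_000),
    Stage("load", 30_000),
]
print(top_constraint(pipeline).name)  # transform
print(system_throughput(pipeline))    # 8000

# Doubling a non-bottleneck stage changes nothing for the whole system.
pipeline[2] = Stage("load", 60_000)
print(system_throughput(pipeline))    # 8000
```

The last two lines show the second consequence from the list: making the load stage twice as fast looks like progress, but the pipeline still moves only 8,000 records per hour.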
What do we do with the top constraint?
How do we deal with bottlenecks? The proponents of the theory of constraints suggest a five-step process:
- Identify - first, we must find the top constraint. If you think that many factors are top constraints, you are wrong. It means you failed to identify the real constraint.
- Exploit - do everything you can in the current situation to improve the entire system without dealing with the root cause. The goal here is to check whether a simple tweak is good enough. Perhaps, after a minor alteration, the process becomes much faster. Does it mean we have solved the problem? Not necessarily, but maybe something else has become the new top bottleneck.
- Subordinate - adjust other parts of the system to reduce the load on the bottleneck. Look for workarounds and easy wins.
- Elevate - deal with the constraint directly. If every other attempt has failed, removing the bottleneck is the only solution.
- Repeat - there is only one top constraint, but there is always at least one constraint. After you fix the most significant issue, you can move to the next problem. Find the new top constraint, and repeat the process.
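The Identify and Repeat steps can be sketched as a loop. This is a simplification under stated assumptions: it collapses Exploit, Subordinate, and Elevate into a single “speed up the bottleneck” action, the `speedup` factor is a hypothetical stand-in for whatever fix the team applies, and the pipeline is again assumed to be sequential with made-up numbers.

```python
def improvement_rounds(
    throughput: dict[str, float], speedup: float, rounds: int
) -> list[tuple[str, float]]:
    """Repeat Identify -> Elevate and record which stage limited the system each round."""
    current = dict(throughput)  # copy so the caller's measurements stay untouched
    history = []
    for _ in range(rounds):
        # Identify: the slowest stage limits the whole (sequential) pipeline.
        bottleneck = min(current, key=lambda stage: current[stage])
        history.append((bottleneck, current[bottleneck]))
        # Elevate: improve only the bottleneck; `speedup` is a made-up stand-in
        # for whatever fix the team actually applies.
        current[bottleneck] *= speedup
    return history


measurements = {"extract": 50_000, "transform": 8_000, "load": 30_000}
for name, limit in improvement_rounds(measurements, speedup=4, rounds=3):
    print(f"constraint: {name}, system throughput: {limit:,.0f} records/hour")
```

With these numbers, the constraint moves from `transform` (8,000 records/hour) to `load` (30,000) and back to `transform` (32,000). Fixing one bottleneck reveals the next one, and the loop never runs out of constraints, only of rounds.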
The human constraint
The theory of constraints helps us identify and solve the problem. However, it does not tell us what to do when a person is the top constraint in an organization. That happens for various reasons: gatekeeping, toxic behavior, laziness, lack of required skills combined with an unwillingness to acquire them, etc.
A “human constraint” complicates the process. People are the only constraints that retaliate. If you identify someone as the bottleneck, they’ll fight back. No surprises here. If someone told you that you were the largest problem, you would fight too.
The trick here is to blame the process. Of course, that doesn’t address the root cause, which is the person. However, we can build a process around the person: one that safeguards everyone else or forces the person to change their behavior. It is not a nice solution either. The person will most likely feel attacked anyway. There are no “nice” solutions to problems caused by someone’s personality.
Theory of constraints in data engineering
What are the most significant constraints in data engineering projects? In my experience, it’s usually whatever causes a long cycle time. Does the team look busy, yet nothing ever gets done? Do they seem overworked but underachieving?
I have seen two reasons for such situations.
Tons of bugs and maintenance work
The team spends most of its time fixing problems. Or rather “fixing” them, because they correct the output data instead of the code that produced it. The team never has time to address the underlying problem: too many bugs and no time to fix them properly.
The solution here is to introduce proper testing gradually. In data engineering projects, we must test both the code and the input data. I also suggest testing the output data to make sure we don’t propagate a bug downstream. If your project is falling apart, it is even better to start with output data verification because it gives you immediate benefits.
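Here is a minimal sketch of output data verification in plain Python, assuming the pipeline produces rows as dictionaries. The column names (`order_id`, `amount`), the rules, and the `publish` function are hypothetical; a real project would run similar checks against its actual output. What matters is where the check sits: right before the data leaves the pipeline, so a bug stops there instead of reaching downstream consumers.

```python
from typing import Any

Row = dict[str, Any]


def validate_output(rows: list[Row]) -> list[str]:
    """Check the pipeline's output before it goes downstream.

    The column names (order_id, amount) and the rules are hypothetical examples.
    Returns a list of problems; an empty list means the output passed.
    """
    if not rows:
        return ["output is empty"]
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        order_id = row.get("order_id")
        if order_id is None:
            problems.append(f"row {i}: missing order_id")
        elif order_id in seen_ids:
            problems.append(f"row {i}: duplicate order_id {order_id}")
        else:
            seen_ids.add(order_id)
        amount = row.get("amount")
        if amount is None or amount < 0:
            problems.append(f"row {i}: invalid amount {amount!r}")
    return problems


def publish(rows: list[Row]) -> None:
    """Fail loudly instead of propagating a bug to downstream consumers."""
    problems = validate_output(rows)
    if problems:
        raise ValueError("output validation failed: " + "; ".join(problems))
    # Writing to the real destination is out of scope for this sketch.


bad_rows = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": -5.0}]
print(validate_output(bad_rows))
# ['row 1: duplicate order_id 1', 'row 1: invalid amount -5.0']
```

Returning a list of problems instead of stopping at the first one makes the failure message useful: the person on call sees everything that is wrong with the batch at once.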
The team is working very hard, but nothing ever gets delivered
Usually, the root cause is misguided perfectionism, which often affects data scientists. They want to build the perfect model, so they gather more data, train larger models, and tune them for weeks, but nothing ends up in production.
What is the value of software that doesn’t run in production? Zero? Even that is too generous. Such software has negative value: you spent money building it, and it doesn’t make any money.
The solution to the perfectionism problem is user story mapping. It is a technique for splitting huge features into incremental steps. Each step delivers value to the user when deployed.
Usually, you don’t need the complete solution right away. You can deploy an MVP and extend it gradually. If people can’t use a half-done feature, I suggest adding feature toggles. That way, you can still deploy the code to production and show the progress to a selected group of users without affecting everyone else’s workflow.
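A feature toggle can be as simple as a lookup before choosing a code path. The sketch below is hypothetical: the `FeatureToggles` class, the `new_model` feature name, and the user IDs are made up for illustration, and the toggle state lives in memory only to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureToggles:
    """Hypothetical in-memory toggle store: feature name -> user IDs who see it."""
    enabled_for: dict[str, set[str]] = field(default_factory=dict)

    def is_enabled(self, feature: str, user_id: str) -> bool:
        # Unknown features are off for everyone by default.
        return user_id in self.enabled_for.get(feature, set())


def recommend(user_id: str, toggles: FeatureToggles) -> str:
    """Route selected users to the half-done feature; everyone else is unaffected."""
    if toggles.is_enabled("new_model", user_id):
        return "new_model"      # the incremental step, already deployed in production
    return "current_model"      # the existing behavior everyone else keeps


# "alice" is in the pilot group; "bob" is not.
toggles = FeatureToggles({"new_model": {"alice"}})
print(recommend("alice", toggles))  # new_model
print(recommend("bob", toggles))    # current_model
```

In a real project, the toggle state usually lives outside the code, for example in a configuration file or a database, so you can widen the pilot group without redeploying.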