Using Abstraction Layers to Tackle Common Problems with Legacy Code

As a programmer, one of the challenges you may encounter is working with unreliable code. Such code can be difficult to understand, error-prone, and time-consuming to change. It can also threaten the stability of the systems you work on.

Creating an abstraction layer to isolate unreliable code from the rest of the system is an effective strategy for containing its problems. A stable interface between the unreliable code and the rest of the system can prevent errors from propagating and causing broader problems. Component isolation is critical in larger and more complex systems, where the effects of unreliable code are harder to predict and manage. Abstraction layers can also make the system easier to maintain and modify over time, as they provide a clear separation between its parts.

How to create an abstraction layer

Creating abstraction layers involves identifying the problematic code and designing a layer that provides a stable interface for other parts of the system. In Python, we can do this by wrapping the unreliable code in a new function or class.

Suppose we are working with a REST API that provides access to some data but is unreliable and sometimes returns errors even when the request is valid. We can define a new class around the API to provide a more stable and predictable interface. This class could retry the request automatically if an error is encountered and provide additional error handling and input validation. In our example, the SafeApiClient class uses the retry decorator to automatically retry the request when an error is encountered:

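import requests
from retry import retry  # decorator from the third-party "retry" package


class SafeApiClient:
    def __init__(self, api_url):
        self.api_url = api_url

    # Re-run the call up to three times if it raises an exception.
    @retry(tries=3)
    def get_data(self, resource, params=None):
        # None instead of {} avoids sharing one mutable default between calls.
        response = requests.get(self.api_url + resource, params=params)
        if response.status_code == 200:
            return response.json()
        # Any other status raises, which triggers the retry decorator.
        raise ValueError(f"API returned status code {response.status_code}")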
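
To make the retry behavior less of a black box, here is a minimal sketch of what such a decorator might do, written with the standard library only. The names simple_retry and flaky_fetch are hypothetical and are not part of the retry package, which offers more options (for example, delays and backoff). The flaky function stands in for an API that fails twice and then succeeds.

```python
import functools


def simple_retry(tries=3, exceptions=(Exception,)):
    """Call the wrapped function up to `tries` times before giving up."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: let the caller see the error
        return wrapper
    return decorator


# Hypothetical stand-in for an unreliable API: fails twice, then succeeds.
calls = {"count": 0}


@simple_retry(tries=3, exceptions=(ValueError,))
def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("API returned status code 500")
    return {"status": "ok"}


result = flaky_fetch()
```

The caller of flaky_fetch never sees the first two failures. If the third attempt failed as well, the original exception would reach the caller unchanged.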

Other benefits of isolating unreliable code

Isolation also pays off in day-to-day maintenance. If all interactions with the unreliable code go through our abstraction layer, we control the dependencies we pass to it and can track how it is used. A clear separation between the unreliable code and the rest of the system also makes the dependencies and relationships between different parts of the system easier to understand.
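
As a rough illustration of tracking usage, the sketch below routes every call through a single gateway class that receives the unreliable function as an explicit dependency and counts who calls it. All names here (legacy_price_lookup, PricingGateway, the caller labels) are made up for the example.

```python
import logging
from collections import Counter

logger = logging.getLogger("legacy_gateway")


def legacy_price_lookup(product_id):
    """Hypothetical stand-in for the unreliable code we want to isolate."""
    return {"sku-1": 100, "sku-2": 250}.get(product_id)


class PricingGateway:
    """The only place in the codebase allowed to call legacy_price_lookup."""

    def __init__(self, lookup=legacy_price_lookup):
        self._lookup = lookup   # the dependency is passed in explicitly
        self.usage = Counter()  # how often each caller uses the legacy code

    def price_of(self, product_id, caller):
        self.usage[caller] += 1
        logger.info("legacy lookup by %s for %s", caller, product_id)
        return self._lookup(product_id)


gateway = PricingGateway()
gateway.price_of("sku-1", caller="checkout")
gateway.price_of("sku-2", caller="checkout")
gateway.price_of("sku-1", caller="reports")
```

Because the lookup function is a constructor argument, a test can pass in a fake, and the usage counter shows which parts of the system still depend on the unreliable code.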

We can use abstraction layers in various situations to improve a system’s reliability and maintainability. For example:

1. Integrating third-party libraries

When using third-party libraries in a project, we can prevent them from spreading throughout the entire application by creating a wrapper interface. An abstraction layer also lets us replace the third-party library without changing the code that depends on the wrapper; only the wrapper itself has to change.
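
A wrapper interface of this kind could look like the sketch below. The application code depends only on a small Serializer interface, and an adapter hides the library behind it. The names Serializer, StdlibJsonSerializer, and save_settings are hypothetical, and the standard json module stands in for a third-party library so that the example runs without installing anything.

```python
import json
from typing import Any, Protocol


class Serializer(Protocol):
    """The interface our application code depends on."""

    def dumps(self, value: Any) -> str: ...

    def loads(self, text: str) -> Any: ...


class StdlibJsonSerializer:
    """Adapter around the json module; the only code that imports it."""

    def dumps(self, value: Any) -> str:
        return json.dumps(value, sort_keys=True)

    def loads(self, text: str) -> Any:
        return json.loads(text)


def save_settings(settings: dict, serializer: Serializer) -> str:
    # Application code knows only the Serializer interface,
    # never the library behind it.
    return serializer.dumps(settings)


payload = save_settings({"retries": 3, "debug": False}, StdlibJsonSerializer())
```

Switching to another library would mean writing a second adapter with the same two methods. save_settings and its callers would stay as they are.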

Of course, there is a risk of overdoing it. I was unlucky to work on a project (as an external contractor) in which someone wrote a wrapper around Apache Spark, and it was forbidden to interact with the Spark API directly. The wrapper was painfully slow, buggy, and had no purpose besides making one “senior” (by length of employment, not skill) programmer feel important.

2. Working with legacy code

Legacy code is often unreliable and difficult to maintain, and it can pose a significant challenge to programmers. By creating abstraction layers around legacy code, we isolate the problematic code and protect the rest of the system from its potential adverse effects. I recommend doing it, especially when nobody understands what happens inside the legacy code. If it has to stay in the project, hide it and protect the rest from its bugs.
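
Here is a small sketch of what hiding legacy code might look like. The legacy function below is invented for the example: depending on the input, it returns a tuple, returns -1, or raises KeyError. The wrapper find_customer turns all of that into one predictable contract.

```python
from dataclasses import dataclass
from typing import Optional


def legacy_find_customer(customer_id):
    """Hypothetical legacy function with an inconsistent contract:
    returns a tuple, returns -1, or raises KeyError depending on the input."""
    if customer_id == 0:
        return -1
    if customer_id > 100:
        raise KeyError(customer_id)
    return (customer_id, "  Alice ", "ALICE@EXAMPLE.COM")


@dataclass(frozen=True)
class Customer:
    id: int
    name: str
    email: str


def find_customer(customer_id: int) -> Optional[Customer]:
    """Stable interface: returns a Customer, or None when nothing is found."""
    if customer_id < 0:
        raise ValueError("customer_id must not be negative")
    try:
        raw = legacy_find_customer(customer_id)
    except KeyError:
        return None  # the legacy "not found" exception
    if raw == -1:
        return None  # the legacy "not found" sentinel
    legacy_id, name, email = raw
    # Normalize the messy legacy values once, in one place.
    return Customer(id=legacy_id, name=name.strip(), email=email.lower())
```

The rest of the system calls find_customer and never learns about the sentinel value, the exception, or the untrimmed strings.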

3. Managing asynchronous code

Asynchronous code tends to take over the codebase. As soon as you have asynchronous operations, you start mixing threads, Futures, Promises (or whatever name your programming language uses for the data structure that will contain the value at some time later) with business logic. While Futures/Promises are an abstraction layer over threads, we can hide asynchronous code even further, so that the business logic deals only with ordinary values.
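
In Python, one way to do this is a synchronous facade over asyncio code. The sketch below is only an illustration: the function names and exchange rates are made up, and asyncio.sleep stands in for real I/O.

```python
import asyncio


async def _fetch_exchange_rate(currency: str) -> float:
    """Hypothetical asynchronous operation; sleep stands in for real I/O."""
    await asyncio.sleep(0.01)
    return {"EUR": 1.0, "USD": 1.1}[currency]


async def _fetch_all(currencies):
    # Run the lookups concurrently and collect results in input order.
    return await asyncio.gather(*(_fetch_exchange_rate(c) for c in currencies))


def exchange_rates(currencies):
    """Synchronous facade: callers get a plain dict, not coroutines."""
    rates = asyncio.run(_fetch_all(currencies))
    return dict(zip(currencies, rates))


def convert(amount: float, currency: str) -> float:
    # Business logic stays free of async/await.
    return amount * exchange_rates([currency])[currency]


total = convert(100, "USD")
```

Keep in mind that asyncio.run cannot be called from a thread that already has a running event loop, so a facade like this fits applications whose outer layers are synchronous.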

4. Dealing with network or I/O operations

As in our example, we can create an abstraction layer to deal with network problems or unstable external services.

However, doing it for every I/O operation is probably not a good idea. In the same project where someone created a wrapper for Apache Spark, we had many functions with the suffix with_retry. When you join a new project, and half of the existing code has such names, it sets the right expectations regarding code quality — very, very low expectations.


Bartosz Mikulski


  • Data/MLOps engineer by day
  • DevRel/copywriter by night
  • Python and data engineering trainer
  • Conference speaker
  • Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
  • Twitter: @mikulskibartosz
  • Mastodon: @mikulskibartosz@mathstodon.xyz