Even data engineering teams have to implement backend services with a REST API. In this article, I show how to test such services in Python using Behave, a BDD framework.

It doesn’t matter what technology I used to write the underlying service. I treat it as an opaque box, so all I can do is call the API and check what data I receive. What do we call such testing? Some people call it end-to-end testing. Others prefer integration tests or API contract tests. Let’s say it is an API contract test because I am interested in checking whether the API behaves as the clients expect.

There are a few consequences of testing the API contract of an opaque box. First of all, I cannot mock any dependencies or directly examine the underlying database. All of my tests must use the REST API, and I must pretend I know nothing about the service implementation.

In this article, I’ll skip deployment and test environment setup. Let’s assume I’ve already deployed the application and populated a test database with the data I need.


To run the tests, we need a few dependencies. I use poetry as the dependency manager, so my dependencies look like this:

behave = "^1.2.6"
pytest = "^6.2.2"
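In a Poetry project, these entries live in pyproject.toml; assuming Poetry 1.x conventions, the test dependencies belong in the dev-dependencies section (the section name is my assumption, not shown in the original project):

```toml
# pyproject.toml (fragment)
[tool.poetry.dev-dependencies]
behave = "^1.2.6"
pytest = "^6.2.2"
```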

Defining the test scenarios

The application under test is a Twitter clone. First, I have to write the test scenarios in Gherkin. This is not an article about writing Gherkin scenarios, so I implement only two of them:

Feature: User feed contains tweets posted by followed users

  #Rule: User's feed contains tweets posted by people followed by the user

      Scenario: User who doesn't follow anyone doesn't see tweets
        Given Alice doesn't follow anyone
        When Alice retrieves the feed
        Then Alice sees an empty list

      Scenario: User sees tweets posted by followed accounts
        Given Alice follows Bob
        And Bob posted a tweet
        When Alice retrieves the feed
        Then Alice sees the content posted by Bob

When we use Behave, we store the scenarios in a .feature file in the features directory.

Implementing the scenarios

Now, I implement the BDD steps. We must store the step implementations in the features/steps directory.
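Putting both conventions together, the project layout might look like this (the file names are my assumption):

```text
features/
    user_feed.feature      # the Gherkin scenarios above
    steps/
        feed_steps.py      # step implementations
        api_client.py      # REST helper functions
```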

First, I have to produce random tweet content. I prefer random values here because leftovers from previous tests won’t spoil the results even if I don’t clean the test environment between runs. Still, it is better to redeploy everything and purge the test database.
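If we do want to clean up between scenarios, Behave's environment.py hooks are the usual place for it. The sketch below assumes a hypothetical purge helper; the real cleanup depends on the test environment:

```python
# features/environment.py -- Behave discovers this file automatically
# and runs before_scenario around every scenario.

def purge_test_database():
    # hypothetical cleanup, e.g. an HTTP call to an admin endpoint
    # that truncates the test tables
    return True

def before_scenario(context, scenario):
    # start every scenario from a known-clean state
    context.purged = purge_test_database()
```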

import random
import string

from behave import *

from features.steps.api_client import *

def _random_string(n):
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(n))

For the first scenario, we need three functions:

@given("Alice doesn't follow anyone")
def alice_unfollows_all(context):
    unfollow_all('Alice')

@when("Alice retrieves the feed")
def alice_retrieves_feed(context):
    context.feed = retrieve_tweets('Alice')

@then("Alice sees an empty list")
def feed_is_empty(context):
    print(context.feed)
    assert len(context.feed) == 0

We print the feed content in the then implementation because Behave captures the standard output and prints it when a test fails.

We also need to implement the two helper functions, unfollow_all and retrieve_tweets. The actual implementation details don’t matter, so I skip the implementations of the other helper functions.

import requests

def _make_url(path):
    pass # here we should return the full URL to the test environment

def _get_pwd(user):
    pass # here we return the test password of the test account

def unfollow_all(username):
    requests.delete(_make_url('following'), auth=(username, _get_pwd(username)))

def retrieve_tweets(username):
    response = requests.get(_make_url('tweets'), auth=(username, _get_pwd(username)))
    return response.json()

In the second scenario, we have to post a tweet and see whether we get it while retrieving the feed:

@given("Alice follows Bob")
def alice_follows_bob(context):
    follow('Alice', 'Bob')

@given('Bob posted a tweet')
def bob_posts_tweet(context):
    context.expected_tweet = _random_string(20)
    post_tweet('Bob', context.expected_tweet)

# we reuse the alice_retrieves_feed function

@then('Alice sees the content posted by Bob')
def alice_sees_bob_post(context):
    filtered_feed = [x for x in context.feed if x['username'] == 'Bob' and x['tweet'] == context.expected_tweet]
    assert len(filtered_feed) == 1
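The then step implies that each feed entry is a JSON object with username and tweet fields. A feed might look like this (shape inferred from the step code, not from the real API):

```python
# hypothetical feed payload matching the fields the step reads
feed = [
    {'username': 'Bob', 'tweet': 'QX0Z61XDNP5C871ZD3OJ'},
    {'username': 'Carol', 'tweet': 'unrelated tweet'},
]

expected_tweet = 'QX0Z61XDNP5C871ZD3OJ'
filtered_feed = [x for x in feed
                 if x['username'] == 'Bob' and x['tweet'] == expected_tweet]
assert len(filtered_feed) == 1
```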

Why do we use high-level testing?

What is the benefit of testing a service through its API without mocking any dependencies? At first, it seems a useless testing method because when a test fails, we don’t know the reason for the failure. Of course, API contract tests are not enough. In addition to them, we need lots of unit tests to verify the internals of the application.

The API contract tests, however, give us other benefits:

  • we avoid the embarrassing situation in which all unit tests pass but the application doesn’t work. This happens when we test individual components but fail to verify the interactions between them.
  • we won’t break the client applications. If we make a modification that unintentionally changes the API, the contract tests give us a chance to detect it before we deploy and break the contract.
  • we can provide multiple interchangeable, compatible implementations of the same API, which is useful when we write a new version of a service or split a large application into smaller services.