How to download all available values from DynamoDB using pagination

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (41/100)

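A common problem I noticed in various applications was forgetting that DynamoDB paginates results. A single Query or Scan call returns at most 1 MB of data, so one response may contain only part of the matching items. Somehow, when developers see more than ten results, they assume that they received everything ;)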
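The size of the first response does not tell you whether you have all the results. The toy simulation below shows why. It does not talk to DynamoDB. `fake_scan`, `TABLE` and `PAGE_SIZE` are hypothetical stand-ins that mimic three behaviours:

- The scan reads a fixed number of items per call. The real service stops after reading 1 MB of data, not after an item count.
- The filter is applied after the read.
- A `LastEvaluatedKey` is returned whenever more data remains.

```python
# Hypothetical in-memory table; this is NOT the boto3 API, only an illustration.
TABLE = [{"id": i, "some_field": "value" if i % 3 == 0 else "other"} for i in range(25)]
PAGE_SIZE = 10  # assumption: DynamoDB really stops at 1 MB of read data, not at N items


def fake_scan(ExclusiveStartKey=None):
    """Return one page, imitating a single Scan call with a FilterExpression."""
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    page = TABLE[start:start + PAGE_SIZE]
    # The filter runs after the page is read, so a page can hold fewer matches than it read.
    response = {"Items": [item for item in page if item["some_field"] == "value"]}
    if start + PAGE_SIZE < len(TABLE):
        # More data remains, so the caller must ask for the next page.
        response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
    return response


first_page = fake_scan()
total_matches = sum(1 for item in TABLE if item["some_field"] == "value")
print(len(first_page["Items"]), "of", total_matches)  # prints: 4 of 9
print("more pages?", "LastEvaluatedKey" in first_page)  # prints: more pages? True
```

The only reliable stop condition is the absence of `LastEvaluatedKey`. The number of items in a page is not.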

How do we retrieve all values from DynamoDB when performing a query or a scan?

We have to extract the LastEvaluatedKey from the response and pass it as the ExclusiveStartKey of the next request. We repeat this until a response no longer contains a LastEvaluatedKey. In this article, I show how to do it when we use the AwsDynamoDBHook in Airflow:

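```python
from boto3.dynamodb.conditions import Attr
# Airflow 1.10.x import path; in Airflow 2 the AWS hooks live in the apache-airflow-providers-amazon package
from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

# The filter is applied after DynamoDB reads a page, so a page may contain few or no matching items
query_params = {
    'FilterExpression': Attr('some_field').eq('value'),
    'ConsistentRead': True,
}

# table_keys expects a list of key attribute names
hook = AwsDynamoDBHook(table_keys=['primary_key_name'], table_name='table_name', region_name='aws_region')
connection = hook.get_conn()  # a boto3 DynamoDB service resource
table = connection.Table('table_name')

# Fetch the first page
response = table.scan(**query_params)
entries = list(response['Items'])

# Keep requesting pages until the response no longer contains LastEvaluatedKey
while 'LastEvaluatedKey' in response:
    response = table.scan(**query_params, ExclusiveStartKey=response['LastEvaluatedKey'])
    entries.extend(response['Items'])

output = iter(entries)
```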
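The same loop works for Query as well as Scan, because both accept `ExclusiveStartKey` and return `LastEvaluatedKey`. One way to avoid duplicating the loop is a generator that takes the operation as a parameter. This is a sketch, not part of Airflow or boto3:

- `iterate_items` is a hypothetical helper name.
- `fake_scan` and `PAGES` are an in-memory stand-in used only to show the behaviour without a real table.

```python
from typing import Any, Callable, Dict, Iterator


def iterate_items(operation: Callable[..., Dict[str, Any]], **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated call such as table.scan or table.query."""
    response = operation(**params)
    yield from response["Items"]
    # Ask for the next page until the response stops carrying LastEvaluatedKey.
    while "LastEvaluatedKey" in response:
        response = operation(**params, ExclusiveStartKey=response["LastEvaluatedKey"])
        yield from response["Items"]


# Hypothetical stand-in for table.scan that serves three pages.
PAGES = [
    {"Items": [{"id": 1}, {"id": 2}], "LastEvaluatedKey": {"id": 2}},
    {"Items": [{"id": 3}, {"id": 4}], "LastEvaluatedKey": {"id": 4}},
    {"Items": [{"id": 5}]},
]


def fake_scan(ExclusiveStartKey=None, **_ignored):
    index = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] // 2
    return PAGES[index]


print([item["id"] for item in iterate_items(fake_scan, ConsistentRead=True)])  # prints [1, 2, 3, 4, 5]

# With the real table from the Airflow example, the calls would look like:
#   entries = list(iterate_items(table.scan, **query_params))
#   from boto3.dynamodb.conditions import Key
#   items = iterate_items(table.query, KeyConditionExpression=Key('primary_key_name').eq('some_id'))  # 'some_id' is a placeholder
```

Because it is a generator, it fetches the next page only when the consumer needs more items. This fits the original code's use of an iterator as the output, without holding every page in memory at once.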






Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group