How to download all available values from DynamoDB using pagination

A common problem I noticed in various applications was forgetting that DynamoDB supports pagination too. Somehow, when developers see more than ten results, they assume that they receive everything ;)

How do we retrieve all values from DynamoDB when performing a query?

We have to extract the LastEvaluatedKey from the response and use it as the ExclusiveStartKey in the subsequent query. In this article, I show how to do it when we use the AwsDynamoDBHook in Airflow:

from boto3.dynamodb.conditions import Attr
from airflow.contrib.hooks.aws_dynamodb_hook import AwsDynamoDBHook

query_params = {
    'FilterExpression': Attr('some_field').eq('value'), 'ConsistentRead': True
}

hook = AwsDynamoDBHook('primary_key_name', 'table_name', 'aws_region')
connection = hook.get_conn()
table = connection.Table('table_name')

response = table.scan(**query_params)

entries = list()

for item in response['Items']:
    entries.append(item)

while 'LastEvaluatedKey' in response:
    response = table.scan(**query_params, ExclusiveStartKey=response['LastEvaluatedKey'])
    for item in response['Items']:
        entries.append(item)

output = iter(entries)
Older post

How to make sure that you did not leave an EMR cluster running

How to get notifications about running EMR cluster

Newer post

How to determine the partition size in Apache Spark

How to choose the proper partition size and the number of partitions to run an Apache Spark job