How to retrieve the table descriptions from Glue Data Catalog using boto3

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (18/100)

It is not a common use case, but occasionally we need to create a page or a document that describes the Athena tables we have. This is relatively easy if we wrote comments in the CREATE EXTERNAL TABLE statements when we created the tables. Athena stores those comments in the Glue Data Catalog, so we can retrieve them using the boto3 Glue client.

In this article, I am going to show you how to do it.

First, we have to create a Glue client using the following statement:

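```python
import boto3

# region_name, aws_access_key_id, and aws_secret_access_key are placeholders
# for your AWS region and credentials. If you omit the two key arguments,
# boto3 uses its default credential chain instead.
glue_client = boto3.client('glue',
                           region_name=region_name,
                           aws_access_key_id=aws_access_key_id,
                           aws_secret_access_key=aws_secret_access_key)
```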

To retrieve the tables, we need to know the database name:

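```python
# db_name holds the name of the Glue (Athena) database we want to document.
# A single get_tables call returns one page of results, at most 100 tables.
glue_tables = glue_client.get_tables(DatabaseName=db_name, MaxResults=100)
```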
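A single get_tables call returns only one page of tables. If the database contains more tables than fit on that page, the response also includes a NextToken, which we pass to the next call. The sketch below follows the tokens until all pages are collected. The function name get_all_tables is my own. It accepts any object with a boto3-style get_tables method, so the glue_client created above works. boto3 also offers a built-in paginator for this operation (glue_client.get_paginator('get_tables')), which is an equally valid choice.

```python
def get_all_tables(glue_client, db_name):
    """Return every table definition in a Glue database.

    get_all_tables is a hypothetical helper; glue_client is expected to expose
    boto3's get_tables(DatabaseName=..., NextToken=...) method.
    """
    tables = []
    kwargs = {'DatabaseName': db_name}
    while True:
        response = glue_client.get_tables(**kwargs)
        tables.extend(response['TableList'])
        next_token = response.get('NextToken')
        if not next_token:
            # The last page has no NextToken, so we are done.
            return tables
        # Request the next page on the following iteration.
        kwargs['NextToken'] = next_token
```

With this helper, we can iterate over get_all_tables(glue_client, db_name) instead of glue_tables['TableList'] in the loops below.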

Now we can iterate over the tables and retrieve details such as the column names, the column types, and the comments added when the table was created:

```python
for table in glue_tables['TableList']:
    for column in table['StorageDescriptor']['Columns']:
        column_name = column['Name']
        # 'Comment' is absent when the column was created without a COMMENT clause.
        comment = column.get('Comment', '')
        column_type = column['Type']
```
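The loop above only assigns variables, so nothing survives after it finishes. To build a document, we need to keep the values somewhere. Here is a minimal sketch that stores one dictionary per column. The helper name column_rows and the dictionary keys are my own choices, not part of the Glue API. I also use .get() on StorageDescriptor and Columns, in case a table definition lacks them.

```python
def column_rows(table):
    """Collect name, type, and comment of each regular (non-partition) column."""
    rows = []
    # Defensive lookups: fall back to no columns if the keys are missing.
    for column in table.get('StorageDescriptor', {}).get('Columns', []):
        rows.append({
            'name': column['Name'],
            'type': column['Type'],
            # Columns created without a COMMENT clause have no 'Comment' key.
            'comment': column.get('Comment', ''),
        })
    return rows
```

For example, descriptions = {table['Name']: column_rows(table) for table in glue_tables['TableList']} maps every table name to its column descriptions.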

Note that the code above does not return the columns used for data partitioning, because Glue stores them in the table's PartitionKeys list rather than in StorageDescriptor['Columns']. To get the partition keys, we need the following code:

```python
for table in glue_tables['TableList']:
    for partition_key in table.get('PartitionKeys', []):
        column_name = partition_key['Name']
        comment = partition_key.get('Comment', '')
        column_type = partition_key['Type']
```
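If we put both loops together, we can generate the document mentioned at the beginning. The sketch below assumes Markdown as the output format and marks partition columns in a separate table column. The function name tables_to_markdown and the layout are illustrative choices only, since Glue has no such feature. The input is the list of table dictionaries returned in get_tables' TableList.

```python
def tables_to_markdown(tables):
    """Render Glue table definitions (dicts from get_tables' TableList) as Markdown."""
    lines = []
    for table in tables:
        lines.append(f"## {table['Name']}")
        lines.append("")
        lines.append("| Column | Type | Partition key | Comment |")
        lines.append("|---|---|---|---|")
        regular = table.get('StorageDescriptor', {}).get('Columns', [])
        partitions = table.get('PartitionKeys', [])
        # Regular columns first, then the partition keys.
        for columns, partition_flag in ((regular, 'no'), (partitions, 'yes')):
            for column in columns:
                # Escape pipes so a comment cannot break the Markdown table.
                comment = column.get('Comment', '').replace('|', '\\|')
                lines.append(
                    f"| {column['Name']} | {column['Type']} | {partition_flag} | {comment} |"
                )
        lines.append("")
    return "\n".join(lines)
```

The returned string can be written to a file or pasted into a wiki page.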



Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group