How to emulate temporary tables in Athena

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (96/100)

Currently (I wrote this article in October 2020), Athena does not support temporary tables, but we can easily emulate them using the CREATE TABLE AS SELECT statements:

1
2
3
4
CREATE TABLE some_temp_table 
    WITH (format = 'PARQUET') 
    AS SELECT column_A, column_B, column_C
    FROM source_table;

Unfortunately, we have to remember about removing the table when we no longer need it.

1
DROP TABLE some_temp_table

If we don’t specify the S3 location, Athena will use the default results bucket as the storage location. I think it is good enough in the case of a temporary table.


Subscribe to the newsletter and join the free email course.

Note that I used Parquet as the storage file type. In general, you should pick a file format that is best for the operations you want to perform later. Parquet is a columnar storage file that stores metadata about the content to scan and find the relevant data quickly.

Because the table we create is just a regular table, we can also use partitioning:

1
2
3
4
5
6
7
CREATE TABLE some_temp_table 
    WITH (
        format = 'PARQUET',
        partitioned_by = ARRAY['column_A']) 
    ) 
    AS SELECT column_A, column_B, column_C
    FROM source_table;

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.