How to make sure that you did not leave an EMR cluster running

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (40/100)

How often do you wonder whether you switched off the light before leaving the house? What about terminating the EMR cluster? Is it still running? Are you paying for something you are not using?

This article will show you how to create a script that periodically checks whether an EMR cluster is running and displays a notification in macOS. I will use osascript, so the script does not work on Windows or Linux!

First, we have to define a naming convention. In my team, a personal dev cluster must have the owner’s username in its name. On macOS, we can get the username with the whoami command. We will use that later.

Before proceeding, you should also install and configure the AWS CLI.

Script to check whether a cluster is running

This section shows a few lines of code and explains how they work. At the end, I will post the whole script.

First, we have to retrieve the list of all running clusters that have the username of the current user in their names:

running_clusters=$(aws emr list-clusters --active | tail -n +2 | grep "$(whoami)")
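
As an alternative to grepping the raw output, the AWS CLI can filter the response on the client side with a JMESPath expression passed to --query. The sketch below is only an illustration, not the approach used in the rest of this article. The function name list_my_clusters is made up, and the code assumes the username contains no single quotes.

```shell
# Hypothetical helper: print the names of active EMR clusters whose name
# contains the given string (defaults to the current username).
# Assumption: the string contains no single quotes, which would break the query.
list_my_clusters() {
    local owner
    owner="${1:-$(whoami)}"
    # .[Name] wraps each name in its own list, so that text output
    # should print one cluster name per line
    aws emr list-clusters --active \
        --query "Clusters[?contains(Name, '${owner}')].[Name]" \
        --output text
}
```

Calling list_my_clusters without arguments uses the result of whoami, and list_my_clusters some_name searches for a different substring.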

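After that, we must count the number of lines in the output. The variable has to be quoted. Otherwise, the shell joins all lines into one, and the count never exceeds 1:

number_of_lines=$(echo "$running_clusters" | sed '/^[[:space:]]*$/d' | wc -l)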
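
The filtering and counting steps can also be combined, because grep counts matching lines on its own. Here is a minimal sketch that uses sample data instead of a real AWS call. The function name count_matching_lines and the cluster names are hypothetical.

```shell
# Hypothetical helper: read a cluster listing on stdin and print
# how many lines contain the given string.
count_matching_lines() {
    local needle="$1"
    # -F treats the needle as a literal string, -c prints the number of
    # matching lines; "|| true" keeps the exit status zero when the count is 0
    grep -F -c -- "$needle" || true
}

# Sample data standing in for the output of `aws emr list-clusters`
printf 'dev-alice-spark\nprod-etl\n\ndev-alice-test\n' | count_matching_lines "alice"
# prints 2
```

An empty listing produces 0, so the result can go straight into the numeric comparison.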

If there is at least one line, we have a running EMR cluster. In that case, we run an osascript that displays a notification:

if (( number_of_lines > 0 )); then
    message="You have an EMR cluster running"
    # Pass the AppleScript directly to osascript; no eval is needed
    osascript -e "display notification \"$message\" with title \"EMR cluster\""
fi
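
If the message ever contains dynamic text, such as a cluster name, a double quote in it could break the AppleScript command. Here is a minimal sketch of how to escape the text first. Both function names are hypothetical, and the wrapper only calls osascript when it is available.

```shell
# Hypothetical helper: build the AppleScript command for a notification.
# Backslashes and double quotes are escaped so that the text cannot
# terminate the AppleScript string literal early.
build_notification_script() {
    local message title
    message=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    title=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf 'display notification "%s" with title "%s"' "$message" "$title"
}

# Hypothetical wrapper: show the notification only where osascript exists (macOS)
notify() {
    if command -v osascript >/dev/null 2>&1; then
        osascript -e "$(build_notification_script "$1" "$2")"
    else
        echo "osascript not found; message was: $1" >&2
    fi
}
```

With these two functions, notify "You have an EMR cluster running" "EMR cluster" displays the notification on macOS and prints a warning to stderr on other systems.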

Here is the complete script:

#!/bin/bash

# List active clusters and keep the lines that mention the current user
running_clusters=$(aws emr list-clusters --active | tail -n +2 | grep "$(whoami)")
# Count non-empty lines; quoting the variable preserves the line breaks
number_of_lines=$(echo "$running_clusters" | sed '/^[[:space:]]*$/d' | wc -l | tr -d ' ')
if (( number_of_lines > 0 )); then
    message="You have $number_of_lines cluster(s) running"
    osascript -e "display notification \"$message\" with title \"Dev clusters\""
fi

Now, make the file executable and run it. If you have a running EMR cluster whose name matches the naming convention, you should see a notification.



Use CRON to run the script

In the final step, we add the script to the crontab so that it runs periodically. I like to run it at the 25th and 55th minute of every hour. Editing your own crontab does not require sudo:

crontab -e

In the crontab, add the following line:

25,55 * * * * ~/script_path/script_name.sh
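
If you prefer to add the entry without opening an editor, the crontab text can be generated and piped back to crontab. The sketch below only prints the result. The function name add_cron_entry and the sample entry for /usr/bin/true are hypothetical.

```shell
# Hypothetical helper: read an existing crontab on stdin and print it with
# the given entry appended, unless an identical line is already present.
add_cron_entry() {
    local entry current
    entry="$1"
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    # -x matches whole lines only, -F treats the entry as a literal string
    if ! printf '%s\n' "$current" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Dry run: prints the resulting crontab text without installing it
printf '0 9 * * * /usr/bin/true\n' | add_cron_entry '25,55 * * * * ~/script_path/script_name.sh'
```

To install the result, the same pipeline can read from crontab -l and end with crontab -, which replaces the current user's crontab with the text from standard input. Running it twice does not duplicate the entry.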


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group