How to find and terminate an idle Redshift session

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (22/100)

Redshift has a limit of 500 connections. If we use Redshift as a data source of a user-facing application, we can easily exceed that limit. In this case, we end up with a terrible user experience and hard to debug problems (I bet that exceeded connection limit is not going to be the first possible cause we will look at while debugging).

In this article, I show how to find idle Redshift sessions and how to terminate them. First, I will show you the full query that returns only the idle sessions. In the second part of the article, I will explain it step-by-step.

How to get idle user sessions

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
SELECT
    sessions.process AS process_id,
    trim(sessions.user_name) AS user_name,
    sessions.starttime AS session_start_time,
    datediff(s, sessions.starttime, sysdate) AS session_duration_is_seconds,
    queries.last_end AS last_query_end_time,
    datediff(s, CASE WHEN queries.last_end IS NOT NULL
        THEN queries.last_end
        ELSE sessions.starttime END
    , sysdate) AS idle_duration_in_seconds
FROM
        (SELECT starttime, process, u.usesysid, user_name
        FROM stv_sessions s, pg_user u
        WHERE s.user_name = u.usename
        AND u.usesysid>1
        AND process NOT IN (
            SELECT pid FROM stv_inflight WHERE userid>1
            UNION SELECT pid FROM stv_recents WHERE status != 'Done' AND userid>1
        )) sessions
        LEFT OUTER JOIN (
            SELECT userid, pid, max(endtime) AS last_end FROM svl_statementtext
            WHERE userid>1 AND sequence=0
            GROUP BY 1,2) queries
            ON sessions.usesysid = queries.userid AND sessions.process = queries.pid
        WHERE (queries.last_end > sessions.starttime OR queries.last_end IS NULL)
        ORDER BY idle_duration_in_seconds DESC;

How to terminate a session

To terminate an active user session, we need the process_id that identifies the session. When we have the process id, we can use the pg_terminate_backend( process_id ) function to disconnect the user from Redshift.



How does it work?

Find active sessions

We start by retrieving all active user sessions. For that, we can use the stv_sessions Redshift table:

1
2
3
4
SELECT starttime, process, u.usesysid, user_name
FROM stv_sessions s, pg_user u
WHERE s.user_name = u.usename
AND u.usesysid > 1

Find active queries

We are not interested in sessions of users who are currently executing a query, so we filter out in-flight queries and recently finished queries. Hence, the process NOT IN statement that removes the sessions found by this query:

1
2
3
SELECT pid FROM stv_inflight WHERE userid>1
UNION
SELECT pid FROM stv_recents WHERE status != 'Done' AND userid>1

Which session is idle?

To figure out which sessions are idle, we need the end time of the last statement executed using those sessions. We can get this information from the slv_statementtext table:

1
2
3
4
SELECT userid, pid, max(endtime) AS last_end
FROM svl_statementtext
WHERE userid>1 AND sequence=0
GROUP BY 1,2

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group