How to concatenate multiple MySQL rows into a single field?

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (15/100)

In the previous article, I showed how to group rows into an array or a set in Hive. This blog post demonstrates how to achieve a similar result in MySQL.

As in the aforementioned article, we have the following table:

1
2
3
4
5
6
7
C1  C2
AAA AA1
AAA AA2
AAA AA2
BBB BB1
BBB BB2
CCC CC1

and we would like to get this result:

1
2
3
AAA AA1, AA2, AA2
BBB BB1, BB2
CCC CC1

In MySQL, instead of collect_list or collect_set functions, we are going to use the GROUP_CONCAT function. We have to group the rows by the C1 column and concatenate the C2 column:

1
2
3
SELECT C1, GROUP_CONCAT(C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;

If we wanted to remove the duplicate values, we would have to add the DISTINCT operation to make the query look like this:

1
2
3
SELECT C1, GROUP_CONCAT(DISTINCT C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;


We can use ORDER BY!

In MySQL, sorting the C2 values is trivial. All we have to do is adding the ORDER BY clause inside the GROUP_CONCAT function:

1
2
3
SELECT C1, GROUP_CONCAT(C2 ORDER BY C2 ASC SEPARATOR ', ')
FROM some_table
GROUP BY C1;

Beware the memory limit!

Unfortunately, the GROUP_CONCAT function will not return a string longer than the default value of the group_concat_max_len parameter (1024 bytes). We can change that by setting the parameter before we execute a select statement that contains the GROUP_CONCAT function:

1
SET group_concat_max_len = 10240;

If we are not sure what is the necessary length that we need, we have to calculate the sum of C2 lengths and the length of the required number of separators:

1
2
3
4
5
6
SET group_concat_max_len = CAST(
    (SELECT SUM(LENGTH(C2)) + (COUNT(*) - 1) * LENGTH(', ')
    FROM some_table 
    GROUP BY C1)
    AS UNSIGNED
);

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group