How to concatenate multiple MySQL rows into a single field?

This article is a part of my "100 data engineering tutorials in 100 days" challenge. (15/100)

In the previous article, I showed how to group rows into an array or a set in Hive. This blog post demonstrates how to achieve a similar result in MySQL.

As in the aforementioned article, we have the following table:

1
2
3
4
5
6
7
C1  C2
AAA AA1
AAA AA2
AAA AA2
BBB BB1
BBB BB2
CCC CC1

and we would like to get this result:

1
2
3
AAA AA1, AA2, AA2
BBB BB1, BB2
CCC CC1

In MySQL, instead of collect_list or collect_set functions, we are going to use the GROUP_CONCAT function. We have to group the rows by the C1 column and concatenate the C2 column:

1
2
3
SELECT C1, GROUP_CONCAT(C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;

If we wanted to remove the duplicate values, we would have to add the DISTINCT operation to make the query look like this:

1
2
3
SELECT C1, GROUP_CONCAT(DISTINCT C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;

Subscribe to the newsletter and join the free email course.

We can use ORDER BY!

In MySQL, sorting the C2 values is trivial. All we have to do is adding the ORDER BY clause inside the GROUP_CONCAT function:

1
2
3
SELECT C1, GROUP_CONCAT(C2 ORDER BY C2 ASC SEPARATOR ', ')
FROM some_table
GROUP BY C1;

Beware the memory limit!

Unfortunately, the GROUP_CONCAT function will not return a string longer than the default value of the group_concat_max_len parameter (1024 bytes). We can change that by setting the parameter before we execute a select statement that contains the GROUP_CONCAT function:

1
SET group_concat_max_len = 10240;

If we are not sure what is the necessary length that we need, we have to calculate the sum of C2 lengths and the length of the required number of separators:

1
2
3
4
5
6
SET group_concat_max_len = CAST(
    (SELECT SUM(LENGTH(C2)) + (COUNT(*) - 1) * LENGTH(', ')
    FROM some_table 
    GROUP BY C1)
    AS UNSIGNED
);

Remember to share on social media!
If you like this text, please share it on Facebook/Twitter/LinkedIn/Reddit or other social media.

If you want to contact me, send me a message on LinkedIn or Twitter.

Would you like to have a call and talk? Please schedule a meeting using this link.


Bartosz Mikulski
Bartosz Mikulski * data/machine learning engineer * conference speaker * co-founder of Software Craft Poznan & Poznan Scala User Group

Subscribe to the newsletter and get access to my free email course on building trustworthy data pipelines.

Do you want to work with me at riskmethods?

REMOTE position (available in Poland or Germany)