How to concatenate multiple MySQL rows into a single field?

In the previous article, I showed how to group rows into an array or a set in Hive. This blog post demonstrates how to achieve a similar result in MySQL.

As in the aforementioned article, we have the following table:

C1  C2
AAA AA1
AAA AA2
AAA AA2
BBB BB1
BBB BB2
CCC CC1

and we would like to get this result:

AAA AA1, AA2, AA2
BBB BB1, BB2
CCC CC1

In MySQL, instead of collect_list or collect_set functions, we are going to use the GROUP_CONCAT function. We have to group the rows by the C1 column and concatenate the C2 column:

SELECT C1, GROUP_CONCAT(C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;

If we wanted to remove the duplicate values, we would have to add the DISTINCT operation to make the query look like this:

SELECT C1, GROUP_CONCAT(DISTINCT C2 SEPARATOR ', ')
FROM some_table
GROUP BY C1;

We can use ORDER BY!

In MySQL, sorting the C2 values is trivial. All we have to do is adding the ORDER BY clause inside the GROUP_CONCAT function:

SELECT C1, GROUP_CONCAT(C2 ORDER BY C2 ASC SEPARATOR ', ')
FROM some_table
GROUP BY C1;

Beware the memory limit!

Unfortunately, the GROUP_CONCAT function will not return a string longer than the default value of the group_concat_max_len parameter (1024 bytes). We can change that by setting the parameter before we execute a select statement that contains the GROUP_CONCAT function:

SET group_concat_max_len = 10240;

If we are not sure what is the necessary length that we need, we have to calculate the sum of C2 lengths and the length of the required number of separators:

SET group_concat_max_len = CAST(
    (SELECT SUM(LENGTH(C2)) + (COUNT(*) - 1) * LENGTH(', ')
    FROM some_table
    GROUP BY C1)
    AS UNSIGNED
);
Older post

How to get an array/bag of elements from the Hive group by operator?

How to get an array of elements from one column when grouping by another column in Hive