# How to assign people to groups in a fair way using genetic algorithms

Some time ago, we had an exciting challenge. Around 80 developers had to be assigned to groups/teams of 8-10 people, and we wanted to have 9 such teams.

Every one of the developers could choose to which team they want to be assigned, but due to the constraints (the limited number of groups and the expected size of the group), it was not easy.

# Systematic consensus and resistance voting

The problem has been solved by using the method called systematic consensus. In a nutshell, we wanted to achieve the state that is the least problematic to everyone.

First, we started with resistance voting. In that method of voting, every person assigns scores to their choices, but the score indicates how much they *don’t* want it.
People can assign scores between 0 and 10, but 10 has a special meaning. It means “no way,” “if you assign me there, I quit,” etc.

When the voting finished we had a score table which looked like this:

team A | team B | team C | team D | team E | team F | team G | team H | team I | |
---|---|---|---|---|---|---|---|---|---|

person 1 | 0 | 4 | 2 | 6 | 10 | 10 | 10 | 4 | 3 |

person 2 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 0 | 10 |

person 3 | 0 | 2 | 10 | 4 | 10 | 10 | 6 | 10 | 8 |

person 4 | 3 | 3 | 3 | 2 | 1 | 2 | 2 | 4 | 2 |

After voting, the people were assigned to the groups. I don’t know what method was used to assign them, but in this article, I am going to show you how to do it using genetic algorithms.

# People gaming the system

Quickly we realized that the resistance voting could be easily gamed. Some people were interested in joining only one team, and everything else was unacceptable for them (like “Person 2” in the example above)

In my opinion, this problem can be solved in at least two ways.

## No duplicate votes

The first solution is to do not allow duplicate votes. People can vote “10,” but only once. Additionally, it solved the issue caused by people like “Person 4” — people who don’t seem to have an opinion. Saying “I don’t care” is not being flexible and ok with every possible outcome, but instead violates the principles of this method of selecting teams.

## Exponential score

The second solution is purely mathematical and does not require changing rules or repeating the voting.

I want a method which “punishes” people for not being cooperative and overusing high values, but at the same time I want the emphasize the value of a “10” given by a person who votes “10” only once.

First, I need to standardize votes of every person separately. I am going to calculate the standard deviation of scores, and then convert them to z-score.

For example, in the case of “Person 1,” I have the following scores: 0, 4, 2, 6, 10, 10, 4, 3. The standard deviation of those scores is 3,778594683 and the mean: 5,444444444.

I am going to slighly modify the formula for the z-score because I don’t want a person who voted zero 10 times to break it, so my z-score is:

\[ZScore_i = \frac{x_i - mean(X)}{(1 + sd(X))}\]X - are all votes of that person, \(x_i\) is the i-th vote of the person.

This kind of solves the problem of overusing 10s, but the result values are still very similar. There is a small difference between the z-score of a person who voted 10 only once and the abuser of 10s. Additionally, I have to deal with negative values.

Exponential function solves both of those problems.

\[expScore_i = e^{ZScore_i}\]Now, I don’t have negative values, and 10s given by abusers have a much smaller impact than a 10 given by a more cooperative person.

# Genetic algorithms as a solution

I have recalculated the scores or changed the rules of voting, but there is a considerable part still missing. I have to assign people to the teams.

In my previous blog post, I described step by step how to use Jenetics and Helisa in Scala, so please take a look at it if anything is confusing.

First, I must encode the developers’ preferences. From the perspective of the algorithm, it does not matter if I use the raw votes, the exponential score, or any other method. The preferences of every person are going to be an Array of numbers.

1
2
3
4
5
6
7
8
9
10

def randomPerson(): Array[Int] = Range(0, 9).map(_ => Random.nextInt(11)).toArray
val preferences: Array[Array[Int]] = Array(
Array(0, 4, 2, 6, 10, 10, 10, 4, 3),
Array(10, 10, 10, 10, 10, 10, 10, 0, 10),
Array(0, 2, 10, 4, 10, 10, 6, 10, 8),
Array(3, 3, 3, 2, 1, 2, 2, 4, 2)
) ++ Range(0, 76).map{ _ => randomPerson()}
def preference(person: Int, group: Int): Int = preferences(person)(group)

When we use a genetic algorithm, the solution is described by a “genotype.”

In my case, I have 80 people who can choose one of 9 groups, so I encoded the solution as an array of 80 numbers between 0 and 8. The position in the array denotes the person, and the value indicates to which group the person has been assigned.

1
2
3

val genotype = () => genotypes.uniform(
chromosomes.int(min = 0, max = 8, length = 80)
)

Because of the way the solution is encoded, my fitness function takes a List of Integers as an argument and returns an Integer. Calculating the fitness score consists of two parts. First, I have to sum the resistance scores of all developers. Second, I must check if the groups have between 8-10 people.

I am going to sum the resistance scores of people assigned to groups. To put emphasis on the group size, I am going to raise 10 to the power of the number of unmet size constraints in every group and add the results to the resistance score.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

def fitness(solution: List[Int]): Long = {
val idealSizeOfGroup = 9
val peopleResistanceScore = solution
.zipWithIndex
.map {
case (group, person) => preference(person, group)
}
.sum
val groupSizeResistanceScore = solution
.groupBy(identity)
.mapValues(peopleInTheGroup =>
//if you used exponential score, you should change the base to 100
Math.pow(10, Math.abs(peopleInTheGroup.length - idealSizeOfGroup)))
.values
.sum
peopleResistanceScore.toLong + groupSizeResistanceScore.toLong
}

Because Helisa provides only implicit decoders for case classes and I used a List as the parameter of my fitness function, I have to define my decoder which converts an IntegerGene to a List of Ints.

1
2
3
4
5
6
7
8
9
10
11

import com.softwaremill.helisa.api.convert.Decoder
import io.jenetics.IntegerGene
implicit def decoder = new Decoder[List[Int], IntegerGene] {
import scala.collection.JavaConverters._
def decode(genotype: Genotype[IntegerGene]): Option[List[Int]] = {
val list = genotype.iterator().asScala.flatMap(_.toSeq.asScala.map(_.getAllele.toInt)).toList
if(list.isEmpty) None else Some(list)
}
}

When all of that is ready, I can run the Evolver to get the solution.

1
2
3
4
5
6
7
8

val evolver = Evolver(fitness, genotype)
.minimizing()
.populationSize(500)
.survivorsSelector(selectors.tournament(20))
.survivorsSize(20)
.offspringSelector(selectors.rouletteWheel())
.geneticOperators(operators.crossover.singlePoint(), operators.mutator.swap())
.build()

1
2
3
4
5
6
7
8
9
10
11
12
13
14

val best = evolver.streamScalaStdLib().drop(5000).head
Console.println(best.bestFitness)
val bestSolution = best.bestPhenotype.head
Console.println(
bestSolution
.groupBy(identity)
.mapValues(_.length)
.toList
.sortBy(_._1)
.mkString(",")
)

In my case, the best solution produced by the genetic algorithm was:

281

(0,9),(1,9),(2,9),(3,9),(4,9),(5,9),(6,8),(7,9),(8,9)

Did you enjoy reading this article?

Would you like to learn more about software craft in data engineering and MLOps?

**Subscribe to the newsletter** or **add this blog to your RSS reader** (does anyone still use them?) to get a notification when I publish a new essay!

You may also like

## Bartosz Mikulski

- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz