You’ll find here the concepts used to create Class Cruncher, a handy webapp for school administrators.

# The problem

How should you deploy a certain number of teachers across your school’s classes, given rules about class sizes? You may need to use composite classes.

More precisely, given:

- The total number of teachers
- Which grades are being taught (eg. kindergarten, 1st grade, 2nd grade etc)
- How many children are in each grade
- Min & max class sizes in each grade
- Which composite classes are allowed (eg. grades 1 & 2 but not grades 1 & 3 together)
- The smallest number of children from a grade allowed in a composite class (eg. so you don’t have a single lonely first grader in a class with grade 2).

Then list out the classes that each teacher should take (i.e. how many children from each grade are in each class).

# A solution

Solve this as a mixed-integer linear programming problem, using lpsolve. This uses the simplex and branch-and-bound algorithms to maximise an objective function subject to some constraints. Here we have lots of constraints, and just want a feasible solution, rather than an “optimal” solution, so the objective function is not so important.

## Step 1: Choose your variables

For every potential class j, we’ll have a binary variable y(j) (ie. it must be either 0 or 1), to tell us if it is being used or not.

Because classes can include students from different grades, we will invent the term “subclass” to refer to the number of children from one particular grade in a class. Then our key variables are x(i), the number of children in each potential subclass i.

Let’s relate the classes and subclasses via a matrix C(j,i), with 1 if subclass i is in class j, and 0 otherwise. Since subclasses are only in a single class, for any subclass i, C(j,i) is 1 for only one class j. So C is mostly 0s, with each row having at least one 1, and no columns having more than one 1.

Then the number of children in class j = the sum over i of C(j,i).x(i).

(In matrix notation, Cx = number of children in each class, where x and the right hand side are column vectors.)

C(j,i) also tells us if subclass i is being used: it is given by the sum over j of C(j,i).y(j).

Recall that y(j) tells us if class j is being used. Also, for a given subclass i, C(j,i) is only 1 for a single class j, so the sum is just a single term.

(In matrix notation, C’y = binaries telling if each subclass is used, where C’ is the transpose of C.)

For example: Suppose you have 3 grades (kindergarten, 1 and 2), and can allow for 1 pure kindy class, 2 composite K/1 classes, 2 pure 1st grade and 1 pure 2nd grade class. That’s a total of 6 classes and 8 subclasses. The matrix C is:

—-8 subclasses—– | ||||||||
---|---|---|---|---|---|---|---|---|

| | 1 | . | . | . | . | . | . | . |

| | . | 1 | 1 | . | . | . | . | . |

6 | . | . | . | 1 | 1 | . | . | . |

classes | . | . | . | . | . | 1 | . | . |

| | . | . | . | . | . | . | 1 | . |

| | . | . | . | . | . | . | . | 1 |

To simplify the model, we’ll assume potential composite classes cannot optionally only use some of their subclasses – it’s all or nothing.

So in fact, we also need a preliminary step (before solving) to decide on the number of potential classes and subclasses from the input data. We can make a guess at an upper limit, eg. by dividing the number of children in each grade by the maximum number allowed in a class, and rounding up.

## Step 2: Choose your constraints

Our constraints need to achieve these things:

- Ensure all the children in each grade are in a class (or equivalently, a subclass)
- Classes sizes must be between the minimum and maximum class size – or zero (since not all potential classes need to be used!)
- Subclass sizes must be above the minimum subclass size, if the class is being used
- The number of classes must match (or be less than) the number of available teachers

We also need to relate the x and y variables. In a linear program, all the constraints need to be linear in the variables z. This means you can express them as matrices A, so that A.z is greater than, equal to, or less than, another vector c. Here z is just a column vector containing both x and y.

Take each of the above in turn.

### Ensure all the children in each grade are in a class (or equivalently, a subclass)

We have to express this in terms of the number of children in the subclasses x, because the y’s don’t tell us how many children there are in each class.

In our earlier example, the first point involves 3 constraints, one for each grade. Multiply the below matrix with the column vector x (which has 8 rows).

—-8 subclasses—– | sums to | ||||||||
---|---|---|---|---|---|---|---|---|---|

| | 1 | 1 | . | 1 | . | . | . | . | = number of kindergarteners |

3 grades | . | . | 1 | . | 1 | 1 | 1 | . | = number of first graders |

| | . | . | . | . | . | . | . | 1 | = number of 2nd graders |

### Classes sizes must be between the minimum and maximum class size, or zero

If we had to use every potential class (ie. all the y variables were known to be 1), then this is just the constraint Cx >= min_class_size, Cx <= max_class_size (where I’m applying the inequality to each row of each side, so these are 6+6=12 constraints in our example.)

However, Cx >= min_class_size should only apply if y is 1. That’s easy to write: Cx – min_class_size.y >= 0. So the constraint matrix has C (N_classes × N_subclasses or 6×8) on the left and -min_class_size (N_classes × N_classes or 6×6, a diagonal matrix of the minimum class sizes) on the right.

I make the same modification for the max class sizes, although it’s not strictly necessary; this has the effect of forcing y >= 0.

### Subclass sizes must be above the minimum subclass size, if the class is being used

If we had to use every potential class (ie. all the y variables were known to be 1), and therefore every subclass, then this is a very simple set of constraints: x >= min_subclass_size. However, we only want this constraint to apply if the corresponding binary value is 1. That corresponding y is actually the corresponding row of C’y (as we discussed earlier).

There’s a classic trick to only apply constraints based on the value of a binary variable, called the “big M” method.

Write x >= min_subclass_size as (x – min_subclass_size) >= 0, and then replace the right hand side with M(C’y – 1), where M is a big number (eg. 10000). Now when C’y is 1, the constraint applies; when it is 0, it does not. Cool!

### The number of classes must match (or be less than) the number of available teachers

The last constraint is easy – the sum of the y’s must be less than or equal the number of teachers; the x’s don’t come into it.

### Do we need to explicitly constrain y to be binary?

Certainly all our variables need to be integers, not floating point. But y already can’t be bigger than 1, because our big M constraint would fail. And it can’t be less than 0 because of the way we wrote our max class size constraint.

So y is already forced to be binary.

### Matrix representation of the constraints

A: (integer vars) (binary vars) b: -- num_subclasses -- -- num_classes -- (--------------------+-----------------) (--------------) ( | ) | ( | ) ( grade_size_matrix | 0 ) num_grades = ( grade_sizes ) ( | ) | ( | ) (--------------------+-----------------) (--------------) ( |-l1 ) | ( | ) ( class_size_matrix | (diag.) ) num_classes >= ( 0 ) ( | -lnc) | ( | ) (--------------------+-----------------) (--------------) ( |-u1 ) | ( | ) ( class_size_matrix | (diag.) ) num_classes <= ( 0 ) ( | -unc) | ( | ) (--------------------+-----------------) (--------------) ( 1 | ) | ( | ) ( 1 | ) | ( | ) ( ... | -M * C' ) num_sub_classes>=( ls - M ) ( 1 | ) | ( | ) ( 1 | ) | ( | ) (--------------------+-----------------) (--------------) ( 0 | 1 1 ... 1 1 ) num_classes =,<= ( max_classes ) (--------------------+-----------------) (--------------) where l is the min_class_size u is the max_class_size ls is the min_subclass_size C' is the transpose of the class_size_matrix M is a large number

## Step 3: Choose the objective function

For a single answer, I just minimise the sum of all the x(i).

Ideally you could get second-best and other feasible solutions from the solver, but I had trouble doing this. As a work-around, I produce different solutions by changing the weights on each subclass. I raise the weight from 1 to 2 for the subclasses in a given grade or composite class, and repeat.

## Step 4: Express the solution in an understandable way

We recast the resulting x and y values so that people can understand them, eg. in our example with 8 subclasses we might get an answer like

--------------- x ----------- ------- y ------- [20, 12, 10, 0, 0, 25, 22, 21, 1, 1, 0, 1, 1, 1]

This is more easily understood as being 5 classes with the following compositions:

K 1st 2nd class #1: 20, 0, 0 class #2: 12, 10, 0 class #3: 0, 25, 0 class #4: 0, 22, 0 class #5: 0, 0, 21

## Step 5: Post-process to humanise the answers

The optimisation produces often solutions with very different numbers of students in each class, eg. two kindergarten classes, one with 10 students, and one with 20. Clearly, a better solution is two classes of 15.

As is often the case, the constraints we thought we needed don't quite fully give us the solutions we expected.

We could tackle this by adding more constraints or changing the objective function, but in this case it is simpler to do some "shuffling" after the optimisation, according to some prescribed rules.

Composite classes make this a bit harder. For another example, given the formatted output (note the composite classes are listed at the end):

[[10,0,0], [20,0,0], [0,25,0], [0,0,10], [25,5,0], [0,5,25]]

We could return:

[[21,0,0], [21,0,0], [0,22,0], [0,0,20], [13,8,0], [0,5,15]]

The rules I came up with took some discovering:

- For each grade, find the average class size of classes with subclasses from that grade. (In the above example, it's 20, which includes 5 grade 1s in the comp class)
- For each grade, move that grade's students around to make them as close to this average as possible, eg:
- kindergarten: avg = 20;
`[[20,0,0], [20,0,0], [0,25,0], [0,0,10], [15,5,0], [0,5,25]]`

- grade 1: avg = (25+20+30)/3 = 25; in the example, this cannot be done, since the last class is the problem.
- grade 2: avg = (10+30)/2 = 20,
`[[20,0,0], [20,0,0], [0,25,0], [0,0,20], [15,5,0], [0,5,15]]`

- kindergarten: avg = 20;
- Repeat until class sizes only change by 1, eg:
- kindergarten: avg = 20, no shuffling required
- grade 1: avg = (25+20+20)/3 = 21.7;
`[[20,0,0], [20,0,0], [0,22,0], [0,0,20], [15,7,0], [0,6,15]]`

- grade 2: avg = (20+21)/2 = 20.5, no change required
- kindergarten: avg = (20+20+22)/3 = 20.7,
`[[21,0,0], [20,0,0], [0,22,0], [0,0,20], [14,7,0], [0,6,15]]`

- etc

## Step 6: Build it!

Build a web app to take the inputs, perform the optimisation and serve the results!

Will this algorithm allow more than one teacher per classroom. If I have 40 students and two teachers, that’s a 1:20 ratio. Having two teachers per classroom helps role model adult rapport. Good for math teaching!