Solving Sudoku with Metaheuristics: GVNS
Mon, 25 Apr 2022 · Danielle Notice (STOR-i Student Sites) · Reading Time: 5 minutes

The last time I travelled, I saw a little old lady in the airport with her crossword puzzle book. My grandmother travels with her word search book. Me? In my old age, I will travel with a Sudoku book. Before I hit old age, I will take advantage of the opportunity to combine my studies with my favourite game. As it turns out, heuristic algorithms are pretty popular for solving, creating and rating Sudoku puzzles. In this post, we will look at how a metaheuristic, general variable neighbourhood search, has been used to solve Sudoku puzzle instances.

  1. What is Sudoku?
  2. What are metaheuristics?
  3. General Variable Neighbourhood Search
  4. Solving Sudoku

1. What is Sudoku?

Sudoku is a Japanese puzzle consisting of an n² × n² grid divided into n² sub-grids, each of size n × n. The word Sudoku combines two Japanese words, Su (number) and Doku (single), and loosely translates to “solitary number”. Here n is the order of the puzzle, with n = 3 being the most popular.

The objective of Sudoku is to fill each empty cell so that every row, column and sub-grid contains each integer between 1 and n² inclusive exactly once.

Sudoku is an example of a combinatorial optimisation (CO) problem, a class of problems whose solutions lie in a finite or countably infinite set. Constructing a Sudoku puzzle and completing one from a partially filled grid are both NP-complete problems (for general order n). This means that no polynomial-time algorithm is known that can solve all possible Sudoku problem instances. The solution space for an empty 9 × 9 Sudoku grid contains approximately 6.7 × 10²¹ possible combinations. However, the pre-filled cells act as constraints and reduce the number of feasible combinations.

2. What are metaheuristics?

When it comes to solving optimisation problems, there are two main types of approach: exact methods and approximate methods. Exact methods are guaranteed to find an optimal solution for every finite instance of a CO problem.

Approximate methods such as heuristic algorithms can be used when there is no known exact method that can be used to solve the problem or when the known exact methods are too computationally expensive to be used practically. In the context of optimisation problems, a heuristic is a well-defined intelligent procedure – based on intuition, problem context and structure – designed to find an approximate solution to the problem. Unlike exact methods, the solutions found may not be optimal, but are some type of “acceptable”. The effectiveness of a heuristic depends on the quality of the approximations that it produces.

The performance of heuristics can be improved using metaheuristics, which are high-level,
problem-independent strategies used to develop heuristic optimisation algorithms. They are designed to approximately solve a wide range of problems without needing to fundamentally change.

3. General Variable Neighbourhood Search

Variable neighbourhood search (VNS) algorithms, originally proposed by Mladenović and Hansen, are single-solution based metaheuristics. They successively explore a set of predefined neighbourhoods which are typically increasingly distant from the current candidate solution.

Illustration of the main idea of a basic VNS algorithm

VNS’ main cycle is composed of three phases: shaking, local search and move.

In the shake phase, a random solution is selected from the kth neighbourhood of the current solution s. This becomes the initial solution for the local search algorithm, which produces a new candidate solution s′′. In the move phase, if s′′ is better than the current solution s, it replaces s and the cycle restarts with this new solution; otherwise, the cycle restarts from the same solution but with a different neighbourhood.

Variable Neighbourhood Descent (VND) is a deterministic variant of the VNS algorithm. It uses a best improvement method, choosing s′′ as the local optimum in neighbourhood N_k.

The General Variable Neighbourhood Search (GVNS) uses the VND as the local search procedure (line 7 of the algorithm).
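As a rough sketch (not the paper’s exact pseudocode), the shake / local search / move cycle of GVNS can be written as follows. Here shake, vnd and cost are placeholders for the problem-specific components described above, and all names are my own:

```python
import random

def gvns(s0, cost, shake, vnd, k_max, max_iters=1000):
    """Sketch of a General VNS loop: shake in neighbourhood k, run VND
    as the local search, then either move or switch neighbourhood."""
    s = s0
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            s_shaken = shake(s, k)         # random point in the k-th neighbourhood
            s_local = vnd(s_shaken, cost)  # VND plays the role of local search
            if cost(s_local) < cost(s):    # move phase: accept improvement
                s = s_local
                k = 1                      # restart from the first neighbourhood
            else:
                k += 1                     # try a more distant neighbourhood
        if cost(s) == 0:                   # for Sudoku, the optimum has cost 0
            break
    return s
```

The same skeleton works for any minimisation problem once shake, vnd and cost are supplied.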

4. Solving Sudoku with GVNS

We will now look at the different elements of the algorithm above and how this metaheuristic has been used to solve 9 × 9 Sudoku puzzles.

Solution representation: each sub-grid is numbered from 1 to 9, and each cell within a sub-grid is numbered from 1 to 9, so x_ij denotes the jth cell in sub-grid i (see the grid above for an example of a labelled cell).

Solution initialisation: for each cell, a random number is selected from the list of values that could be assigned to the cell without violating any constraints with respect to the fixed cells. This is done in such a way that the sub-grid rule is satisfied. To reduce the solution space, the authors fixed any cells that had only one possible value and repeated this until no such cells remained.
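The initialisation step can be sketched for a single sub-grid as follows. The function name and the 0-for-empty encoding are my own illustrative choices, not notation from the paper:

```python
import random

def init_subgrid(subgrid):
    """Fill one 9-cell sub-grid (list of 9 values, 0 = empty) with a random
    placement of the missing digits, so the sub-grid rule holds by construction."""
    missing = [v for v in range(1, 10) if v not in subgrid]
    random.shuffle(missing)
    return [v if v != 0 else missing.pop() for v in subgrid]
```

Applying this to every sub-grid gives a full initial candidate solution in which only the row and column rules can still be violated.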

Cost function f(s): evaluates violation of the row and column constraints by counting how many values are repeated in each row and in each column (illustrated in the figure below). The goal is to minimise this cost; the optimal solution has f(s) = 0.

A candidate solution of the Sudoku puzzle with its fitness value. Repeated digits highlighted in first row and column.
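A minimal version of this cost function, assuming the grid is stored as a list of rows, might look like:

```python
def cost(grid):
    """f(s): number of duplicated entries across all rows and all columns.
    A valid Sudoku solution scores 0."""
    penalty = 0
    # zip(*grid) transposes the grid, so columns are checked the same way as rows
    for line in list(grid) + [list(col) for col in zip(*grid)]:
        penalty += len(line) - len(set(line))
    return penalty
```

Because the sub-grid rule is enforced by construction, only rows and columns need to be penalised.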

Neighbourhood structures

Only one neighbourhood structure, Invert, is defined for the shake phase. In this structure, two cells in a sub-grid are selected and the order of the sub-sequence of cells between them is reversed.

There are 3 neighbourhood structures defined which are used in the VND local search:

  • Insert – the value of a chosen cell in a sub-grid is inserted in front of another chosen cell.
  • Swap – the values of 2 unfixed cells in the same sub-grid are exchanged.
  • A Centered Point Oriented Exchange – a cell between the second and sixth cell in a sub-grid is selected as the centre point for finding exchange pairs. The values of each pair of cells equidistant from the centre are swapped, until at least one cell in a pair is fixed.

Each of these structures applies to a single sub-grid, and in the local search, the neighbourhoods of each of the sub-grids are explored. Within the VND local search, a deep local search algorithm is used. This uses the best improvement strategy, which exploits the whole current neighbourhood structure search area to find the best neighbouring solution.
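Two of these moves are easy to sketch. The function names and fixed-cell handling below are illustrative assumptions rather than the paper’s implementation; in particular, Invert is shown without the extra care a real solver needs around fixed cells:

```python
import random

def swap_move(cells, fixed):
    """Swap: exchange the values of two randomly chosen unfixed cells
    in the same sub-grid (cells is a list, fixed a set of indices)."""
    free = [i for i in range(len(cells)) if i not in fixed]
    i, j = random.sample(free, 2)
    cells = cells[:]                      # work on a copy
    cells[i], cells[j] = cells[j], cells[i]
    return cells

def invert_move(cells, i, j):
    """Invert (the shake move): reverse the sub-sequence of cells
    between positions i and j of the sub-grid."""
    cells = cells[:]
    cells[i:j + 1] = reversed(cells[i:j + 1])
    return cells
```

Both moves permute values within one sub-grid, so the sub-grid rule stays satisfied and only the row/column cost can change.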

Metric Learning For Simulation Analytics
Mon, 11 Apr 2022 · Reading Time: 5 minutes

The usual output analysis of simulations is done at an aggregate level, which gives limited insight into how a system and its performance change throughout the simulation. To gain more insight, you can think of a simulation as a generator of dynamic sample paths. In the age of “big data”, it is now quite reasonable to keep the full sample path data and explore how to use it for deeper analysis, in a way that supports real-time predictions and reveals the factors that drive dynamic performance.

In this post, we’ll look at the emerging field of simulation analytics.

  1. What is simulation analytics?
  2. Metric learning for simulation
  3. A simple example
  4. Some final thoughts

1. What is Simulation Analytics?

The idea of simulation analytics was first described by Barry Nelson. It is not just “saving all the simulation data” and then applying modern data-analysis tools. It explores the differences between real and simulated data. Nelson outlines that the objectives of simulation analytics are to generate the following:

  1. dynamic conditional statements: relationships of inputs and system state to outputs; and outputs to other (possibly time-lagged) outputs.
  2. inverse conditional statements: relationships of outputs to inputs or the system state
  3. dynamic distributional statements: full characterization of the observed output behaviour
  4. statements on multiple time scales: both high-level aggregation and individual event times
  5. comparative statements: how and why alternative system designs differ

2. Metric Learning for Simulation

The remainder of this post discusses a paper by one of my STOR-i colleagues, Graham Laidler, and his supervisors.

We can use the available sample path data to build a predictive model of the dynamic system response. In particular, the authors use k-nearest-neighbour (kNN) classification of the system state, with metric learning to define the measure of distance [1]. In kNN classification, a simple rule classifies instances according to the labels of their k nearest neighbours.

In this setting, the paper uses

  • binary labels y_i \in \{0,1\}
  • instance x_i  is the system state at time t_i . More specifically, this refers to some subset of information generated by the simulation up to time t_i .

The classification for an instance x^* is

\hat{y}^* = \begin{cases} 1, & \text{if } \sum_{i=1}^k y^{*(i)} \geq c \\ 0, & \text{otherwise}, \end{cases}

where c \in [0, \infty) is some threshold and y^{*(i)} \text{ for } i = 1, \dots, k are the observed classification labels of the k instances nearest to x^* . In words, if at least c of the k nearest neighbours of x^* are observed to be 1, then x^* is classified as 1 by the model.
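The threshold rule above can be sketched as follows, using plain Euclidean distance as a stand-in before any metric learning; the function name and its arguments are my own:

```python
import math

def knn_classify(x_star, X, y, k=3, c=2):
    """Threshold kNN rule: label 1 iff at least c of the k nearest
    training instances (Euclidean distance here) carry label 1."""
    nearest = sorted(range(len(X)),
                     key=lambda i: math.dist(x_star, X[i]))[:k]
    return int(sum(y[i] for i in nearest) >= c)
```

Swapping the Euclidean distance for a learned metric changes which neighbours are “nearest” without changing this rule.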

The discussion then turns to quantifying the similarity of instances, since nearest-neighbour classifiers assume that instances that are similar in terms of x are also similar in terms of y . The authors attempt to fully characterise the system by including multiple predictors in their kNN model. Because x_i is multi-dimensional, its variables may not be comparable in scale or interpretation, so the Euclidean distance is not appropriate.

So we now look at metric learning, which automates the process of defining a suitable distance metric.

The aim of metric learning is to adapt a distance function over the space of x . The paper uses Mahalanobis metric learning which has a distance function parametrized by M , a symmetric positive semi-definite matrix. The metric learning problem is an optimization which minimizes, with respect to M , the sum of a loss function to penalize violations of the training constraints under the distance metric and a function which regularizes the values of M . The metric learning task is subject to similarity constraints, dissimilarity constraints and relative similarity constraints which are set based on prior knowledge about the instances or using the class labels.
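The Mahalanobis distance at the heart of this approach is straightforward to compute. The sketch below shows the squared form (x − x′)ᵀ M (x − x′), with the function name my own:

```python
def mahalanobis_sq(x, xp, M):
    """Squared Mahalanobis distance (x - x')^T M (x - x') for a symmetric
    positive semi-definite M; M = I recovers the squared Euclidean distance."""
    diff = [a - b for a, b in zip(x, xp)]
    return sum(diff[i] * M[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))
```

Learning M amounts to reweighting (and correlating) the coordinates of x so that “near” means “near in the dimensions that matter for the label”.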

3. A simple example

To evaluate the model, the authors construct a concrete formulation of the metric learning problem. In this formulation, the similarity and dissimilarity constraints are partly based on LMNN [2]. Because of the high-dimensional input, a global clustering of each class may not be appropriate, so a local neighbourhood approach was used when defining these constraint sets. The local neighbourhood of an instance x_i was defined as the q nearest points in Euclidean distance. Points in that local neighbourhood were classed as similar if they had the same y value and dissimilar if they did not. The aim was to minimise the sum of squared distances between instances classed as similar while keeping the average distance between dissimilar instances greater than 1. They set the local neighbourhood size to q = 20 and used k = 50 nearest neighbours.

One of the illustrations they applied it to was a simple stochastic activity network. The input space was the five activity times and the output was whether the longest path length exceeded 5. The activity times were i.i.d. X_i \sim Exp(1) , and 10,000 replications of the network were run. Because the data-generating mechanism is known exactly, this example was useful for evaluating the model, since the authors knew what the output M should reveal.

The diagonal elements of M indicate the weight given to the difference in each variable when classifying instances as similar or not. From the results, X_1, X_3 and X_5 were the most relevant, as expected from the structure of the network. The off-diagonal terms of M indicate the impact of interactions between variables. Under cross-validation, the metric kNN model was a better classifier than a logistic regression model.

Visualisation of M (left), ROC curves for the classification (right)

The authors then added noise variables to the model. This makes the model more realistic since multi-dimensional characterizations are likely to include variables that have little or no relationship to the output variable. Metric learning was able to filter out the noise variables while still detecting the relationship between the 5 initial variables.

M for the noise-augmented data (left), ROC curves for classification (right)

4. Some Final Thoughts

I believe this approach is valuable for two main reasons:

  1. It proposes a method for more in-depth analysis of simulation results which may be useful for real-time predictions and identifying drivers of system performance. The method is useful for revealing relationships between different components of the system and their effect on performance.
  2. The method allows us to apply kNN to high-dimensional input data without needing to manually trim the state space. Analysis can be done without prior knowledge of which variables may or may not be relevant, as they can all be included and the metric learning will reveal their relevance.

Learn More

[1] Hastie, T., R. Tibshirani, and J. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer.

[2] Weinberger, K. Q., and L. K. Saul. 2009. “Distance Metric Learning for Large Margin Nearest Neighbor Classification”. Journal of Machine Learning Research 10(9):207–244.

A new reality TV show idea: the Stable Marriage algorithm
Mon, 14 Feb 2022 · Reading Time: 3 minutes

As a hopeful romantic, a believer in the principle of marriage and a lover of dating reality TV, I was immediately intrigued by this problem and solution. So to celebrate Valentine’s Day I thought it would be fitting to look at the stable marriage problem.

1. The Premise

Consider two disjoint sets with the same number of elements (for example a group of n men and a group of n women). A matching is a one-to-one mapping from one set onto the other (a set of n monogamous marriages between the men and women). Each man has an order of preference for the women and each woman an order of preference for the men.

A matching is unstable if there exists a possible pairing of a man and a woman (not currently married to each other) who both prefer each other to their spouses. For example, Johnny is married to Bao but prefers Myrla, and Myrla is married to Gil but prefers Johnny (IYKYK). While this would make for entertaining TV, the stable marriage problem is to find a matching that avoids this situation.
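The definition of instability suggests a simple check for blocking pairs. The sketch below uses the couples from the example and assumes preferences are stored as most-to-least-preferred lists:

```python
def is_stable(matching, men_pref, women_pref):
    """Check for a blocking pair: a man and a woman who both prefer each
    other to their current partners. matching maps man -> woman; each
    preference list runs from most to least preferred."""
    husband_of = {w: m for m, w in matching.items()}
    for m, prefs in men_pref.items():
        for w in prefs:
            if w == matching[m]:
                break  # m prefers his own wife to everyone ranked below her
            # m prefers w to his wife; does w prefer m to her husband?
            if women_pref[w].index(m) < women_pref[w].index(husband_of[w]):
                return False
    return True
```

Running this on the Johnny/Bao/Gil/Myrla pairing above flags it as unstable, while the alternative matching (Johnny with Myrla, Gil with Bao) passes.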

2. The Pitch

Firstly, it is always possible to find a stable matching in this situation. One possible way to find a solution is the Gale-Shapley algorithm:

First Round

  • Each man proposes to the woman he prefers the most.
  • Each woman (if she received any proposals) tentatively accepts her favourite proposal and rejects all the others.

Subsequent Rounds

  • Each unengaged man proposes to the next woman he prefers the most who has not yet rejected him, regardless of whether she is currently engaged (scandalous!)
  • Each unengaged woman tentatively accepts her favourite proposal and rejects all the others.
  • Each engaged woman considers any new proposals and leaves her current partner if she prefers one of the new proposals. She tentatively accepts that better proposal and rejects all the others.

The subsequent rounds are repeated until everyone is engaged.
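The rounds above can be condensed into a short man-proposing implementation. This is a standard sketch of Gale-Shapley, not code from any particular source:

```python
def gale_shapley(men_pref, women_pref):
    """Man-proposing Gale-Shapley: unengaged men propose down their lists;
    each woman tentatively holds her best proposal so far. Returns a
    stable matching as a dict man -> woman."""
    next_choice = {m: 0 for m in men_pref}   # index of next woman to propose to
    engaged_to = {}                          # woman -> man she is holding
    free_men = list(men_pref)
    # Precompute each woman's ranking of the men for O(1) comparisons
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_pref.items()}
    while free_men:
        m = free_men.pop(0)
        w = men_pref[m][next_choice[m]]      # next woman who hasn't rejected m
        next_choice[m] += 1
        if w not in engaged_to:
            engaged_to[w] = m                # she tentatively accepts
        elif rank[w][m] < rank[w][engaged_to[w]]:
            free_men.append(engaged_to[w])   # she trades up; her ex is free again
            engaged_to[w] = m
        else:
            free_men.append(m)               # rejected; m will try his next choice
    return {m: w for w, m in engaged_to.items()}
```

Each man proposes to each woman at most once, so the loop always terminates with everyone engaged.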

Example of the Gale-Shapley algorithm

3. A Problem

Important for this algorithm is who makes the proposals: if the men propose, the overall outcome is better for them than for the women. If we score each marriage in the stable matching from both the male and female perspectives, based on each person’s preferences, and take the total score for each gender, we see a clear difference in the distribution of the scores. The difference becomes more drastic as the set size increases.
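One hypothetical way to compute such scores: take each person’s score to be the position of their assigned partner in their own preference list (0 = first choice) and total it per side. The function name and scoring convention are my own:

```python
def gender_scores(matching, men_pref, women_pref):
    """Total rank of assigned partner per side (0 = got first choice);
    a lower total means a better overall outcome for that side."""
    men_total = sum(men_pref[m].index(w) for m, w in matching.items())
    women_total = sum(women_pref[w].index(m) for m, w in matching.items())
    return men_total, women_total
```

Comparing these totals over many randomly generated preference tables is one way to reproduce the proposer-advantage pattern described above.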

Distribution of scores for stable matchings when the males make proposals, using randomly generated preference tables (female scores in red, male scores in blue)

4. In Practice

While I’ve introduced this problem as a pitch for a dramatic (even if biased) match-making show, Shapley and Roth won a Nobel Memorial Prize in Economic Sciences for their work on applications of this problem, and whole theses have been written extending some of the ideas.

Here are some interesting situations in which this algorithm, or some variation of it, has been used in practice:

  • matching kidney donors to transplant patients
  • assigning medical graduates to hospital residencies

Learn more

  • Gale, D., & Shapley, L. S. (1962). College Admissions and the Stability of Marriage. The American Mathematical Monthly, 69(1), 9–15.