Hamish Thorburn – PhD Student, STOR-i Centre for Doctoral Training

Choose your own adventure – Defending a target with Stackelberg Security Games
Tue, 28 Apr 2020

In today’s CYOA, you are in charge of security for a supermarket. As is the way of things at the moment, dried pasta and toilet paper are in high demand, and the store manager has asked that you make sure these are protected. She tells you that while she wants to prevent as much theft as possible, she would also like to determine whether the thieves are more interested in the toilet paper or the pasta (as this will help the police catch the criminals). Having taken your instructions, you start to prepare for your first night on the job.

If you fall asleep on the job the first night, go to 1. If you decide to set up some patrols, go to 2.

1.

In a completely unsurprising turn of events, you awake to see all the pasta has gone. Your boss finds out you were asleep, and is furious. You are:

  1. Fired
  2. A moron

Thanks for playing! If you want to try again without being a moron, go to 2.

2.

You decide to set up some patrols between the two aisles. While planning your patrols, you start to realise that this seems a lot like a Stackelberg Security Game (SSG).

An SSG is a type of game in which a defender (you) plays against an attacker (the thieves). In this game, the attacker will try to attack (i.e. steal from) one of the targets (the toilet paper and the pasta). The attacker and defender each have a utility (associated with each target) if an attack is successful (generally positive for the attacker, and negative for the defender). The way the game works is that each turn, the defender picks a strategy to cover/guard each target with a certain probability (which you can think of as the proportion of each shift you spend patrolling each aisle), then the attacker (seeing this) chooses a target to attack. After reading up on this, you decide to plan your coverage strategy for that night.

If you just decide to patrol the pasta aisle, go to 3. If you just decide to patrol the toilet paper, go to 4. If you decide to patrol them both equally, go to 5.

3.

You spend a few nights just patrolling the pasta, and no one comes near it. But – surprise! – every morning, you find that all the toilet paper is gone.


You try to explain to your boss that you were guarding the pasta, but it’s not good enough. You are fired.

Thank you for playing! If you want to try a more sophisticated strategy, feel free to try again!

4.

You spend a few nights just patrolling the toilet paper, and don’t see a soul. But every morning, you find that all the pasta has been taken (in a development you really should have seen coming).


You try to explain to your boss that you were guarding the toilet paper, but it’s not good enough. You are fired.

Thank you for playing! If you want to try a more sophisticated strategy, feel free to try again!

5.

You devise a patrol strategy that covers the two targets with equal probability. And you have some success – you manage to scare off a few attacks. But some are also getting through.

You notice that they always seem to be going for the pasta. You think back to the SSG, and remember that there is a utility for the attacker for a (successful) attack on each target. Assuming they get nothing for attacking a defended target, you realise that their expected utility for attacking a target is:

(1 – probability the target is defended) × (utility from a successful attack).

You also assume that they will always attack the target with the highest expected utility. Therefore, if you’re covering the two targets equally, then the thieves must prefer to steal pasta to toilet paper. You think you can use this to thwart them.
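
If you want to play with this yourself, the attacker’s best response is essentially a one-liner. Here’s a quick Python sketch (the utility numbers are made up for illustration; the post never gives actual values):

```python
def best_target(coverage, utilities):
    # attacker's expected utility for each target: (1 - coverage) * utility
    expected = {t: (1 - coverage[t]) * utilities[t] for t in utilities}
    # the attacker goes for whichever target maximises this
    return max(expected, key=expected.get)

# hypothetical attacker utilities (not from the post)
utilities = {"pasta": 2.0, "toilet paper": 1.0}
print(best_target({"pasta": 0.5, "toilet paper": 0.5}, utilities))  # pasta
print(best_target({"pasta": 0.8, "toilet paper": 0.2}, utilities))  # toilet paper
```

Note that with equal coverage the attacker simply picks the target it values more, which is exactly what the thieves do in this story.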

If you decide to change patrols to just defend the pasta, go to 6. If you decide to gradually increase the probability of defending the pasta, go to 7.

6.

You think the thieves only care about the pasta. Therefore, you can simply defend that, and you’ll prevent all robberies! You switch the patrol to just stay by the pasta, and encounter nothing during the night. Triumphantly, you walk to your boss’s office, on the way passing the toilet paper aisle, which should be completely stoc-

You may have made an error.

Thinking back, you realise that while you were certain that the thieves preferred pasta to toilet paper, you hadn’t actually established that they didn’t care about toilet paper at all.

If you decide to gradually increase the probability of defending the pasta, go to 7.

7.

You gradually increase the probability of defending the pasta, and then (when you’re defending it two-thirds of the time) the thieves go back to stealing the toilet paper. At this switch point the thieves must be indifferent between the two targets, which means they value the pasta twice as much as the toilet paper. It also means that you can’t patrol any more effectively than you are now. You take this to your boss and she’s happy. She passes the information on to the police, who round up the thieves using this new piece of evidence (how this helps them is unclear, but you’re pleased you could help).
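
You can check the arithmetic yourself: at the switch point the attacker’s expected utilities for the two targets must be equal. A tiny Python sketch of the calculation (this follows directly from the expected-utility formula above, assuming the two coverage probabilities sum to one):

```python
def utility_ratio(switch_coverage):
    # at the switch point c, the attacker is indifferent between targets:
    # (1 - c) * u_pasta = c * u_toilet_paper
    # so u_pasta / u_toilet_paper = c / (1 - c)
    c = switch_coverage
    return c / (1 - c)

print(utility_ratio(2 / 3))  # roughly 2: pasta valued twice as highly
```

So observing where the attacker's behaviour flips pins down the ratio of their utilities, even though you never see the utilities themselves.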

Congratulations! You’ve unwittingly determined the attacker’s utilities by “…observing the best response of the attacker” (Blum, Haghtalab, Procaccia, 2015). As you build your security business, you start to learn about more sophisticated methods for determining attacker utilities, such as solving linear programs for each target, or using Monte Carlo Tree Search. But for now, you bask in your success, knowing you have saved the day.

References

Blum, A., Haghtalab, N., & Procaccia, A. D. (2015). Learning to play Stackelberg security games. (This post was inspired by Section 1 of this chapter.)

60 second stats – Monte Carlo Simulation
Sun, 19 Apr 2020

So, the plan for this post was to look at a particular paper. However, in doing that I realised that:

  1. In order to understand the paper and why it’s important, the reader (i.e. you) needs to first understand what a Monte Carlo approximation is… which would be fine, except that;
  2. The only real way I could fit that into the post would be to embed a YouTube video describing it into the start of the post… which would be fine, except that;
  3. Pretty much all the YouTube videos on this are really boring.

Therefore, I thought the best thing to do would be to go back to basics, and try to give a quick explanation of Monte Carlo Estimation myself. And what better way is there to give a quick overview than another edition of 60 second stats?

So get out the stopwatch, and get ready. The 60 seconds begins…NOW!

What is Monte Carlo Estimation?

Monte Carlo Estimation is a technique used to estimate quantities. It is based around simulating a bunch of random numbers, and then using these to make estimates.

Why is it called “Monte Carlo”?

It is named after the Monte Carlo Casino, in Monaco, which was frequented by the uncle of Stanislaw Ulam, one of the method’s founders. It is also a reference to the inherent randomness of the method (as all casino games are based on chance).

Some beautiful architecture that just screams “computational statistics”.

Ok. So how does it work?

The overall idea is very simple. Say you have a random variable X, and you want to estimate some value related to X (e.g. its average). You can simply simulate a large number of realisations (i.e. copies) of X, and take the average of these. Then (because of the Law of Large Numbers) we know that the average of our copies will be a “good” estimate of the true value.

Wait, what do you mean by “good”?

What we mean by “good” is that as you take more and more realisations of X to average, the average will get closer and closer to the true value, as shown in the gif below.

The gif clearly shows that the more copies, the closer to the true value we get.
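
If you want to see this for yourself, here’s a minimal Python sketch (the Uniform(0, 10) variable is just an example I’ve picked, with a known true mean of 5):

```python
import random

random.seed(42)

def mc_estimate(simulate, n):
    # Monte Carlo estimate: average of n simulated copies of the variable
    return sum(simulate() for _ in range(n)) / n

# example: X ~ Uniform(0, 10), whose true mean is 5
for n in (10, 1000, 100_000):
    print(n, mc_estimate(lambda: random.uniform(0, 10), n))
```

Running this, the estimates get closer to 5 as n grows, which is exactly the behaviour in the gif.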

Alright. So how do I do it myself?

It’s easy! All you need is:

  1. A method/program/algorithm to simulate copies of your random variable
  2. Enough computational power/storage to simulate and store many copies of this variable (the more the better)

…and that’s time! Thanks for participating in another edition of 60 second stats. If you want to know more, there are many further aspects to this, such as known results about the rate of convergence of these estimates.

Tutorial – Building packages in R
Sun, 29 Mar 2020

Slight change of pace in this post – today, we’re going to look at how to build a package in R (one which can incorporate C++ code as well). This is based on a lesson we had with Dan Grose, the new software engineer at STOR-i.

For the tutorial today, we assume that everyone is reasonably comfortable using R, and we’ll be using a Linux operating system.

An overview of R

If you have used R before, you would know it is a statistical software package and programming language. It is open source (i.e. free) and comes with a variety of built-in statistical functions, such as t-tests, power calculations and plotting functions.

Which have been responsible for many of the graphics in my blog posts

While this is amazing (especially when you’ve previously been doing your t-tests by hand), one of the great aspects of R is that many statisticians around the world use it to develop functions for more complicated methods, and release these for public use in the form of packages. Users can then download and install these packages, and freely use the functions they contain.

While packages are great for loading other functions written in R, they have another wonderful aspect as well. One of the drawbacks of R is that it can be much slower than other programming languages. However, when building a package in R, you can actually incorporate code written in C++ (a compiled language closely related to the C that much of R itself is written in), which is MUCH quicker.

Step 1 – building the package skeleton

The first step is to build the “package skeleton”.

No, it is not a skeleton made out of packages

This is simply a folder on your computer in which we’re going to put all our “.r” and “.cpp” files with the functions we want to include. To do this is really simple, if we use the “Rcpp” package in R. This package gives us a command which simply builds the skeleton for us.

To build the skeleton, we simply run the following command in R after loading the Rcpp package:

Rcpp.package.skeleton("NameOfPackage", code_files = "ListOfRFiles", cpp_files = "ListOfC++Files")

This creates the package skeleton (called “NameOfPackage”) in the working directory. In the skeleton, there are three folders:

  1. man – unimportant for this tutorial
  2. R – a folder where we put all our “.R” files, containing the functions we want in the package, written in R
  3. src – a folder where we put all our “.cpp” files, containing the functions we want in the package, written in C++
A package skeleton of a package named “TestPackage1”

While we can manually put the files in these folders, the skeleton-building command will have automatically placed the files named in the code_files and cpp_files arguments into the R and src folders respectively.

Step 2 – building the package

This step is much simpler. Once we have the package skeleton, we simply run the following command from the terminal:

R CMD build PackageDirectory/PackageSkeletonName

substituting the package directory and name in where appropriate.

This builds the package tarball (a compressed “.tar.gz” file), which can then be sent to and installed on any machine running R.

The package tarball

Step 3 – installing the package

This step is even easier. Just run the following command from the terminal

R CMD INSTALL PackageDirectory/PackageTarBallName

Once this is done, your package can be loaded as a library in R, and is ready to use!

Example – Convex hulls using the Jarvis March Algorithm

As part of our lesson, we created an R package containing functions to find the convex hull of a set of points, using the Jarvis March algorithm. In this package, we wrote a C++ function called “ConvexHull” to find the convex hull of the points, and an R function to take a convex hull (the output of “ConvexHull”) and the set of points, and plot these.
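
The C++ from the lesson isn’t reproduced here, but the algorithm itself is short. A Python sketch of the Jarvis march (the “gift wrapping” algorithm), assuming no duplicate or collinear input points, which a real implementation would need to handle:

```python
def jarvis_march(points):
    # gift wrapping: start at the leftmost point, then repeatedly pick the
    # next point such that all other points lie to the left of the edge to it
    def cross(o, a, b):
        # > 0 if b is counter-clockwise (left) of the line from o to a
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    start = min(points)  # leftmost point is always on the hull
    hull, current = [start], start
    while True:
        candidate = next(p for p in points if p != current)
        for p in points:
            if p != current and cross(current, candidate, p) < 0:
                candidate = p  # p is clockwise of the current candidate edge
        if candidate == start:
            break  # wrapped all the way around
        hull.append(candidate)
        current = candidate
    return hull  # hull vertices in counter-clockwise order

print(jarvis_march([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# [(0, 0), (2, 0), (2, 2), (0, 2)] -- the interior point (1, 1) is excluded
```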

All files used to create this package, and the created package tarball, can be found in the accompanying folder. There you can find the tarball for the package, which can be installed using the above instructions. If you would like to make the package from scratch yourself, you can do so using the provided file “R_Package_Building_Code.r”, which provides code to:

  1. build the package skeleton,
  2. load the package,
  3. generate the points and find the hull (using a C++ function in the “Hamish_CPP_HM_Functions.cpp” file, in the “JM_Package_CCode/CPP_Code” directory),
  4. plot the hull using a function written in R,

which gives the following result:

And that’s it! While there are some more tricks you can do (like adding documentation), this should be enough to get you started with building packages of your own.

Choose your own adventure – Simulation input uncertainty
Thu, 05 Mar 2020

Today’s post will be a choose your own adventure. Follow the prompts and see where you end up!

You’re the star of the story! Choose from 3 possible endings!

In today’s adventure, you’re a humble graduate data analyst trying to streamline queues in an airport for STORi airways, by choosing the right number of check-in desks to open. Due to recent events, the airline is on the brink of bankruptcy, so this is a very important task. You aren’t very good at analytical calculations, so you decide to simulate the queue to determine the answer.

You ask your boss for some data on arrival numbers and service times. He gives you the arrival times for 50 arrivals, all occurring on one day, and the service times for these arrivals.

Right! Time to crack on! You build your simulation model, and get some results from it. You determine the mean waiting time for customers for different numbers of check-in desks. To be safe, you also calculate a 95% confidence interval around these waiting times.

You’re about to take these results to your boss when you have a thought – your dataset on arrivals and service times wasn’t very big. What if it was taken on a slow day? Or the day after the Christmas party, so all the check-in staff were a bit sluggish? What if you can’t trust this data that you made all these decisions on?
If you think “Nah, it’s probably fine” and go to your boss anyway, go to 1. If you think, “Hang on, I better think about this a bit more”, go to 2.

1.

You take your results to your boss, and he seems thrilled. He immediately puts your suggestions into practice. You’re the hero of the office – everyone’s looking up to you, there’s talk of a promotion. But then, a few weeks later, you get called back to your boss. You go into his office and the CEO is also there. They’re both furious – somehow the number of complaints from customers about waiting times has gone up. You’re shocked – you ran a simulation! How could this have happened? Your bosses pull up the new stats on waiting times. The average times are far longer than you suggested. Don’t worry, you prepared for this. You calmly explain to your boss that the averages may be different, but they should still be in the 95% confidence interval you calculated – they should have known it could be bad. Your boss (who did not seem to appreciate your back-talk) points out that the wait times are even longer than the worst predicted by the confidence interval. You stammer and try to think of an explanation. But it’s too late – the company has already taken a massive hit in revenue, and the boss asks you to clean out your desk…

Thank you for playing this choose your own adventure! If you are upset at being fired, feel free to try again and see if Input Uncertainty could have saved you!

2.

You do some reading and come across “Foundations and Methods of Stochastic Simulation” by Barry Nelson. Flicking through it, you come across “Input Uncertainty”, and you realise you’ve struck gold. The book describes the idea that because the data you’ve used to estimate the inputs to your model is inherently random, this will increase the variability in the outputs, and you should account for it. But how? The book only gives two suggestions – try to collect more real-world data to reduce input uncertainty, or something called “bootstrapping”.

If you go to your boss and ask for more real-world data, go to 3. If you give bootstrapping a go, go to 4.

3.

You go to your boss and ask for more real-world data, explaining your concerns. He tells you (a bit insincerely, in your opinion) that he understands your concerns but time and money are tight, so you’ll have to make do with the data you have.

If you go back and give bootstrapping a go, go to 4.

4.

You start doing bootstrapping. You struggle at first – “resampling? What the hell is that?” you think to yourself. However, the more you try, the more you understand. You start to get the concept – basically, you simply re-draw observations from the data you were given to calculate a new mean each time.


Eventually, by doing this enough, you get a sense of the variability among the means – which, you realise with joy, is your input uncertainty! By using this, you re-calculate the confidence intervals (which are much wider now).
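
If you want to see roughly what this looks like in practice, here’s a Python sketch (the service-time data below is made up, since we never see the real 50 observations):

```python
import random

random.seed(0)
# hypothetical service times (minutes): a stand-in for the 50 observations
data = [random.expovariate(1 / 3) for _ in range(50)]

def bootstrap_means(data, n_boot=2000):
    # resample the data with replacement and recompute the mean each time
    n = len(data)
    return [sum(random.choices(data, k=n)) / n for _ in range(n_boot)]

means = sorted(bootstrap_means(data))
# take the central 95% of the bootstrap means as an interval
lo, hi = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
print(round(lo, 2), round(hi, 2))  # a 95% bootstrap interval for the mean
```

The spread of the bootstrap means is the input uncertainty in the mean service time, which is exactly what goes into the wider confidence intervals.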

If you take these new confidence intervals to your boss, go to 5. If you think you should try something more sophisticated, go to 6.

5.

You go to your boss with your estimates and your confidence intervals. He reads them, and his face falls. “Good work, but this isn’t great news. We pretty much can’t determine anything from this analysis. The company is looking at some dark times ahead”.

Three months, and a number of layoffs later, you realise that maybe there were some more sophisticated methods you could’ve used. However, it’s now too late. The revenues are falling, and the company is looking at more layoffs.

Say goodbye to your bonus.

Congratulations! You didn’t get fired! But that’s about the best you can say about your performance. To see what would’ve happened if you tried something a bit more sophisticated, feel free to try again!

6.

You find a paper giving a very nice review of methods for input uncertainty. It seems that there are a few different approaches you could take, and they all have pros and cons: Bayesian model averaging, meta-model assisted bootstrapping, and something called the delta method.

If you decide to use the Delta-method, go to 7. If you decide to use Meta-Model Assisted Bootstrapping, go to 8. If you decide to use Bayesian Model Averaging, go to 9.

7.

You chose to look into the delta method – I dunno, Greek letters are cool? – and get to work. You see that the method uses known mathematical results to decompose the output variance into simulation variance and input-uncertainty variance. You rapidly decide that this is too mathematical for you, and decide to go back and try one of the other methods.

I didn’t work hard through a maths degree to use maths in real life, goddammit!

If you decide to use Meta-Model Assisted Bootstrapping, go to 8. If you decide to use Bayesian Model Averaging, go to 9.

8.

You decide to do Meta-model Assisted Bootstrapping – it’s got the word “Meta” in it, so you think it sounds cool – and get to work. You realise it involves using the results from a bootstrapped sample to try and model a relationship between the inputs and outputs. This model is then used to determine the input uncertainty. This is easy to do since you’ve only got two parameters, and the simulation is reasonably quick. You complete your work and take your results to your manager. He’s astounded – the results are fantastic and show really well how much variability the company should expect around arrival times. Your recommendations are implemented immediately. It works well, and there are no huge unexpected fluctuations. You are hailed as a hero of the office – not bad for your first year out.

Although the first year has really aged you

Thank you for playing this choose your own adventure! If you want to see what would have happened if you ignored Input Uncertainty, feel free to go back and try again!

9.

You decide to do Bayesian Model Averaging – you’ve heard lots of stats people talk about Bayesian stats, so you think it’s a smart idea – and get to work. Bayesian Model Averaging is similar to bootstrapping, but you weight your bootstrap samples by how likely you think they are, based on your prior knowledge of the system. That is, when re-taking the sub-samples, you make it more likely to select a sub-sample which agrees with your prior information. However, you don’t really seem to have much prior information to weight your samples on. You talk to your manager about this, and he helps you determine some appropriate priors to use. From this you can create some good confidence intervals for your estimates. Your manager is impressed, and they implement your recommendations immediately. It works well, and there are no huge unexpected fluctuations. You are hailed as a hero of the office – not bad for your first year out.

Although the first year has really aged you

Thank you for playing this choose your own adventure! If you want to see what would have happened if you ignored Input Uncertainty, feel free to go back and try again!

References

Nelson, B. (2013). Foundations and methods of stochastic simulation: a first course. Springer Science & Business Media.

60 second stats – Agent-Based Simulation
Tue, 25 Feb 2020

I thought I’d try something different today, and instead of the regular post, I thought I’d try to do a bite-sized summary of a topic. To that end, please enjoy the first installment of 60 second stats! I can’t install a clock in the post, but if you’re really keen, time yourself to see if I’ve done a good job. Today’s topic will be the area of Agent-Based Simulation.

Are you ready?

….GO!

What is Agent-Based Simulation?

Agent-based simulation is any computer simulation in which “agents” interact with each other.

No, not like that.

The agents can be simulated cells in a biological system, or animals in nature, or even people. You then set your simulation running to see how the agents interact.

How do I make an agent-based simulation?

Basically, an agent-based simulation needs 3 elements:

  1. Agents
  2. Relationships/interactions between the agents
  3. An environment the agents exist in

These will often be determined by either the structure of the simulation, or by parameters inputted by the user at the start of the simulation.
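
To make the three elements concrete, here’s a toy Python sketch: agents (each holding a 0/1 opinion), an interaction rule (copy the local majority), and an environment (a ring of agents). The model is entirely made up for illustration:

```python
class Agent:
    # a toy agent holding a 0/1 opinion
    def __init__(self, opinion):
        self.opinion = opinion

def step(agents):
    # interaction: each agent adopts the majority opinion of itself
    # and its two neighbours; the environment is a ring of agents
    n = len(agents)
    votes = [agents[(i - 1) % n].opinion + agents[i].opinion + agents[(i + 1) % n].opinion
             for i in range(n)]
    for agent, v in zip(agents, votes):
        agent.opinion = 1 if v >= 2 else 0

agents = [Agent(o) for o in [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]]
for _ in range(5):
    step(agents)
print([a.opinion for a in agents])  # settles to [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]
```

Even in a model this small, you set it running and watch what emerges, which is the whole spirit of agent-based simulation.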

Are there any examples of this?

Yes! To pick just a couple of the countless examples, agent-based simulation has been used to model:

  • Predator-prey relationships between animals

And many others.

What do I use an agent-based simulation for?

Often it’s used to either:

  1. Study how the agents interact under certain conditions, because studying these conditions in real-life is difficult/impossible
  2. Predict how real-life agents would act in new situations/environments they haven’t experienced before

That seems simple enough. Are there any problems with it?

The main difficulty is calibration. That is, selecting the right parameters so that the simulation is behaving similarly to the real-life system it is modelling. Otherwise, you can’t trust any of the results you obtain from it.

Which can be heartbreaking when you realise this

How do you calibrate it?

The most common way seems to be trial-and-error – just keep testing new parameters until the output looks realistic. However, there has been a bigger push to start using heuristics to try to automatically calibrate agent-based simulations.

That sounds familiar…

It should! It was the topic of an earlier post on heuristics in optimisation!

60 seconds is nearly up. Where do I look if I want to know more about this?

There are some really good general overviews of agent-based simulation out there, as well as papers on automatic calibration, if you want to dig deeper.

Aaaand, that’s time. Hope you enjoyed that! If not, well, you only wasted one minute.

Model-based clustering
Sun, 23 Feb 2020

Today’s post is based on a Masterclass given to the STOR-i cohort by Brendan Murphy from University College Dublin.

Clustering

In data science, clustering is the process of grouping objects into clusters, such that members of the same cluster are more ‘similar’ to each other than they are to members of different clusters.

Many common clustering methods (e.g. k-means) are based on a metric known as the distance or dissimilarity between the points (a simple example is the straight Euclidean distance between the points). This is then used in a number of different ways to assign points to clusters – for example, in k-means clustering, each point is assigned to the cluster with the closest mean.
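
As a quick illustration of distance-based assignment, here’s a Python sketch (the cluster centres are made up):

```python
def assign(point, centres):
    # k-means style assignment: squared Euclidean distance to each centre,
    # pick whichever centre is closest
    def dist2(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return min(range(len(centres)), key=lambda k: dist2(point, centres[k]))

centres = [(0.0, 0.0), (5.0, 5.0)]  # hypothetical cluster means
print(assign((1.0, 0.5), centres))  # 0: closer to the first centre
print(assign((4.0, 6.0), centres))  # 1: closer to the second
```

Notice there is no probability model here at all, which is exactly the gap model-based clustering fills.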

While these methods are very popular, they do suffer from drawbacks. Without assuming a model generating these points, it is hard to claim with certainty that future observations will fall into the same clusters. In addition, some of these algorithms can’t properly deal with many frequently repeated observations.

Model-based clustering

Professor Murphy’s Masterclass instead presented a framework for clustering continuous data known as a Gaussian Mixture Model. This is a form of clustering which assumes that the data comes from a particular probability model.

The model is based on 3 general assumptions:

  1. We know the number of clusters before we start
  2. Each observation in the data has a certain probability of belonging to each cluster.
  3. The observations within each cluster follow a normal distribution (with the appropriate dimension)

These assumptions leave us with two problems to solve when fitting the model:

  1. What are the means and covariances of each of the clusters?
  2. Which cluster does each observation belong to?

It is clear that these two problems are related. The mean and variance of a cluster will depend on which observations are assigned to it, and the cluster an observation should be assigned to will depend on that cluster’s mean and variance. Fortunately, there is a way to solve both problems simultaneously, using the Expectation-Maximisation (EM) algorithm. The algorithm works by repeatedly performing an E-step, which computes the probability of each observation belonging to each cluster, and an M-step, which then updates the cluster means and variances based on these probabilities.
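
To see the E-step and M-step in action, here’s a rough Python sketch of EM for a one-dimensional, two-cluster Gaussian mixture (the data is simulated rather than the iris data, and the initial guesses are arbitrary):

```python
import math
import random

random.seed(2)
# toy 1-D data from two clusters with true means 0 and 5
data = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(5, 1) for _ in range(200)]

def pdf(x, m, s):
    # normal density with mean m and standard deviation s
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

mu, sd, w = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]  # arbitrary initial guesses
for _ in range(50):
    # E-step: probability of each cluster having generated each point
    resp = []
    for x in data:
        num = [w[k] * pdf(x, mu[k], sd[k]) for k in range(2)]
        tot = sum(num)
        resp.append([n / tot for n in num])
    # M-step: re-estimate weights, means and sds from those probabilities
    for k in range(2):
        nk = sum(r[k] for r in resp)
        w[k] = nk / len(data)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sd[k] = math.sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk)

print([round(m, 1) for m in mu])  # should recover means near 0 and 5
```

In practice you would use an established implementation (such as the “mclust” package mentioned below) rather than rolling your own, but the loop above is the whole idea.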

An example – Iris dataset

For a simple example, we will look at clustering different subspecies of iris flowers, based on the length and width of their petals and sepals.

Don’t worry, I didn’t know what sepals were either.

This is a famous data set consisting of 150 observations of 3 different subspecies of iris flowers – Setosa, Versicolor and Virginica (the dataset is also built into R as “iris”). I applied the EM algorithm to the data (Professor Murphy kindly left us some code that does this, but the “mclust” package in R also does this). The gif below nicely shows how the algorithm classifies observations and updates the clusters.

As nice as the above picture looks, the big question is: how accurate is the classifier? That is, how well do our clusters match up with the true subspecies? As shown in the table below, we nailed it on the Setosa, and we did classify all the Versicolor together. Unfortunately, the algorithm couldn’t really distinguish between Versicolor and Virginica: 39 out of 50 Virginica observations were classified as Versicolor. However, it is also worth noting that this dataset is notoriously hard to classify, particularly between these two subspecies.

             Cluster 1 (red)   Cluster 2 (blue)   Cluster 3 (green)
Setosa              50                 0                  0
Versicolor           0                 0                 50
Virginica            0                11                 39
Accuracy of our clusters against the true subspecies. It can be seen that the Setosa flowers were all perfectly clustered, but the majority of the Virginica were mis-classified as Versicolor.

Extensions and further work

We’ve barely scratched the surface of Gaussian Mixture Models. Professor Murphy has done extensive work on the different possible covariance structures between clusters (e.g. assuming all clusters have equal covariance matrices, or covariance matrices differing only in scale, or letting each be whatever you would like). And that’s not even covering clustering with categorical variables, which can be done using Latent Class Analysis. Even taking Gaussian Mixture Models as presented here, you still need to assume the number of clusters. While this can be determined using model selection criteria such as BIC, there is still much work to be done in this area.

Heuristics in Optimisation
Fri, 07 Feb 2020

Another fortnight, and another 10 presentations on research in STOR-i. The current phase of the MRes program is starting to introduce us to the different research areas that we may be able to choose our PhD topic from. There was also a fantastic performance by the Lancaster Korfball team at the BUCS North Regional competition, which you can see for yourself.

Not to boast or anything

Korfball brilliance aside, one of the research topics we’ve seen was from Dr Ahmed Kheiri (a lecturer in the Lancaster University Management School) called “Heuristic methods in hard computational problems”. This is an area I’ve previously had some experience in, so I thought it would be a good idea to dive a bit deeper into it.

An overview of optimisation

Optimisation is the general area of maths concerned with finding the best (highest or lowest) solution to a problem, often subject to some constraints. A very simple example of this is a furniture maker, who has a certain amount of wood and a certain amount of time, and needs to decide how many chairs and tables to make to maximise profit, given the amount of wood and time each requires. This problem could be formulated as:

Maximise (profit per chair x number of chairs made) + (profit per table x number of tables made)

subject to:

(wood used per chair x number of chairs made) + (wood used per table x number of tables made) ≤ Total wood available

and

(time taken to make a chair x number of chairs made) + (time taken to make a table x number of tables made) ≤ How much time they have

and finally

Number of chairs and number of tables are integers (i.e. you can’t make half a table/chair)

I’ll also use this problem to introduce some terminology:

  • The objective function is the mathematical expression you are trying to maximise/minimise. In this example, it is the first expression containing the profits of the chairs and tables.
  • The constraints are the mathematical expressions which you have to abide by to solve the problem. In our example, these are the expressions saying you can’t use more wood or time than you have.
  • The decision variables (often just called the variables) are the values that you can change to optimise the objective function. In this example, the variables are the number of tables and chairs to make.
  • If some given values of the variables satisfy the constraints, this is called a feasible solution.
  • The set of all feasible solutions (i.e. every possible combination of tables and chairs that we could make with our wood and time) is called the solution space.
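To make this concrete, here is a minimal sketch in Python. All of the numbers (profits, wood, hours) are invented for illustration, and because the decision variables are small integers we can find the optimum by simply enumerating every point in the solution space:

```python
# Brute-force solver for the toy furniture problem.
# All numbers (profits, wood, hours) are invented for illustration.
PROFIT = {"chair": 30, "table": 70}   # profit per item
WOOD = {"chair": 2, "table": 5}       # units of wood per item
TIME = {"chair": 3, "table": 6}       # hours of work per item
TOTAL_WOOD, TOTAL_TIME = 20, 30

def best_plan():
    best, best_profit = (0, 0), 0
    # The decision variables are small integers, so enumerate them all.
    for chairs in range(TOTAL_WOOD // WOOD["chair"] + 1):
        for tables in range(TOTAL_WOOD // WOOD["table"] + 1):
            wood = WOOD["chair"] * chairs + WOOD["table"] * tables
            time = TIME["chair"] * chairs + TIME["table"] * tables
            if wood <= TOTAL_WOOD and time <= TOTAL_TIME:  # feasible?
                profit = PROFIT["chair"] * chairs + PROFIT["table"] * tables
                if profit > best_profit:
                    best, best_profit = (chairs, tables), profit
    return best, best_profit
```

With these particular made-up numbers, making 10 chairs and no tables uses all 20 units of wood and all 30 hours for a profit of 300 – and note that enumeration only works because the solution space here is tiny.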

This is a very simple optimisation problem, and you could probably find a solution to it quite quickly. However, these sorts of problems can quickly get out of hand. What if you have more potential products than you can make from your wood? What if the first few chairs sell at a higher price than the next few? What if you have to sell chairs and tables as a set?
The more complicated the problem is, the (computationally) harder it will be to solve exactly. However, if you’re willing to cheat a bit, there are some methods that can help.

Heuristics

A heuristic is really any optimisation technique/algorithm for finding a good or approximate solution very quickly. It can be something as simple as the following:

  1. Randomly pick any feasible solution to the problem, and take it as the current best solution.
  2. Randomly pick any new feasible solution to the problem.
  3. If the new solution is better than your current best solution, replace your current best solution with the new one
  4. Repeat steps 2 and 3 until you are satisfied that you either have a good enough solution, or you are sick of running the algorithm.
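The steps above can be sketched in a few lines of Python (the objective and sampler below are stand-ins for whatever problem you are actually solving):

```python
import random

def random_search(objective, sample_feasible, iterations=1000, seed=0):
    """Keep randomly sampling feasible solutions, remembering the best."""
    rng = random.Random(seed)
    best = sample_feasible(rng)           # step 1: random starting solution
    best_value = objective(best)
    for _ in range(iterations):
        candidate = sample_feasible(rng)  # step 2: another random solution
        value = objective(candidate)
        if value < best_value:            # step 3: keep it if it's better
            best, best_value = candidate, value
    return best, best_value

# Toy example: minimise x^2 over the interval [-10, 10].
x, fx = random_search(lambda x: x * x, lambda rng: rng.uniform(-10, 10))
```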

There are two points to make about the above algorithm. Firstly, the longer you run it, the better the solution it will find; but secondly, there’s no guarantee it’ll get there any time soon, nor that it’ll get close to the true best solution. This is because it’s just randomly jumping all over the solution space. It’s not actually looking for a good solution – it’s just stumbling around, hoping to find one.

A better heuristic will often try to move in the “direction” of better solutions. A classic example of this is the gradient descent method:

  1. Pick a point in the solution space. Call this your current best solution
  2. Calculate the gradient of the objective function at your current best solution
  3. Move a step in the solution space in the opposite direction to the gradient (i.e. downhill). Call this new point your current best solution.
  4. Repeat until the gradient equals (or is suitably close to) 0.
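Here is a one-dimensional sketch of those steps (gradient descent proper works in any number of dimensions, but the idea is identical):

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Repeatedly step downhill until the gradient (nearly) vanishes."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:      # step 4: gradient is essentially zero, stop
            break
        x = x - step * g      # step 3: move against the gradient
    return x

# Minimise f(x) = (x - 3)^2, whose gradient is 2(x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```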

The basic idea is to imagine putting a ball on the surface of the solution space and letting it roll down; it will eventually settle at the lowest point. A very good gif of this is shown below (taken from the )


Think of your solution space as the surface of a table. Gradient descent is kind of like putting a ball on the table and simply letting it roll down to find the lowest point.

It’s quite clear from the above gif that this is a much better heuristic. However, there are still situations where this doesn’t work too well. Imagine we have the following objective function:

If you can imagine placing a ball at the red point, it will roll down and settle at the “minimum” at the blue point. While this is the minimum for this bit of the solution space (we call this a local minimum), it completely misses the true minimum (or global minimum) at the green point.

In general, the more complicated the function, the more sophisticated the heuristics needed to find a good solution.

Optimal wind turbine placement

Moving back to the presentation by Dr Kheiri, we look at the problem of optimal wind turbine placement. To quote the presentation:

The problem involves finding the optimal positions of wind turbines in a 2
dimensional plane, such that the cost of energy is minimised taking into
account several factors such as wind speed, turbines features, wake effects and
existence of obstacles

This problem will have many local minima, and so gradient descent would probably fail if we tried to use it.

The way this was solved in the paper was with a genetic algorithm. This is a heuristic which tries to mimic natural selection in animal populations. The idea is that you generate a population of solutions, then pick pairs of solutions from this population and from them “breed” new solutions, which have features from both of their “parent solutions”. The new solutions are then evaluated; the strongest (i.e. the best) survive, and the others are discarded. These new solutions are then used to breed more solutions, and the process is repeated until you decide to stop.

Applying this to our wind turbine problem, we first set up our data as a binary vector of 1s and 0s. We do this by dividing the plane into a 2-dimensional grid, with at most 1 turbine in each cell. Each cell is now associated with a binary variable, which takes the value 1 if there is a turbine in it, and 0 otherwise. The decision then is which cells to put a wind turbine in (i.e. assign a value of 1) and which to leave empty. From our matrix of 1s and 0s, we simply organise it into one long vector of turbine locations (mainly for computational simplicity), as shown in the gif below (in this example, every cell is a 1, but in reality they will be a mix of 1s and 0s).

Apologies, my animation skills aren’t exactly Pixar level

Once we have set this up, we can start our genetic algorithm. First, we generate a bunch of solutions, pair them off, and start breeding new solutions from these pairs. The breeding has two aspects:

  1. Crossover: This is taking some parts of both the mother and father solution, and incorporating these into the child solution
  2. Mutation: Randomly changing some aspect of the child solution. This helps the algorithm avoid getting stuck in a local minimum.

The breeding is shown below (again, in this example, the mother and father are the same, but in reality, they will be different).

Cue some Marvin Gaye

We then see how good all these child solutions are, and keep the best ones. From these, we breed more new solutions, keep the best, and so on. We keep doing this until we are satisfied we have a good solution.
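A toy version of this loop on bit strings (1 = turbine in that cell) might look like the following. To be clear, this is a minimal sketch with a stand-in fitness function, not the algorithm from the paper:

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=30, generations=50,
                      mutation_rate=0.02, seed=1):
    """Evolve bit strings: keep the fittest, breed children, mutate."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)   # best solutions first
        parents = pop[:pop_size // 2]         # only the fittest half breed
        children = []
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)    # one-point crossover
            child = mother[:cut] + father[cut:]
            for i in range(n_bits):           # mutation: rare random flips
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Stand-in fitness: reward layouts with exactly 5 turbines in 16 cells.
best = genetic_algorithm(lambda layout: -abs(sum(layout) - 5), n_bits=16)
```

A real fitness function would instead score the cost of energy for the layout, accounting for wind speed, wake effects and so on.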

Hyper-heuristics

By this point, you’re probably starting to realise how many different types of heuristic you could use to solve an optimisation problem. However, this leads to an obvious question – how do you choose which one? Some will converge quite quickly (e.g. gradient descent), whereas some will be more robust against local minima (genetic algorithms). In general, there is no single right answer.

However, for a given problem, it is possible to apply a number of different heuristics, and then use a hyper-heuristic (closely related to meta-heuristics) to manage them. The way it works is to start with a number of low-level heuristics. You then run them over the problem, and (where you would normally iterate your one chosen heuristic) you switch between heuristics. This could be as simple as changing between the breeding rules in a genetic algorithm, or switching search methods altogether. Ideally, you would also keep track of how each heuristic is doing as you go, and use this information to help choose your next heuristic.
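A bare-bones sketch of that idea: keep a running score for each low-level heuristic, and pick operators in proportion to how often they have improved the solution so far (the two move operators below are invented for the example):

```python
import random

def hyper_heuristic(objective, operators, x0, iterations=200, seed=0):
    """Pick a low-level move operator each step, favouring past winners."""
    rng = random.Random(seed)
    scores = [1.0] * len(operators)   # running success score per operator
    x, fx = x0, objective(x0)
    for _ in range(iterations):
        # Choose an operator with probability proportional to its score.
        i = rng.choices(range(len(operators)), weights=scores)[0]
        candidate = operators[i](x, rng)
        f_candidate = objective(candidate)
        if f_candidate < fx:          # accept only improvements
            x, fx = candidate, f_candidate
            scores[i] += 1.0          # reward the operator that helped
    return x, fx

# Two toy operators: a small local nudge and a big random jump.
small_step = lambda x, rng: x + rng.uniform(-0.1, 0.1)
big_jump = lambda x, rng: rng.uniform(-10, 10)
x, fx = hyper_heuristic(lambda x: (x - 2) ** 2, [small_step, big_jump], x0=9.0)
```

The big jumps help escape local minima while the small steps refine a good solution – the hyper-heuristic learns which is paying off as it goes.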

In summary

  • Heuristics are methods to help you quickly find a “good” solution to an optimisation problem (but not necessarily the best one)
  • They can range from pretty much useless (random search), to quick but error-prone (gradient descent), to slow but robust (genetic algorithms)
  • There’s no one “best” heuristic. Each has strengths and weaknesses which suit different problems.
  • Moving forward, one of the big developments will be the use of hyper-heuristics to determine which method is best for your problem.

]]>
/stor-i-student-sites/hamish-thorburn/2020/02/07/heuristics-in-optimisation/feed/ 1
Factor analysis and Interpretability /stor-i-student-sites/hamish-thorburn/2020/01/24/factor-analysis-and-interpretability/?utm_source=rss&utm_medium=rss&utm_campaign=factor-analysis-and-interpretability /stor-i-student-sites/hamish-thorburn/2020/01/24/factor-analysis-and-interpretability/#comments Fri, 24 Jan 2020 13:38:20 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/hamish-thorburn/?p=139 So, we had the STOR-i conference last week. 2 days of interesting talks, free wine, and somewhat awkward photos.

Exhibit A

Of the presentations, I was intrigued by one presented by Dolores Romero Morales, of the Copenhagen Business School (her personal website is ) mentioning a paper entitled “Enhancing Interpretability in Factor Analysis by Means of Mathematical Optimization”. Sadly, she ran out of time to go into this during the talk, so I decided to do a post discussing this paper.

But firstly, a background in factor analysis.

Factor analysis is a statistical method for reducing the number of important variables in a linear regression model. In a standard linear regression model, you have a number of variables which you can see/observe, and which you assume have a direct effect on the output.

Basic idea behind linear regression

In factor analysis, there are a number of latent variables called ‘factors’. These are unseen variables which affect the observed variables you would use in your linear regression. The factors affect the variables as follows:

Variable = Effect of Factor 1 + Effect of Factor 2 + …+ Some Random Error

You then use the variables to determine their effect on the outcome.

The basic idea behind factor analysis.

An example is a study in which researchers are trying to model the income of participants. One can imagine that some unobservable qualities – such as intelligence and work ethic – would influence how much a person will earn. However, we can’t measure these – we can only measure things such as university results and years of work experience. The idea is that the (unobserved) factors of intelligence and work ethic will influence the (observed) variables of university results and years of work experience, for example:

University results = Effect of Intelligence + Effect of Work Ethic + Random error

Then, you use the university results and work experience to model income, such as:

Income = Effect of University Results + Effect of Work Experience + Some more random error
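To see the two layers of equations in action, here is a small simulation of the toy example in Python – the loadings and coefficients are invented numbers, purely to illustrate the structure:

```python
import random

def simulate_factor_model(n=1000, seed=0):
    """Simulate: latent factors -> observed variables -> outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Latent (unobserved) factors.
        intelligence = rng.gauss(0, 1)
        work_ethic = rng.gauss(0, 1)
        # Observed variables: driven by the factors, plus noise.
        uni_results = 0.8 * intelligence + 0.3 * work_ethic + rng.gauss(0, 0.2)
        experience = 0.1 * intelligence + 0.9 * work_ethic + rng.gauss(0, 0.2)
        # Outcome: driven by the observed variables, plus more noise.
        income = 2.0 * uni_results + 1.5 * experience + rng.gauss(0, 0.5)
        rows.append((uni_results, experience, income))
    return rows

data = simulate_factor_model()
```

Because the two observed variables share the same latent factors, they come out correlated – which is exactly the kind of structure exploratory factor analysis tries to uncover from the data alone.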

The structure of the variables

Factor analysis can be used to:

  1. Examine if a hypothesised structure exists in the data – our toy example is an example of this
  2. Examine the data to see if there is any structure present.

The latter example is known as exploratory factor analysis. The paper I’m going to look at today looks at an extension of this.

Enhancing Interpretability in Factor Analysis by Means of Mathematical Optimization

I’m not going to lie, I was very excited when I heard the title of this paper. Communication in mathematics and statistics is always a problem, and data science, through the sheer number of observations and variables involved, is particularly susceptible to it.

The paper is trying to solve the following problem. Let’s say, in our example above, that you didn’t have any idea what the two factors were. So you plug your data into your software package of choice to do some factor analysis.

Hint: You should choose R

The issue is, the model your software package fits isn’t guaranteed in any way to produce factors that make intuitive sense. For example, in our example above, you could get a model which has 13 factors, of which 9 affect university results and 7 affect work experience, with no real pattern between them.

While there are many different ways of dealing with the interpretability of factors, the (extremely condensed) idea of the paper is that you can assign the different explanatory variables to clusters. Then, you can force the model to fit factors that match these clusters. Therefore, you can be sure that you will end up with factors that “make sense”.

Example – California Reservoir Levels

The authors of the paper trial their method on a dataset of California reservoir levels (from Taeb et al. 2017). They assign one variable to each cluster, choosing:

  1. Palmer Drought Severity Index
  2. Colorado River Discharge
  3. State-Wide Consumer Price Index
  4. Hydroelectric Power
  5. State-Wide Number of Agricultural Workers
  6. Sierra Nevada Snow Pack
  7. Temperature

as the seven variables, with one variable in each cluster. Therefore, when fitting the factors, we can tell that they will be related to one of these variables.

In fact, they find that the best fit comes from having 2 factors – one for Hydroelectric Power and one for Temperature.

Issues with the paper

I’m torn here. While I’m sympathetic to any attempt to make data science more accessible, I’m a bit confused by this paper (ironic, really). They assign each factor to a specific variable to make the model easier to interpret. But once this is done, it isn’t clear why this isn’t simply a linear regression. The whole idea of factor analysis is to find hidden factors that govern the way the explanatory variables behave. But in this approach, the factors are assigned to observed explanatory variables anyway, so it’s not clear what the process achieves. Furthermore, the code used in the paper isn’t available online, so it is hard to replicate the analysis and work out exactly what they did.

However, I would be very interested to see how fitting models with these interpretability clusters compares to fitting standard exploratory factor analysis models. Having previously worked on explaining complex data analysis to non-technical people, the idea that you could explain that your explanatory variables are governed by a simple, understandable process is golden. However, until I give this more of a go myself, I’m not sure how much this will add to the process.

]]>
/stor-i-student-sites/hamish-thorburn/2020/01/24/factor-analysis-and-interpretability/feed/ 1
What is this? /stor-i-student-sites/hamish-thorburn/2020/01/12/hello-world/?utm_source=rss&utm_medium=rss&utm_campaign=hello-world Sun, 12 Jan 2020 17:56:23 +0000 http://www.lancaster.ac.uk/stor-i-student-sites/hamish-thorburn/?p=1 As part of our MRes (Masters of Research) year within STOR-i, we’ve been encouraged to start writing a blog. And so, here we are.

The aim at this stage is to have it updated every fortnight(ish) and to cover just general topics/ideas/papers that I come across as part of my study. This may evolve as I go along, but that’s what to expect for now.

The first real post should come soon. I’m planning to cover some of the more interesting presentations from the 2020 STOR-i conference which I attended.

]]>