Matthew Darlington

Second term at STOR-i (Fri, 03 Apr 2020)

This will be my last blog post, wrapping up the first two terms of my MRes year at STOR-i. My second term at Lancaster has been an enjoyable experience in which I have been able to build on my work from last term. I hope to give a brief overview of some of the things I have been up to over the last three months.

The main piece of work this term was on two research topics. To give us inspiration we had 12 sessions on different topics, where three experts would come and give us a lecture on a specific area of their field. These included “Logistics, Transportation and Operations”, “Structured Data” and “Spectral Methods for Time Series and Spatial Data”, among many others. We were then asked to pick two to look into further. The first task was to produce a seven-page report on a topic, along with an overview suitable for a non-academic audience. For this I chose “Stochastic Dynamic Optimisation” with Rob Shone and looked into how dynamic programming can help make decisions in queuing models. The second task was to produce a longer report, of 15 to 20 pages, as well as a presentation given to the STOR-i cohort. My choice for this was “Computational Statistics” with Chris Sherlock, where I looked into non-reversible Markov chain Monte Carlo.

Another key part of the term was the set of masterclasses given by professors from other universities. I won’t go too much into the content here, since I have written a separate blog post for each, but they provided a welcome break from our research topics and gave us a look at new ideas we had not previously seen at Lancaster.

We also had problem-solving days, where companies would come and present a problem they had encountered in industry, to see how a group of students would tackle it. The four days we had were put on by Elsevier, looking at improving the recommender system on their Mendeley product; the BBC, working out how best to cluster their user base; Tesco, who wanted to know how to price their petrol compared to their competitors; and Northwestern electricity, looking for anomalies in their asset database. All were an interesting look into how the skills developed in a PhD program can be applied directly to real-world problems that companies are facing.

Finally, my last task was writing this blog throughout the term, which I hope you have found interesting. If you have any questions please feel free to contact me through the contact form above.

STOR-i Masterclass: Professor Peter Frazier (Fri, 20 Mar 2020)

This week we had the last masterclass of the year, given by Professor Peter Frazier from Cornell University. Due to the uncertainty caused by the coronavirus outbreak around the world, Peter was unfortunately not able to visit Lancaster in person. Making the most of the situation, we were still able to interact over the internet, and it was an interesting half a week.

Peter’s area of expertise is operations research and machine learning, and he gave us an introduction to Bayesian optimisation, specifically how we could implement it ourselves using the programming language Python.

The problem motivating Bayesian optimisation is as follows: suppose we have a function we wish to find a maximum of, but for the purpose of this explanation we do not know what the function is. This is commonly thought of as a “black box” into which we pass some inputs and get an output, without getting to see what happens inside. Thus we cannot simply differentiate the function as we might be used to. Furthermore, our evaluations of the function may be noisy; a common assumption is that a \( \mathcal{N} (0,1) \) sample is added to each evaluation.

Credit: Ramraj Chandradevan

The way Bayesian optimisation tackles this problem is to estimate the function using a Gaussian process. Essentially, we maintain one function for the mean of our estimate and another for its variance. At each time step we can make one more evaluation of the function, and with each extra data point our Gaussian process becomes a better and better approximation of the “hidden” function.

The complicated part is choosing where to make our next evaluation. There are two concepts which need to be balanced: exploration and exploitation. Exploration relates to wanting to check over the whole range of the function, not leaving any area undiscovered. Exploitation means that if one area is looking better than the others, we want to focus our search there to find the global maximum. The methods we learnt deal with this by using functions that take both factors into account, called acquisition functions. We then simply choose the maximiser of the acquisition function as the next location to evaluate.
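To make this concrete, here is a small numpy-only sketch of the loop described above, using an upper confidence bound acquisition function on a one-dimensional toy problem. The kernel, lengthscale, \( \kappa \) and the black-box function itself are all illustrative choices of mine, not something from the masterclass:

```python
import numpy as np

def rbf(a, b, ls=0.5):
    # Squared-exponential kernel between two sets of 1-D points
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

def gp_posterior(X, y, Xstar, noise=1e-6):
    # Standard Gaussian process regression: posterior mean and variance at Xstar
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xstar)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # k(x, x) = 1 here
    return mu, np.maximum(var, 0.0)

def next_point(X, y, grid, kappa=2.0):
    # Upper confidence bound acquisition: mean + kappa * sd balances
    # exploitation (high mean) against exploration (high uncertainty)
    mu, var = gp_posterior(X, y, grid)
    return grid[np.argmax(mu + kappa * np.sqrt(var))]

# A "black box" we pretend we cannot differentiate
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x

X = np.array([-0.9, 1.1])            # initial evaluations
y = f(X)
grid = np.linspace(-2.0, 2.0, 401)
for _ in range(15):                  # sequential design loop
    xn = next_point(X, y, grid)
    X, y = np.append(X, xn), np.append(y, f(xn))
print(X[np.argmax(y)])               # best location found so far
```

Early on the standard deviation term dominates and the loop explores the gaps; once a promising region is found, the mean term takes over and the evaluations concentrate there.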

If you would like to learn more about the topic, Peter has an excellent set of resources on his website.

STOR-i Masterclass: Professor Laura Albert (Fri, 06 Mar 2020)

This week’s masterclass was given by Professor Laura Albert from the University of Wisconsin-Madison. She specialises in public sector operations research and gave us an introduction to some of the work she has done. Public sector operational research looks at how OR can help services such as ambulances and fire departments make better decisions, and how to tackle disasters.

On Monday we were given an introduction to the history of OR in the public sector, and Laura gave us an insight into some of her own work. This involved learning how OR was used in American cities during the 20th century to increase their efficiency under limited government funding. There was an interesting tension between the answers coming from the models of the time and what the departments were willing to accept. An example of this was that staffing was found to be optimal when more staff were put on at busier times; however, unions representing employees were often against this since it would inconvenience their daily schedules. Also, since this was a new area, there was scepticism about how beneficial following the models’ output would actually be, compared with the judgement of experts who had worked in the industry for years.

On Tuesday morning we looked at how integer programming can help us place ambulance stations on a map in order to give the best coverage of an area. This built nicely on some of the work we had done last term solving different kinds of optimisation problems. The aim was to build robustness into how we choose which station an ambulance is sent from in the case of an emergency. This is a simple problem if we assume all ambulances are available, but in reality we may have to send our second or even third choice. A particular problem in America is working out how to provide acceptable service to rural areas whilst still keeping sufficient capacity in the city centres.

In the afternoon we turned to a different topic: how OR can help in disasters like hurricanes or terrorist attacks. In the aftermath it is important to work out how to get aid as quickly as possible to those affected. Problems include how to divert resources when roads are blocked, or how to rebuild the electricity network so as to reconnect key infrastructure in the optimal order.

In all it was an enjoyable two days and I feel I learnt a lot about a new area that I had not really considered in the past. If you would like to read more, Laura writes her own blogs.

STOR-i Masterclass: Professor Brendan Murphy (Fri, 21 Feb 2020)

Last week we had the first of this year’s STOR-i masterclasses, given by Professor Brendan Murphy from University College Dublin. He introduced us to model-based clustering and classification. I hope to give a brief insight into his interesting talks over the two days.

The goal of clustering analysis is to place objects into groups in a way that supports meaningful analysis. The idea is to form groups whose members share something in common and differ from the members of other clusters.

Clustering as a concept has been around for millennia. Plato was the first to formalize the thinking with his “Theory of Forms”, and Aristotle classified animals into groups based on their characteristics in his “History of Animals”.

Much later on, Linnaeus began to cluster plants into hierarchical groups in his works “Species Plantarum” and “Systema Naturae”. He used features such as whether the plants had flowers, and the number of stamens, to divide them up into 24 different classes.

Brendan then explained how more recent clustering algorithms can be coded up on a computer to distinguish between different vectors of numbers. The masterclass finished with a live demonstration of how we could cluster runners in the 24 Hour World Championship of running, which led into thinking about some open questions in the area of clustering and classification.

Putting the methods into practice

I took the methods learnt in the masterclass and tried to apply them myself to cluster the pixels found in an image. If we think of a picture as its pixels, with their red, green and blue co-ordinates plotted as points in three-dimensional space, then we can cluster these into \(k\) groups and use this to compress the size of the image. Working in Python, I used this to cluster a picture of some of the MRes cohort.
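The clustering step can be sketched in a few lines of Python. I cannot embed the photo here, so this version runs Lloyd’s k-means algorithm on a random stand-in for the pixel array; the image size and number of clusters are arbitrary choices for the example:

```python
import numpy as np

def kmeans(pixels, k, iters=20, seed=0):
    # Lloyd's algorithm on an (n, 3) array of RGB values
    rng = np.random.default_rng(seed)
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # assign every pixel to its nearest centre (Euclidean distance in RGB space)
        d = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of the pixels assigned to it
        for j in range(k):
            if np.any(labels == j):
                centres[j] = pixels[labels == j].mean(axis=0)
    return centres, labels

# Stand-in for the photo: a 40x30 image of random RGB values in [0, 1]
rng = np.random.default_rng(1)
img = rng.random((40, 30, 3))
pixels = img.reshape(-1, 3)

centres, labels = kmeans(pixels, k=8)
compressed = centres[labels].reshape(img.shape)  # each pixel replaced by its cluster colour
```

The compressed image needs only the \(k\) centre colours plus one small label per pixel, which is where the size saving comes from.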

If you would like to find out more about clustering, I would recommend looking at some of Brendan’s work in the field, including the book:

Bouveyron, C., Celeux, G., Murphy, T. B. and Raftery, A. E. (2019). Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press. doi:10.1017/9781108644181

Stochastic Dynamic Optimization (Fri, 14 Feb 2020)

In the second term of the MRes we have been tasked with writing about two research topics, based on talks we were given just after Christmas. I chose to write my first, shorter report on stochastic dynamic optimization. I hope to explain a little about what I looked into and what my report was about…

Queuing is something that all of us experience on a daily basis. It can be frustrating when we are faced with long queues that are not worth the wait, and we may question why we bothered queuing up in the first place. These questions were first considered in the literature during the 1960s, when thinking about how we could reduce queues in all walks of life, from cab stands to the hire of industrial equipment.

The setting I consider is one where, when a customer first arrives into the system, we have the option either to assign them to a queue or to send them away. This deviates slightly from traditional queuing theory, where we consider how to serve every single individual, and it is an important distinction. There are two contrasting ways in which the problem can be thought of: selfish and social policies. Under a selfish policy each individual is concerned only with maximizing their own expected profit, and takes nobody else into account. This is in contrast to a social policy, where some authority organizes the queue in order to maximize the expected profit of the entire population as a whole.

It is a relatively simple concept, but it can produce surprising results. For example, it is not instinctive that, to increase the reward of the whole population, we must shorten the length the queue is allowed to reach before turning new arrivals away. How one would implement such policies in practice is therefore not a simple question. One of the first ideas proposed in the literature is charging a toll that must be paid in order to join the queue. This is already implemented in some systems: when booking a taxi, for example, we see surge prices when demand is high and cheaper prices when demand is low. Doing this helps to even out the arrival of customers in order to maximize the profit of the operator. For simple queues we can in fact calculate mathematically the toll that pushes an individual’s selfish policy to match the social policy for the system. However, this leads to ethical complications when we look at systems that represent a basic human need, such as the distribution of aid or other essential supplies urgently needed by vulnerable people. This opens up many different considerations, and as such I did not go into it in my report, but it is an interesting and important point to think about.

Instead, I focused on the construction of the social and selfish policies for a general queuing model. The methodology quickly becomes complicated due to the large number of scenarios we can find ourselves in. I therefore explained how we can derive the policies directly for the most basic form of queue, with one server at one facility, where we can either allow an arrival to join the back of the queue or force them to leave with nothing (called balking). I then referred to methods that others have introduced to solve more complicated and general models with the help of computers. However, these quickly become infeasible for anything other than small toy problems, and so we must look at heuristics that approximate the solution rather than solving the problem exactly.
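To give a flavour of the kind of computation involved, here is a hedged Python sketch of value iteration for that most basic admit-or-balk queue. It uses a standard uniformized, discounted formulation rather than the exact model in my report, and the arrival rate, service rate, reward and holding cost are made-up illustrative values:

```python
import numpy as np

# Illustrative admission-control model for an M/M/1 queue (parameters invented):
# arrivals at rate lam, service at rate mu, reward R per admitted customer,
# and a holding cost of c per customer per unit time.
lam, mu, R, c = 1.0, 1.2, 5.0, 1.0
N = 30                            # truncate the state space at N customers
beta = 0.99                       # discount factor per uniformized step
Lam = lam + mu                    # uniformization rate

states = np.arange(N + 1)
up = np.minimum(states + 1, N)    # state after an admitted arrival
down = np.maximum(states - 1, 0)  # state after a service completion

V = np.zeros(N + 1)
for _ in range(10000):            # value iteration on the uniformized chain
    admit = R + V[up]
    admit[N] = -np.inf            # queue full: admission impossible
    Vnew = (-c * states / Lam
            + beta * (lam / Lam * np.maximum(admit, V)
                      + mu / Lam * V[down]))
    if np.max(np.abs(Vnew - V)) < 1e-10:
        V = Vnew
        break
    V = Vnew

admit = R + V[up]
admit[N] = -np.inf
policy = admit > V                # True where admitting the arrival is optimal
print(np.flatnonzero(~policy)[0]) # smallest queue length at which we balk
```

The resulting social policy is a threshold rule: admit arrivals while the queue is short, balk once it exceeds some critical length, which matches the intuition above about capping the queue.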

Random Graphs (Fri, 31 Jan 2020)

A graph is a pair \( G = (V, E) \) where \(V\) is a set whose elements represent the vertices of the graph and \(E\) is a set of pairs of vertices that represent the edges of the graph. This is better explained with an example:

An example of a graph on 4 vertices

Here we have \(V = \{1,2,3,4\} \) and \( E = \{ \{1,3\} , \{1,4\}, \{2,3\} , \{2,4\} , \{3,4\} \} \).

With this basic framework we can generate a random graph, the simplest model being the Erdős–Rényi model. The method is simple: first decide on the number of vertices \( |V| = n \) and a value \( p \in [0,1] \). Then include every possible edge with probability \(p\), independently of every other edge.

Thus the indicator of a single edge being present is a \( \text{Bernoulli}(p) \) random variable. There are \( {n \choose 2} \) possible edges and so \( \mathbf{E} [ |E| ] = {n \choose 2} p \).

Another interesting property we can look at is the expected number of triangles found in the graph. A triangle exists if, as you would expect, there are 3 vertices all connected by edges. Calculating this is simpler than it might first sound…

There are \( t = {n \choose 3} \) possible triangles in our graph of \(n\) vertices. Now define random variables \( X_1, \dots, X_t \) such that \( X_j = 1 \) if triangle \(j\) exists and \(0\) otherwise. A triangle requires its 3 edges to be present, each independently with probability \(p\), so every \(X_j\) is \( \text{Bernoulli}(p^3) \). Note that the \(X_j\) are identically distributed but not independent, since two triangles can share an edge; this does not matter, because linearity of expectation does not require independence. Then we have:

$$ \begin{align} \mathbf{E} [\text{no. triangles}] &= \mathbf{E} \left[ \sum_{j=1}^t X_j \right] &&\text{(by definition)} \\ &= \sum_{j=1}^t \mathbf{E} [X_j] &&\text{(by linearity of expectation)} \\ &= t \mathbf{E} [X_1] &&\text{(identically distributed)} \\ &= tp^3 \end{align} $$

This seems a very simple result, so to verify it I ran a simulation in R.

You can use my code to see for yourself if you would like:

rm(list = ls())
require(igraph)
set.seed(1)

#### Model Parameters
n.vec = 1:50
p = 0.5
reps = 10

#### Function to simulate an Erdős–Rényi graph and count its triangles
erdos <- function(N, p, plots){

  E = choose(N, 2)                # number of possible edges
  edges = rbinom(E, 1, prob = p)  # include each edge independently with probability p
  
  A = matrix(0, nrow = N, ncol = N)
  A[lower.tri(A, diag=FALSE)] = edges
  A = t(A) + A
  
  graph = graph_from_adjacency_matrix(A, mode = "undirected")
  
  if (plots == TRUE){
    plot(graph)
  }
  
  return(list(graph = graph, triangles = length(cliques(graph, min=3, max=3))))
}

#### Simulation
clique.vec = rep(0,length(n.vec))
for (i in 1:length(n.vec)){
  for (j in 1:reps){
    clique.vec[i] = clique.vec[i] + erdos(n.vec[i], p, FALSE)$triangles
  }
  clique.vec[i] = clique.vec[i] / reps
}

plot(n.vec, clique.vec - choose(n.vec, 3) * p^3, lwd = 1, type = "l", xlab = "n", ylab = "Difference from expectation")
]]>
MCMCMC (Fri, 17 Jan 2020)

Over Christmas I was set the task of writing a report about something that interested me during my first term of the MRes. I chose to look into something called Metropolis-Coupled Markov Chain Monte Carlo (MCMCMC). This sounds like a mouthful, but once broken down it isn’t too scary.

I will start off by giving a brief explanation of what Markov Chain Monte Carlo (MCMC) is. Imagine you have a probability density function \( \pi(x) \) and you need to take random samples from it, but the function is too complicated to sample from directly. What MCMC does is simulate random draws by constructing a random walk whose moves take this complicated function into account. The simplest way to do this is called Random Walk Metropolis (RWM).

We start off with some initial position \( x \). We then propose a new position by adding a normal sample to it, \( y = x + \mathcal{N}(0, h) \), where the variance \(h\) is something the user can tune. There is then a Metropolis-Hastings acceptance step, where we determine whether \(y\) is a sensible sample for our distribution. To do this we let \( A = \min \left( 1, \frac{\pi(y)}{\pi(x)} \right) \) and accept \( y \) as our new position with probability \( A \). This is repeated to obtain as many samples as you wish.
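The steps above fit in a few lines of Python. This is a minimal sketch targeting a standard normal for illustration; the target and step size are my own toy choices:

```python
import numpy as np

def rwm(logpi, x0, h, n, seed=0):
    # Random Walk Metropolis: propose y = x + N(0, h) and accept with
    # probability min(1, pi(y) / pi(x)), computed on the log scale
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n)
    for i in range(n):
        y = x + rng.normal(0.0, np.sqrt(h))
        if np.log(rng.random()) < logpi(y) - logpi(x):
            x = y                 # accept the proposal
        samples[i] = x            # on rejection the current state repeats
    return samples

# Toy target: a standard normal, pi(x) proportional to exp(-x^2 / 2)
logpi = lambda x: -0.5 * x**2
s = rwm(logpi, x0=0.0, h=1.0, n=20000)
print(s.mean(), s.var())
```

Working with \( \log \pi \) rather than \( \pi \) avoids numerical underflow in the tails and is the usual way the acceptance ratio is computed in practice.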

Of course, the downside of such a simple method is that it only works well in certain cases. A multi-modal distribution makes RWM perform very badly: by “perform badly” I mean that we get very dependent samples, which is undesirable if we want to use them for any kind of analysis of the distribution.

This motivates the need for more complicated methods, and one that works well in this scenario is MCMCMC. We now take multiple probability density functions \( \pi_1(x), \pi_2(x), \dots, \pi_m(x) \) and run RWM on each of them separately. We pick \( \pi_1(x) = \pi(x) \), and each of the other distributions is a progressively more ‘smoothed out’ version of the function.

Examples of functions we could take if the top left were our target

The clever part is that after each iteration we introduce a Metropolis-Hastings step that can switch the positions between two of the chains. This allows the random walk on \( \pi(x) \) to move around much better, as I will show from the results I obtained:

Results with MCMCMC on the left and RWM on the right

The top graphs show a histogram of the simulated results with the true distribution overlaid in red. From this we can see that we get a much better representation with MCMCMC. The second row shows ACF plots, which measure the correlation of the data with a lagged version of itself. We want this to decay to 0, since then we are getting what appear to be independent results, and MCMCMC gets much closer to this than RWM. The last graphs show the random walk for the first 300 steps. We see that MCMCMC moves between the two peaks, whereas RWM gets stuck in the right peak.
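The swapping mechanism can be sketched as follows. This toy Python version smooths the target by raising it to powers \( 1/T \) (tempering, one common way of flattening the modes; the temperatures and the bimodal target are illustrative choices of mine) and proposes swaps between adjacent chains:

```python
import numpy as np

rng = np.random.default_rng(2)

def logpi(x):
    # Bimodal target: equal mixture of N(-4, 1) and N(4, 1), up to a constant
    return np.logaddexp(-0.5 * (x - 4)**2, -0.5 * (x + 4)**2)

temps = np.array([1.0, 3.0, 9.0, 27.0])  # chain 0 targets pi itself
x = np.zeros(len(temps))
out = []
for _ in range(20000):
    # one RWM move per chain, each targeting pi(x)^(1/T)
    for k, T in enumerate(temps):
        y = x[k] + rng.normal(0.0, 1.0)
        if np.log(rng.random()) < (logpi(y) - logpi(x[k])) / T:
            x[k] = y
    # propose swapping the states of a random adjacent pair of chains
    k = rng.integers(len(temps) - 1)
    a = (logpi(x[k + 1]) - logpi(x[k])) * (1 / temps[k] - 1 / temps[k + 1])
    if np.log(rng.random()) < a:
        x[k], x[k + 1] = x[k + 1], x[k]
    out.append(x[0])            # only chain 0 produces samples from pi
out = np.asarray(out)
print((out > 0).mean())         # fraction of time spent in the right-hand mode
```

The hot chains cross the valley between the modes easily, and the swap moves pass that mobility down to the cold chain, which is exactly the mode-hopping visible in the trace plots above.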

If you would like to read more about this, you can read my Christmas report.

And if you would like to discuss MCMCMC or any other blog on my website, please feel free to contact me using the form below.

STOR-i Conference 2020 (Fri, 03 Jan 2020)

Last week I got to start off the new year at Lancaster by attending the annual STOR-i conference. I sat through a day and a half of talks from academics from around the UK as well as Europe and America. My first blog will be about my favourite talk from this year’s conference, ‘Multi-armed bandit problems with history dependent rewards’ by Ciara Pike-Burke, a STOR-i alumna.

What is a Multi-armed bandit?

Imagine you have a row of \(n\) slot machines in front of you, each with a fixed reward structure. You are given £\(H\) to play the machines and each costs £\(1\) to play, with your objective being to take home as large a winning as possible! How to tackle this problem is not as simple as it might first look, and there have been many different approaches to optimizing the decision process.
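As a concrete illustration, here is a short Python sketch of one classical strategy, UCB1 (my choice for the example, not necessarily what the talk focused on); the machine payouts are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

def ucb_play(means, H):
    # UCB1: play each arm once, then pick the arm maximising
    # empirical mean + exploration bonus sqrt(2 ln t / n_pulls)
    n_arms = len(means)
    pulls = np.zeros(n_arms)
    totals = np.zeros(n_arms)
    reward = 0.0
    for t in range(1, H + 1):
        if t <= n_arms:
            a = t - 1                      # initial round: try every arm once
        else:
            ucb = totals / pulls + np.sqrt(2 * np.log(t) / pulls)
            a = int(np.argmax(ucb))
        r = rng.normal(means[a], 1.0)      # noisy payout from machine a
        pulls[a] += 1
        totals[a] += r
        reward += r
    return reward, pulls

# Three hypothetical machines with unknown mean payouts
means = np.array([0.2, 0.5, 0.9])
reward, pulls = ucb_play(means, H=5000)
print(pulls)
```

The exploration bonus shrinks as an arm is pulled more often, so the play concentrates on the best machine while still occasionally re-checking the others.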

What is the point?

This may sound like a rather abstract problem, but in fact bandits have important applications, one of the most widespread being in advertising. Instead of sitting at a row of slot machines, say you are an internet advertiser deciding which advert to display to a customer. The reward in this case could be how many customers click, or how much they go on to spend via the advertised link.

What are history dependent rewards?

This approach to internet advertising does not apply to every kind of product someone might wish to advertise. The example given was sofas: I may buy a sofa after seeing an advert, and thus appear interested in sofas. Traditional bandit policies would then make it more likely that I see the same advert the next time I am shown one. But now I have a sofa and do not want to see this advert in the immediate future. Hence the importance of having history dependent rewards.

This is an example of a periodic reward, as the desire for a sofa rises and falls over time. There are other kinds of history dependent rewards, the simplest being strictly increasing or decreasing rewards, representing for example loyalty to a company. A more complex reward structure could be coupon rewards. Think of going to your favorite café and collecting stamps on a loyalty card: after so many visits you get a free coffee. This is essentially the setup, except that now we do not know the number of stamps we need to collect or the prize we get at the end.

How can we solve this?

We can try to predict what the rewards from all the arms will look like over some small future time interval, and use these predictions to plan our next best move. For example, if a customer has just clicked on an advert for a sofa, I might want to wait a while before showing the same advert again, and give adverts for coffee tables instead. In the presentation we were told how a Gaussian process can be used to predict the reward function. Given our predictions, we can then optimize our next \(d\) steps in order to achieve maximum profit. If we could look ahead over every decision we would ever make this would of course be optimal, but in practice a computer cannot do this, so we have to fix a small time horizon to consider.

References:

Pike-Burke, C. and Grünewälder, S. (2019). Recovering Bandits. Advances in Neural Information Processing Systems 32.

First term at STOR-i (Fri, 20 Dec 2019)

In September 2019 I began a new chapter of my life, starting at STOR-i in Lancaster. Over the summer I had worked as an intern in the department and was given the opportunity to become part of the new MRes cohort.

The term started off with a couple of days away in the Lake District with members of staff and the first-year PhD students. We were put into different groups over the days, doing team-building tasks such as canoeing and creating t-shirts.

The term itself was split into two distinct halves: lectures and topic sprints. We had four modules, each giving an introduction to a different aspect of statistics and operational research. These were entitled Inference and Modelling, Stochastic Simulation, Deterministic Optimisation, and Probability and Stochastic Processes.

In the second half we were introduced to a new module for STOR-i – contemporary topic sprints. Every week we were placed into groups of four and attended a lecture on Monday morning given by an expert in a field. They would introduce a new topic to us and give a list of references for further reading. The remainder of the week was then spent in our groups producing a presentation to give to the expert, along with an examiner from STOR-i, to show what we had achieved.

To finish off the term we went for a Christmas meal at Quite Simply French, a great small restaurant in Lancaster. In all I had a great time in my first term as a STOR-i MRes student and am looking forward to the next year.
