Operational Research – Rebecca Hamm, MRes Student at STOR-i Centre for Doctoral Training

This Week on the STOR-i Programme: Bayesian Optimization
Sun, 18 Apr 2021

At STOR-i we are currently looking through our potential PhD projects. For each project we have been given a page summarizing the topic, with some papers listed at the bottom. As I was looking through these papers I came across one which particularly interested me, in an area I knew very little about. I decided a good way to form a deep understanding of this paper would be to write a blog post on it. This way you can learn something too.

The paper in question covers Bayesian optimization. You may remember that in a previous blog post I explored a Bayesian approach to a multi-armed bandit problem, and in another I looked at a heuristic approach to an optimization problem. Well, today (or whichever day you decide to read this) we are looking at a Bayesian approach to an optimization problem.

So let's outline the situation. We have a function f(x), and we would like to find its maximum; however, we do not know the structure of the function, and it is expensive to evaluate at any given point. Basically we have a black box to which we give inputs (our x values) and from which we receive outputs (f(x) values). Since evaluating the function is expensive, we can only look at a limited number of points, so we have to decide which points to evaluate in order to find the maximum.

So how do we do this? Firstly we fit a Bayesian model to our function. We can then use this to formulate something called an acquisition function. The point where the acquisition function is highest is the point we evaluate next.

Gaussian process regression is used to fit the Bayesian model. We suppose that the f values at some x points are drawn at random from some prior probability distribution. This prior is taken to be a multivariate normal with a particular mean vector and covariance matrix. The mean vector is constructed by evaluating a mean function at each x. The covariance is constructed using a kernel, formulated so that two x's close together have a large positive correlation; this reflects the belief that closer x's will have similar function values. The posterior mean is then a weighted average of the prior mean and an estimate made from the data, with weights dependent on the kernel. The posterior variance at a point is just the prior variance at that point minus a term corresponding to the variance removed by the observed values. The posterior distribution is again a multivariate normal.
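The paper itself doesn't give code, but the posterior formulas above can be illustrated with a toy sketch. I'm assuming a zero prior mean and a squared-exponential kernel for simplicity; all the function names here are my own:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel: nearby x's get correlation close to 1."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-8):
    """Posterior mean and variance of a zero-mean Gaussian process at x_new."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_new, x_obs)       # covariance between new and observed x's
    K_ss = rbf_kernel(x_new, x_new)      # prior covariance at the new x's
    K_inv = np.linalg.inv(K)
    mean = K_s @ K_inv @ y_obs           # data-driven estimate, weighted by kernel
    cov = K_ss - K_s @ K_inv @ K_s.T     # prior variance minus variance removed
    return mean, np.diag(cov)
```

At the observed points the posterior mean reproduces the observations and the posterior variance collapses to (almost) zero, exactly as in the figure below.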

(Figure taken from the paper.)

Illustrated above we have an estimate of the function f(x) (solid line), with the dashed lines showing Bayesian credible intervals (these are similar to confidence intervals) and the observed points in blue. Using this we can form an acquisition function such as:

(Figure taken from the paper.)

This function tells us which point of the domain to evaluate next: the point where the acquisition function reaches its maximum. There will be a balance between choosing points where we believe the global optimum lies and points with large amounts of variance.
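One simple acquisition function with exactly this trade-off is the upper confidence bound: the posterior mean plus a multiple of the posterior standard deviation. This is a sketch of that idea rather than the specific function shown in the paper's figure; the function name and the choice kappa = 2 are mine:

```python
import numpy as np

def ucb(mean, std, kappa=2.0):
    """Upper confidence bound: posterior mean plus kappa posterior std devs."""
    return mean + kappa * std

# Posterior summaries at three candidate points (made-up numbers):
mean = np.array([0.2, 0.5, 0.4])    # where we currently believe f is high
std = np.array([0.05, 0.01, 0.30])  # where we are still uncertain
next_point = int(np.argmax(ucb(mean, std)))   # index of the point to evaluate next
```

Note that the third point wins despite not having the highest mean, because its large uncertainty makes it worth exploring.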

There are many different types of acquisition function. The most commonly used one is known as the expected improvement. In this case we assume we can only provide one solution as the maximum of f(x), so if we had no more evaluations left we would provide the largest point we have evaluated. However, if we did have just one more evaluation, our solution would remain the same if the new evaluation was no larger than the largest so far, but if the new evaluation was larger, that would now be our solution. The improvement of the solution is therefore the new evaluation minus the previous maximum if this difference is positive, and zero otherwise. So when we choose our next evaluation we would like to choose one which maximizes improvement. It is not quite that simple, as by the nature of the problem we do not know the value of an evaluation until we have chosen and evaluated that point. This is where the Bayesian model comes in: we can use it to obtain the expected improvement of each point and then choose the point with the largest expected improvement. This favours points with high standard deviation as well as points with large means in the posterior distribution, so again there is a balance between choosing points with high promise and points with large amounts of uncertainty.
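For a Gaussian posterior the expected improvement has a well-known closed form, which can be sketched with nothing but the standard library (the function name is my own choice):

```python
import math

def expected_improvement(mu, sigma, best_so_far):
    """Closed-form expected improvement under a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0:                      # no uncertainty: the improvement is known
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal cdf
    return sigma * (z * cdf + pdf)
```

Both terms matter: the first rewards a large posterior mean relative to the best evaluation so far, and the whole expression scales with sigma, so uncertain points also score well.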

This method can be applied to many problems, including robotics, experimental particle physics and materials design; one application explained in the literature is the use of Bayesian optimization in the development of pharmaceutical products. To learn more I'd advise reading the paper I used for this blog, as well as work on constrained Bayesian optimization and its applications.

This Week on the STOR-i Programme: Memes
Sun, 04 Apr 2021

Before you get excited, it's not the type of memes you are thinking of. I do apologize; however, I struggled to find a link between online memes and Statistics and Operational Research, except for maybe this one:

If you are disappointed, my friend Robyn includes a tweet of the week at the end of each of her blog posts, and I'm sure there are plenty of memes there. What we are actually going to talk about is the concept of memes in heuristics used for optimization. This week, as part of the MRes course, I have been writing a report on that very topic. I thought it was interesting, so I decided to share it with you.

Firstly, what even is optimization? Well, a previous MRes (now PhD) student at STOR-i covered this in a similar blog on heuristics in optimization. But if you are too lazy to click that link, then basically we wish to minimize or maximize a function subject to some constraints. A very famous example of this is the travelling salesman problem. In this problem, as you may have guessed, we have a travelling salesman. This salesman wishes to visit all of the cities exactly once, and when he does this he wishes to incur the minimum cost or distance. So the problem may look a bit like this:

There are many exact methods that will produce the correct solution. However, as you increase the size of the problem (the number of cities), the time these methods take to compute grows so quickly that eventually there is no way you have time to wait for an answer. In response to this, heuristic methods are used. One popular heuristic method is the genetic algorithm. As you can probably tell by the name, it is based on genetics, particularly how genes are passed on.

The algorithm starts with a set of individuals, each with a genetic code (this is the order in which they visit the cities). Some of the individuals are then selected to be parents. Usually the best individuals are selected (the individuals with the least cost/distance), but occasionally there is some randomization. The parents then “reproduce”, which is where there is a crossover between two parents. This can happen in different ways for different problems; in particular, there are multiple ways to do this for the travelling salesman. Today we will discuss one called the Very Greedy Crossover. In this crossover an initial city is chosen randomly, then the next city is chosen from the cities adjacent to the current one in the two parents: out of those adjacent cities, the one closest to the current city is chosen. This process is repeated until all cities have been chosen. For example:
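The crossover step can be sketched in code. This is my own reading of the procedure, with a deterministic starting city and a fallback for when every parent-adjacent city has already been visited; the original paper may handle these details differently:

```python
import math

def very_greedy_crossover(parent1, parent2, coords):
    """Build a child tour: repeatedly move to the nearest not-yet-visited city
    that is adjacent to the current city in either parent tour."""
    def dist(a, b):
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return math.hypot(x1 - x2, y1 - y2)

    def neighbours(tour, city):
        i = tour.index(city)
        return {tour[i - 1], tour[(i + 1) % len(tour)]}

    current = parent1[0]                     # deterministic start for this sketch
    child, unvisited = [current], set(parent1) - {current}
    while unvisited:
        candidates = (neighbours(parent1, current)
                      | neighbours(parent2, current)) & unvisited
        if not candidates:                   # all parent-adjacent cities visited
            candidates = unvisited
        current = min(candidates, key=lambda c: dist(current, c))
        child.append(current)
        unvisited.remove(current)
    return child
```

The child always visits every city exactly once, and at each step it greedily picks the shortest hop the parents suggest.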

These resulting “children” are then mutated with some probability. Again, there are many ways to do this; one way is to simply swap the places of two cities. The individuals now make up a new population, and the process is repeated until some condition is met. This may be that a solution is found that is considered good enough, or simply that a set number of iterations have passed and the best individual is chosen.
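Putting the pieces together, a generation loop for the travelling salesman might look like the following sketch. I have used order crossover and swap mutation here for brevity; both are standard choices, but not necessarily the ones from the report:

```python
import random

def tour_length(tour, dist):
    """Total length of the closed tour under a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def order_crossover(p1, p2):
    """Copy a random slice of p1, then fill the gaps in p2's order."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    rest = [c for c in p2 if c not in child]
    free = [i for i in range(n) if child[i] is None]
    for i, c in zip(free, rest):
        child[i] = c
    return child

def swap_mutation(tour, prob=0.2):
    """With some probability, swap the positions of two cities."""
    tour = tour[:]
    if random.random() < prob:
        i, j = random.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def genetic_algorithm(dist, pop_size=20, generations=100):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: tour_length(t, dist))
        parents = pop[: pop_size // 2]       # select the fittest half
        children = [swap_mutation(order_crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=lambda t: tour_length(t, dist))
```

Because the fittest half survives each generation, the best tour found can never get worse from one iteration to the next.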

I understand that by now you have either completely forgotten this blog post was supposed to be about memes, or you are understandably frustrated with me for not getting to the point. But don't worry, I'm doing it now. Dawkins coined the term meme as an analogue of the gene in the context of cultural evolution: ideas and behaviors that spread within our culture, or image-based jokes that spread through the internet. This idea has been adapted so we can include memes within the genetic algorithm, creating what are known as memetic algorithms.

“How do we do this?” you may be asking. I hope you are, because I'm about to answer that very question. It's simple: after the mutation stage we add a local search. Local searches are algorithms that move from solution to solution hoping to find an optimal one. An example is Tabu search. Tabu search keeps a memory of where it has previously been, to avoid cycling through the same solutions and getting trapped in local optima. A local optimum is the best solution in its own area; this does not mean it's the best solution overall. Searches can get trapped in local optima because a solution appears to be the best if other areas are not searched. For example, the neighborhood of a solution may be all the solutions that can be found by removing a city and placing it in a different position. The search then moves to the best solution in the neighborhood. Since the best neighbor is chosen each time, a local optimum will be found, but solutions that require first choosing a worse option in order to then move to a better one may be missed. The tabu list remembers where the search has been and blocks previous moves, forcing the search to look at other solutions.
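Here is a minimal sketch of Tabu search on the travelling salesman. I've used a swap neighborhood (exchange two positions in the tour) rather than the remove-and-reinsert neighborhood described above, and a fixed-length tabu list of recent swaps; both are my own simplifications:

```python
from collections import deque

def tour_length(tour, dist):
    """Total length of the closed tour under a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def tabu_search(start, dist, iterations=50, tenure=3):
    """Swap-neighbourhood Tabu search: recently used swaps are forbidden."""
    def swapped(tour, i, j):
        t = tour[:]
        t[i], t[j] = t[j], t[i]
        return t

    best = current = start[:]
    n = len(start)
    all_moves = [(i, j) for i in range(n) for j in range(i + 1, n)]
    tabu = deque(maxlen=tenure)              # forget moves after `tenure` steps
    for _ in range(iterations):
        moves = [m for m in all_moves if m not in tabu] or all_moves
        # move to the best non-tabu neighbour, even if it is worse than current
        i, j = min(moves, key=lambda m: tour_length(swapped(current, *m), dist))
        current = swapped(current, i, j)
        tabu.append((i, j))
        if tour_length(current, dist) < tour_length(best, dist):
            best = current
    return best
```

The key line is the `min` over non-tabu moves: the search always moves, even downhill, which is what lets it climb out of a local optimum instead of stopping there.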

Memetic heuristics are seen to be more effective and to produce better results; however, this comes at a price. As these algorithms are more complicated, the computational time is longer. Hence, if you are looking for a quick answer I suggest you stick with a simpler heuristic, but if you have the time you should definitely give a memetic algorithm a go. The particular one I have outlined in this blog, another approach that solves the travelling salesman problem with a memetic algorithm based on particle swarm optimization, the genetic algorithm itself, Tabu search, nature-based heuristics more broadly, and memetic algorithms in general are all covered in the sources I used for this post.

This Week on the STOR-i Programme: Fast Fashion and Multi-armed Bandits
Sun, 21 Feb 2021

This week as part of the MRes course we had to pick our next topic to write a report on. I was really stuck between two options but in the end had to choose one. I thought that if I'm not going to write a report on the other option, I can at least write a blog on it, and here we are. So, as you may or may not have guessed from the title of this post, the option I didn't choose was: multi-armed bandits. At the end of the talk on this area the lecturer, Kevin Glazebrook, mentioned some areas of particular study. One that particularly caught my eye was fast fashion. Before deciding to do a Maths degree I wanted to be a fashion designer, or just any job in fashion really. While I gave up that dream for my love of Maths, it is still an interest of mine. Hence, I was very excited by the combination of the two areas.

Previously, clothing companies would have to make decisions on which products to sell that season with very little information on where demand might lie. As you can imagine, this leads to them missing opportunities to sell popular goods as well as having excess supply of unwanted products. As technology has improved, especially manufacturing processes and means of transport, companies have been able to delay some of the production for the season. This means they have more information on what's in demand that season and can then produce and sell goods accordingly.

Now you may be thinking: that's very nice, but what's that got to do with maths? Well, I'll tell you. I assume you remember that one of the focuses of this blog is bandit problems. If you are picturing something else entirely, do not worry, as in this case we are talking about problems known as multi-armed bandits. In these problems we have a series of time steps, and at each time step we have to make a decision (pull an arm). Before we pull an arm we are unsure whether it will help us achieve whatever it is we wish to achieve, but by pulling that arm we gain more information about the arm. The aim is to minimize the regret we have for pulling an arm. To do this we have to balance two things: exploitation and exploration. We want to exploit any information we have from pulling arms previously in order to pull arms which give us successful results, but we also want to explore all our options to ensure we have found the best arm.

So how does this relate to fast fashion? Suppose the company delays production so that it releases a new selection of goods at each of T time steps. Then at each time step t it has to choose which products to release in that selection. Picking a product to go in a selection is pulling an arm. By picking a product the company can see how well it sells and, hence, its demand. It can then use this information to help make its decision at the next time step. To ensure that the best decisions are made with the information available at each time step, we build a model.

So let's look at this model. To start with, we have a set of S different products to choose from. As there is limited space within a shop, we can only choose N of these products at each time step t. For this model we assume that customers buy units of product s at an unknown constant rate ds. This rate is assumed to remain constant, but the actual demand for the product is only observed at times when the product is in the selection. To formulate this model we use some Bayesian statistics. If you are not clued up on Bayesian statistics, I suggest you take a peek at my earlier post. In Bayesian statistics we can incorporate prior beliefs or information about the parameters of our model. In this case our prior beliefs about ds are represented as a Gamma distribution with shape parameter ms and scale parameter as; both are assumed to be positive, and ms is assumed to be an integer. This Gamma prior is combined with a likelihood for the sales data observed at a given time. As the Gamma distribution is a conjugate prior here, the resulting distribution (posterior) is again a Gamma distribution, with shape parameter (ms + ns) and scale parameter (as + 1), where ns is the number of units of product s sold in the selection period. So each time a product is selected, its posterior distribution is updated by adding that period's sales ns to the shape parameter and 1 to the scale parameter. The intuition is that the shape parameter is the number of units of the product that will be sold over a number of periods equal to the scale parameter, so the expected number of sales of a product in one period is the shape parameter divided by the scale parameter. Decisions can then be made by choosing the options with the largest expected sales. This model balances exploration and exploitation. If a product has a lot of sales, ns will be large, so the shape parameter and hence the expectation will be large too; that product will likely be picked again. However, the more times a product is picked, the larger its scale parameter gets. A larger scale parameter reduces the expectation, lowering the chances of options that have been picked frequently and increasing the opportunities for exploring other options.
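The selection and update rules described above are simple enough to sketch directly. The function names are mine, and this greedy rule only captures the "largest expected sales" part of the model:

```python
def choose_selection(shape, scale, n_select):
    """Pick the n products with the largest expected demand, shape / scale."""
    expected = {s: shape[s] / scale[s] for s in shape}
    return sorted(expected, key=expected.get, reverse=True)[:n_select]

def update_posteriors(shape, scale, sales):
    """Conjugate Gamma update after one selling period: add the units sold to
    the shape parameter and 1 to the scale parameter of each selected product."""
    for s, n_sold in sales.items():
        shape[s] += n_sold
        scale[s] += 1
```

Every selected product's scale parameter grows by 1 per period regardless of how it sold, which is exactly the mechanism that pushes the model back towards exploration.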

If we simplify the problem so that we have to choose one of a pair of shorts, a skirt or a skort at each time step, our choices may go something like this:

The starting scale parameters are 16, 17 and 13 for the shorts, skirt and skort respectively, and the shape parameter is 1 for all three products.
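With those numbers, the first pick and its update work out as follows (the 3 units sold is a made-up figure for illustration, not from the post):

```python
# Prior: Gamma(shape, scale) beliefs about each product's demand rate
shape = {"shorts": 1, "skirt": 1, "skort": 1}
scale = {"shorts": 16, "skirt": 17, "skort": 13}

# Expected sales in one period = shape / scale
expected = {s: shape[s] / scale[s] for s in shape}
first_pick = max(expected, key=expected.get)   # skort: 1/13 is the largest

# Suppose the skort then sells 3 units that period (a made-up figure):
shape[first_pick] += 3   # posterior shape: 1 + 3 = 4
scale[first_pick] += 1   # posterior scale: 13 + 1 = 14
```

The skort wins the first period because its smaller scale parameter gives it the largest expected demand, 1/13, and each period it is picked its scale parameter grows, gradually handing chances to the shorts and skirt.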

To read the paper that formulated this model, as well as learn more about how maths is used to learn about demand within fast fashion, take a look at the paper this post is based on. I hope this blog post gave you a little insight into maths being used in our everyday lives in a way you may not have previously thought about.
