Statistics – Edward Mellor
PhD Student at STOR-i CDT, Lancaster University

Introduction to Extreme Value Theory

In my last post I promised an overview of my two research topics. We were encouraged to choose one topic from Statistics and the other from Operational Research. Today we will focus on the more statistical topic, which I was introduced to by Emma Eastoe.

In statistics we are often interested in determining the most likely behaviour of a system. The usual way to do this is to fit a model to observations from the system, by finding a family of distributions that approximately describes the shape of the data. This family of distributions (or model) will have certain parameters, and the observations can be used to estimate the parameter values which maximise the probability of that set of observations occurring. In some situations, however, the normal behaviour of a system is of less concern to us and we are instead interested in the maximum (or minimum) outcome that we would expect to observe over an extended period of time. For example, if a local council is considering investment in flood defences they are not interested in the average height of the river but only in the events where the volume of water would exceed the river’s maximum capacity and cause flooding.
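As a loose illustration of this fitting step, here is a minimal sketch in Python; the gamma family, the scipy calls and the simulated river levels are my own assumptions, not part of the original example:

```python
import numpy as np
from scipy import stats

# Hypothetical daily river levels in metres; purely illustrative data.
rng = np.random.default_rng(0)
levels = rng.gamma(shape=5.0, scale=0.4, size=365)

# Maximum likelihood fit: .fit() returns the parameter values that make the
# observed data most probable under the chosen family (here a gamma).
params = stats.gamma.fit(levels)
print("fitted (shape, loc, scale):", params)

# A model fitted to all the data describes typical behaviour well...
print("median level:", stats.gamma.ppf(0.5, *params))
# ...but far-tail estimates rely on extrapolation that the bulk of the data
# cannot support, which is why we need models fitted only to the extremes.
print("1-in-10,000 level:", stats.gamma.ppf(1 - 1e-4, *params))
```

A model fitted to the whole dataset describes typical levels well, but the far tail is exactly where it is least trustworthy, which motivates the extreme value models below.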

The problem here is that we are considering very unusual events, which a distribution fitted to the entire set of observations would be unable to estimate reliably. We therefore require models that can be fitted to just the extreme events. There are two main approaches to consider: the Block Maxima Model and the Threshold Excess Model. Each approach can be characterised by the different way it classifies an event as extreme.

  • Block Maxima Model: Here we partition the data into equal sections and then take the maximum data-point in each block to be an extreme event. The distribution of these maxima belongs to a specific family of distributions called the Generalised Extreme Value Family.
  • Threshold Excess Model: This approach considers all events that are above a certain threshold to be extreme. It can be shown that for a sufficiently high threshold these excesses will follow a Generalised Pareto Distribution.

In both models we have an important decision to make. For the Block Maxima Model we must choose a block length and in the Threshold Excess Model we must set a threshold. These decisions play a very similar role in that they determine the number of points we have to fit our model to. If the block size is set too large or the threshold too high we will not have enough points to fit our distribution, which will result in greater variance in the estimates. On the other hand, if the block size is too small or the threshold too low the resulting points will not be well approximated by the Generalised Extreme Value or Generalised Pareto distribution respectively.
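As a hedged sketch of both approaches, here is how the two fits might look in Python; the simulated data, the block length of 365 observations and the 95th-percentile threshold are arbitrary choices of mine, not values from the post:

```python
import numpy as np
from scipy.stats import genextreme, genpareto

# Illustrative daily observations over 40 "years" of 365 values each.
rng = np.random.default_rng(1)
data = rng.gumbel(loc=10.0, scale=2.0, size=40 * 365)

# Block Maxima: split into blocks (here one block = 365 observations)
# and keep only each block's maximum, then fit a GEV to those maxima.
block_maxima = data.reshape(40, 365).max(axis=1)
gev_params = genextreme.fit(block_maxima)

# Threshold Excess: keep only exceedances above a high threshold
# (here the empirical 95th percentile) and fit a GPD to the excesses.
u = np.quantile(data, 0.95)
excesses = data[data > u] - u
gpd_params = genpareto.fit(excesses, floc=0)

print("GEV (shape, loc, scale):", gev_params)
print("GPD (shape, loc, scale):", gpd_params)
```

Raising the threshold or lengthening the blocks leaves fewer points and noisier parameter estimates; lowering them biases the fit, which is exactly the trade-off described above.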

Sometimes the data we are looking at is multidimensional. For example, if we want to describe extreme storm events for applications in shipping we may have data for wind and rain. These different variables may depend on each other or could be completely independent. Having more than one dimension introduces another difficulty: what do we want to consider as an extreme event? Do we need extreme values for both wind and rain, or is just one of the variables being extreme enough for an event to be considered extreme? Both the Block Maxima and Threshold Excess approaches can be extended to higher dimensions.

In my next post I will talk about my Operational Research topic: Optimal Patrolling.

Functional Data: Making height prediction less of a tall order

As part of the second term of the STOR-i MRes programme we receive talks on a variety of potential research topics. One such talk, by Dr Park from Lancaster University, discussed the use of functional data. In this blog, I will explore one of the examples used in her presentation.

At 193cm I am the tallest member of the STOR-i programme — including both my fellow MRes students and all the PhDs! I was also taller than the vast majority of my peers during my undergraduate studies at the University of Exeter and one of the tallest people in my sixth form.

My parents recorded my height every year as I was growing up, so would it have been possible for them to use this information to predict how tall I would be as an adult?

The first thing they might have considered is their own heights. My dad is 182cm and my mum is 168cm. At ten years old I was 149cm so since they were both shorter at that age than I was, they might have (correctly) guessed that I would grow to be taller than both of them. To guess exactly how much taller is where things get more difficult.

If my parents had height data for other children as they grew into adulthood, they could have made a prediction about my future height by looking at the adult heights of people who were a similar height at ten years of age. However, children grow at different rates, and two people who are exactly the same height as children may be very different heights as adults. In particular, girls and boys tend to grow at different ages: girls are often taller than boys at about eleven or twelve but do not tend to grow as much during their teenage years.

Instead of just considering a child’s height at a fixed time (for example at ten years old) we can instead look at their height each year up to adulthood. Note that although we only have a fixed number of observations we can fit a smooth line through these points to make a continuous function. We can therefore think of a child’s height as a function of time. So, for my height function we have f(10)=149cm and f(23)=193cm.
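As a loose sketch of this step (assuming Python with scipy; only the heights at ages ten and twenty-three come from the post, the other measurements are invented):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Ages (years) and heights (cm). Only the values at 10 and 23 come from the
# post; the intermediate measurements are invented for illustration.
ages    = np.array([1,   3,   5,   7,  10,  12,  14,  16,  18,  23])
heights = np.array([78, 97, 112, 126, 149, 160, 172, 183, 190, 193])

# A shape-preserving (monotone) interpolant gives a smooth height function f.
f = PchipInterpolator(ages, heights)

print(f(10), f(23))   # 149.0 and 193.0 by construction
print(f(11.5))        # height at an age we never actually measured
```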

The figure below, kindly provided by Dr Park, shows the height functions for several individuals:

Since these functions are smooth, we can differentiate them to get a curve for the rate of growth. This is shown in our second figure below (also provided by Dr Park):

Here we show the child’s rate of growth, with velocity in centimetres per year.
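Continuing the sketch above with the same invented measurements, the velocity curve is simply the derivative of the fitted height function:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Same invented measurements as the previous sketch.
ages    = np.array([1,   3,   5,   7,  10,  12,  14,  16,  18,  23])
heights = np.array([78, 97, 112, 126, 149, 160, 172, 183, 190, 193])

f = PchipInterpolator(ages, heights)
velocity = f.derivative()                # growth rate, in cm per year

for age in (2, 10, 13, 20):
    print(f"growth rate at age {age}: {float(velocity(age)):.1f} cm/year")
```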

Although these functions are all different, we can notice some similarities. In each case, the child grows very quickly when they are very young and then growth gradually slows until they are about six. It then spikes again during puberty, which happens once, usually between the ages of eleven and seventeen. After this the rate of growth gradually drops to zero. Many of the curves also have several smaller peaks and troughs in various positions.

This leads us to the question: what does a normal growth curve look like?

If we are just looking at height at a particular age we can simply calculate the mean, and can even produce a confidence interval for that estimate. But how do we find the mean of a function? A naïve approach would be to calculate a function such that, for any age, its value is the mean of all the individual height functions evaluated at that age. But since the peaks caused by puberty occur at different ages for different people, averaging in this way would produce a much wider peak spread over multiple years that isn’t representative of a realistic growth rate.

What we can do instead is to first find the mean age for puberty (we will call this the structural mean) and scale each of the curves to fit this mean. Dr Park produced a graph that illustrates this step:

A function that is defined by taking the mean at each point in time of these new curves will now produce a much more realistic mean height function.
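A very rough sketch of this registration idea (not Dr Park’s actual method; it uses a simple shift of each curve’s age axis around an invented puberty landmark) might look like this:

```python
import numpy as np

# Suppose we have several growth-velocity curves sampled on a common age grid,
# and for each child we can locate the age of their pubertal peak.
ages = np.linspace(1, 23, 200)

def register_to_structural_mean(curves, peak_ages):
    """Shift each curve's age axis so its peak lands on the mean peak age
    (the 'structural mean'), then average the aligned curves."""
    structural_mean_age = np.mean(peak_ages)
    aligned = []
    for curve, peak in zip(curves, peak_ages):
        # Move this child's timeline so their peak sits at the structural
        # mean age (a simple shift; real registration uses smooth warps).
        warped_ages = ages + (structural_mean_age - peak)
        aligned.append(np.interp(ages, warped_ages, curve))
    return np.mean(aligned, axis=0)

# Example: two invented velocity curves peaking at ages 12 and 15.
curve_a = np.exp(-0.5 * ((ages - 12) / 1.5) ** 2)
curve_b = np.exp(-0.5 * ((ages - 15) / 1.5) ** 2)
mean_curve = register_to_structural_mean([curve_a, curve_b], [12.0, 15.0])
print("aligned peak age:", ages[np.argmax(mean_curve)])   # close to 13.5
```

Averaging the aligned curves keeps the pubertal peak sharp, whereas averaging the raw curves smears it over several years, as described above.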

So how could my parents have used this to estimate my future height?

As a tall person it is unlikely that my growth curve would be particularly close to the average. This is where we need to consider my recorded heights up to the age of ten as well. Ideally we want the people to whom we are comparing my height to be as similar to me as possible, as these people are likely to have a more similarly shaped height function. For example, only considering the average height of boys makes sense. Ideally we would also only consider boys whose parents were a similar height to mine and who were a similar height to me at every age up to ten, although this may not be possible unless we have access to a lot of data.

Since we only have my height function up to age ten, we can scale this average function to match my data as closely as possible and then integrate it to find an estimate of my adult height.
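As a crude sketch of that final step (assuming we already have a registered mean growth-velocity curve for similar boys; the curve shape, the starting height and every number other than 149cm at ten are invented):

```python
import numpy as np

ages = np.linspace(1, 23, 200)
# Hypothetical registered mean growth-velocity curve (cm/year); invented shape.
mean_velocity = 12 * np.exp(-0.35 * ages) + 6 * np.exp(-0.5 * ((ages - 13) / 1.5) ** 2) + 1.0

age_start, height_start = 1, 78     # invented starting point
age_known, height_known = 10, 149   # my recorded height at ten (from the post)

def height_at(age, scale):
    """Height implied by integrating the scaled velocity curve from age_start."""
    mask = (ages >= age_start) & (ages <= age)
    return height_start + scale * np.trapz(mean_velocity[mask], ages[mask])

# Choose the scale so the curve passes through my known height at ten...
known_mask = (ages >= age_start) & (ages <= age_known)
scale = (height_known - height_start) / np.trapz(mean_velocity[known_mask], ages[known_mask])

# ...then read off the predicted adult height.
print(f"predicted height at 23: {height_at(23, scale):.0f} cm")
```

The scale factor forces the integrated curve through my known height at ten; evaluating it at twenty-three then gives the prediction.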

Annual STOR-i conference 2020

As promised in my previous blog post, today I will be talking about my first experience of an academic conference.

This year STOR-i hosted its ninth annual conference, with talks from a wide variety of speakers from the UK — including some of its own PhD students and alumni — and from overseas. We listened to 12 presentations, so for the sake of brevity I will mention all of them but only go into more detail for a few.

We were welcomed to the conference by Prof. Kevin Glazebrook, who spoke a bit about STOR-i’s new round of funding and introduced the first speaker, Jacquillat.

Jacquillat spoke about analytics in air transportation. In particular he discussed how air traffic flow management can absorb delays upstream by holding planes on the runway, so they don’t get held up in the air and expend extra fuel as they wait for their turn to land. He also spoke about the benefits of adjusting existing integer programs for scheduling so that they are optimised to minimise passenger delays; this would give greater priority to larger flights and ensure that connecting flights arrive on time.

The second talk was by third year STOR-i PhD student, Henry Moss, who introduced us to a Bayesian optimisation method called MUMBO, which he has been developing for his thesis.

The next speaker talked about her work with the Mallows ranking model, as well as some applications and recent advances in the area.

After lunch we were given two more talks, the second of which was by Georgia Souli, a third year STOR-i PhD student, before another break for refreshments.

We came back to a presentation titled “The Use of Shape Constraints for Modelling Time Series of Counts” by a speaker from Columbia University.

Tom Flowerdew, a STOR-i alumnus, then talked to us about his work using machine learning to detect fraud. One of the major problems here is that machine learning requires data to learn from, but as the nature of fraud changes the algorithm must be able to adapt. The difficulty is that since all the obvious fraud attempts are blocked, future iterations will have no experience of them and so will have difficulty detecting them. Flowerdew suggested that allowing suspected fraudulent transactions to be completed with some small probability, and then proportionally increasing the weight of these outcomes in the learning stage, would allow the algorithm to learn more effectively and therefore prevent more fraud in the long run.

Tom Flowerdew at the STOR-i Conference
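As a hedged sketch of the idea as I understood it (not Flowerdew’s actual system; the names, threshold and probability are invented), letting a small fraction of suspicious transactions through and up-weighting them by the inverse of that probability might look like this:

```python
import random

EXPLORE_PROB = 0.02   # small chance of letting a suspicious transaction through

def decide(model_score, threshold=0.9):
    """Block suspicious transactions, except with small probability EXPLORE_PROB,
    so the model keeps seeing labelled outcomes even for 'obvious' fraud."""
    if model_score < threshold:
        return "allow", 1.0                      # ordinary traffic, weight 1
    if random.random() < EXPLORE_PROB:
        return "allow", 1.0 / EXPLORE_PROB       # explored case, up-weighted
    return "block", 0.0                          # blocked, produces no label

# Later, training would use (features, observed outcome, weight) triples, so the
# few explored transactions stand in for all the similar ones that were blocked.
print(decide(0.95))
```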

The final presentation of the day was “Making random things better: Optimisation of Stochastic Systems”.

We reconvened in the evening to look at posters made by the PhD students about each of their projects. This was a really good opportunity for them to develop their presentation skills by explaining their findings to knowledgeable academics in closely related fields. It was also an opportunity for us MRes students to learn a bit more about the research going on at the university and the sort of projects we might be interested in.

The following day we kicked off with a presentation focused on using the network of transactions between small and medium-sized businesses to improve credit risk models. Since transaction network data is difficult to get hold of, she also spoke about what approaches one can use without access to this data.

The next speaker spoke about the balance between accuracy and interpretability in data science models and how this can be achieved.

Another STOR-i alumna, Ciara Pike-Burke, then talked about her recent work with multi-armed bandits. A multi-armed bandit can be thought of as a slot machine where pulling each arm gives a reward from some unknown distribution. The usual problem is balancing exploration, to learn more about the different reward distributions for each arm, while also trying to maximise the total reward by exploiting the arm that is performing best. The reward distributions are usually assumed constant, but Pike-Burke considered the case where the rewards depend on the previous actions of the player. For example, a company can suggest different products to a customer on their website and the reward depends on whether the customer follows that link. If the customer has just bought a bed they are probably less likely to buy another bed. However, that same customer might be more likely to buy new pillowcases.
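For readers new to bandits, here is a minimal epsilon-greedy sketch of the standard setting with fixed reward distributions; it is a generic illustration rather than Pike-Burke’s history-dependent model, and all the numbers are invented:

```python
import random

# True mean rewards of each arm (unknown to the player); invented values.
true_means = [0.2, 0.5, 0.35]
epsilon = 0.1                         # probability of exploring a random arm

counts = [0] * len(true_means)        # pulls per arm
estimates = [0.0] * len(true_means)   # running mean reward per arm

for t in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_means))                         # explore
    else:
        arm = max(range(len(true_means)), key=lambda a: estimates[a])   # exploit
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]           # update mean

print("estimated means:", [round(e, 3) for e in estimates])
```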

Finally, the last speaker presented his talk on “Model Based Clustering with Sparse Covariance Matrices”.
