STOR-i – Eleanor D'Arcy
Statistics PhD Student

Life as a STOR-i MRes Student: Lent Term
Fri, 01 May 2020

To continue with my ‘Life as a STOR-i MRes Student’ blogging thread, I have decided to reflect on my previous term (Lent term). This provides me with an opportunity to review the work and activities that I have been involved with between Christmas and Easter 2020. To find out more about life as an MRes student before Christmas, please read my previous post from this thread.

Upon returning from the Christmas break, I was reunited with my colleagues and friends at the STOR-i annual conference (see post). This was a great opportunity to network with key researchers from Statistics, Operational Research and Industry.

STOR-i Annual Conference 2020

Lent term focussed on independent work, with a view to giving us an insight into life as a PhD student. This involved completing two research projects, each working with an academic in the field. I worked on:

  • Slot Scheduling in Air Transportation with Professor Konstantinos Zografos (see my blog post here)
  • Missing Data with Dr Robin Mitra
These projects provided a great opportunity to work with an expert in their field as well as carry out independent research.
Ed, Libby and I at the Tesco problem solving day

Another enjoyable part of Lent Term was the problem solving days. On each of these, a company visits STOR-i with an industrial problem that may lend itself to statistics and/or OR, and we work in teams with the aim of providing guidance on a solution. The three problem solving days were:

  • Tesco
    Machine Learning Predictions and Optimisation: Fuel Pricing
  • BBC
    Temporal clustering
  • Electricity North West
    Assessing the Plausibility of Data
Masterclass with Prof. Brendan Murphy
Masterclass with Prof. Laura Albert

During Lent term we also had masterclasses, in which external academics presented work in their research area. This was another opportunity to network as well as learn about important areas in statistics and OR. The three masterclasses were:

  • Model-based Clustering and Classification with Professor Brendan Murphy, University College Dublin
    (See the blog post I wrote on this here)
  • Public Sector OR with Professor Laura Albert, University of Wisconsin-Madison
    (See my relevant blog post here)
  • Bayesian Optimisation with Professor Peter Frazier, Cornell University
    (This was done virtually due to the coronavirus outbreak; many thanks to Peter for making this work)
Masterclass with Prof. Peter Frazier

Unfortunately, the final masterclass (Ranking and Selection for Simulation Optimisation with Professor Barry Nelson from Northwestern University) was cancelled due to the outbreak of the COVID-19 pandemic. This worldwide emergency brought Lent term to an abrupt and premature end as we, like many others, were asked to work from home, and many activities and deadlines were postponed. However, working from home has provided me with an opportunity to demonstrate a different style of learning. STOR-i have supported us in working from home by transferring much of our contact and planned events online.

My working from home set-up
A weekly, virtual, MRes catch up

Lent term also had plenty of social aspects. The Lancaster University Netball Team won their first (and only) match of the season. We had a term jam-packed with birthdays among the MRes, so there was plenty of celebration! I also went to the Lancaster University Undergraduate Conference, where many of my friends on undergraduate degrees presented their work.

LU UG conference
Lancaster University Grad Netball Team

Whilst Lent term ended in the strangest of circumstances, I appreciate the new working from home skills I have acquired and I feel very grateful for the online platform through which we have been able to continue as close to normal as possible. I thoroughly enjoyed Lent Term and I feel more equipped than ever to continue on my academic journey into a PhD.

STOR-i Masterclass: Professor Laura Albert
Mon, 20 Apr 2020

Public Sector OR

At the end of February, Professor Laura Albert visited us at STOR-i to give a two-day masterclass on Public Sector Operational Research. Laura is an Industrial and Systems Engineering Professor at the University of Wisconsin-Madison. At the time of the masterclass, Laura was on sabbatical in Germany at RWTH Aachen University. Her research focusses on applied optimisation in the public sector in the US; applications include homeland security, disasters, emergency response, public services and healthcare. Some current projects are:

  • Emergency medical service deployment and dispatch,
  • Cyber-security and trustworthy computing,
  • Next-generation policing models to divert opioid users from the criminal justice system.

Laura also authors the blogs and

History of Public Sector OR

The masterclass began with an introduction to Public Sector OR, detailing some of the historical applications. Following a period of civil unrest during the 1960s in the US, cities faced many challenges: crime, fire alarms, solid waste and drug use. Dr. Al Blumstein chaired the Commission’s Science and Technology Task Force (CMU) to address fundamental societal problems. With no extra money in the budget for public sector organisations, an increase in problem size meant there was only one solution: Operational Research. This is when the golden age of public safety research began.

Following this, some early contributions to public sector OR were made. Much of this research was put into practice and influenced policy. These papers appeared in the best operations research journals and received major awards.

What is Public Sector OR?

Public sector operational research deals with problems whose outputs are subject to public scrutiny.

Public sector OR is concerned with complex systems that encompass people, processes, vehicles and critical infrastructure. It can include problems in the following areas:

  • Public health and safety
    Police, fire, emergency services and public health
  • Community development
    Planning, transportation
  • Human services
    Public assistance, welfare, drugs and alcohol treatment, homeless services
  • Nonprofit management
    Management of community-oriented service providers

Developing models to deal with these issues often involves multiple stakeholders or decision-makers and requires many objectives, often with conflicting aims. These models should aim to balance equity with efficiency, whilst remaining below some predetermined budget. Here are some examples of such models:

  • Food bank distribution networks,
  • Airport location or expansion using multi-criteria decision analysis,
  • Military procurement decisions,
  • Delivering relief aid,
  • Post-disaster reconstruction,
  • School bus schedules,
  • Public library location and management,
  • Undesirable facility location and management,
  • Public transport routes.

In the following sections, I will outline examples of public sector OR models that Laura presented during the masterclass.

Small Scale: Facility Location Models

Suppose we want to site p ambulances at stations in a region to “cover” the most calls in 9 minutes. Here, there are two decisions to make: where to locate the stations, and which calls to assign to which station. This is modelled as an optimisation problem that balances cost and service: we maximise or minimise an objective subject to capacity constraints. Specifically, we consider a discrete problem, formulated as an integer program, where the locations are at predefined points. In this problem, there are multiple distance criteria:

  • The total distance between calls and their assigned stations (this is usually demand weighted),
  • The maximum distance between a call and its assigned station,
  • The coverage – this is the number of calls covered if the distance is within some specified radius.
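The three criteria above can be computed for a given set of open stations. The following is a minimal sketch, assuming each call is served by its nearest station; the coordinates, demand weights and radius are made-up toy values (the radius stands in for the 9-minute target), not data from the masterclass.

```python
from math import hypot

# Hypothetical toy data: call locations with demand weights, and open stations.
calls = [((0.0, 0.0), 3), ((4.0, 0.0), 1), ((0.0, 3.0), 2)]  # ((x, y), demand)
stations = [(0.0, 1.0), (4.0, 1.0)]
radius = 1.5  # assumed coverage radius, standing in for the 9-minute target


def distance(a, b):
    """Straight-line distance between two points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def criteria(calls, stations, radius):
    """Assign each call to its nearest station and score the three criteria."""
    total_weighted = 0.0  # demand-weighted total distance
    max_dist = 0.0        # maximum call-to-station distance
    covered = 0           # demand within the coverage radius
    for location, demand in calls:
        d = min(distance(location, s) for s in stations)
        total_weighted += demand * d
        max_dist = max(max_dist, d)
        if d <= radius:
            covered += demand
    return total_weighted, max_dist, covered


total_weighted, max_dist, covered = criteria(calls, stations, radius)
```

Each of the location models corresponds to optimising one of these quantities over the choice of stations.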

The model must also restrict the number of stations being built by considering the fixed cost associated with opening an ambulance station (including construction, leasing and labour costs). Remember: we want no more than p stations. Laura presented 5 models:

  • (Uncapacitated) Fixed-charge location problem:
    minimise fixed cost + demand weighted distance
  • P-median problem:
    minimise demand weighted distance
    such that no more than p stations are located
  • P-center problem:
    minimise maximum distance
    such that no more than p stations are located
  • Set covering location problem:
    minimise number of stations
    such that all calls are covered
  • Maximum covering location problem:
    maximise covered demands
    such that no more than p stations are located
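To make the maximum covering location problem concrete, here is a minimal sketch that solves a tiny instance by brute force. The zone names, demands and reachability sets are hypothetical, and a realistic instance would be solved as an integer program rather than by enumeration.

```python
from itertools import combinations

# Hypothetical instance: demand (number of calls) in each call zone, and the
# zones each candidate station site can reach within the response-time target.
demand = {"A": 5, "B": 3, "C": 4, "D": 2}
reach = {
    "s1": {"A", "B"},
    "s2": {"B", "C"},
    "s3": {"C", "D"},
    "s4": {"A"},
}


def max_covering(demand, reach, p):
    """Enumerate every choice of at most p sites and keep the best coverage."""
    best_sites, best_cover = (), 0
    for k in range(1, p + 1):
        for sites in combinations(sorted(reach), k):
            zones = set().union(*(reach[s] for s in sites))
            cover = sum(demand[z] for z in zones)
            if cover > best_cover:
                best_sites, best_cover = sites, cover
    return best_sites, best_cover


best_sites, best_cover = max_covering(demand, reach, p=2)
```

With p = 2, opening s1 and s3 covers all four zones in this toy instance. Enumeration grows combinatorially with the number of candidate sites, which is why these problems are normally handed to an integer programming solver.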

These models must also ensure that all calls are satisfied and that calls are not assigned to a closed station. In order to cover the most calls in 9 minutes, the maximum covering location problem is the most appropriate. However, there are additional features that could be included to improve the model:

  • Different call volumes at different locations,
  • Non-deterministic travel times,
  • Each ambulance responds to the same number of calls,
  • Ambulances are not always available to backup coverage.

Even when these additional features are accounted for in the model, there still remain two sources of uncertainty: ambulance unavailability and probabilistic travel times. Models that incorporate both sources of uncertainty generate a configuration that covers up to 26% more demand at no extra cost.

Such facility location problems are not restricted to just ambulance station location but many other areas within the public sector:

  • Fire stations,
  • Airline hubs,
  • Blood banks,
  • Hazardous waste disposal sites,
  • Schools,
  • Bus stops.

Large Scale: Emergency Response for Homeland Security and Disaster Management

Laura also discussed applications within OR but on a much greater scale in terms of disaster management. Disasters can include those that are natural (e.g. earthquakes, droughts, tsunamis, etc.), terrorist induced (e.g. cyber attacks or nuclear blasts), technological and accidental (e.g. nuclear power plants or power outages). Disasters tend to follow a common lifecycle:

Disaster Lifecycle

Each stage in the cycle (except vulnerability) lends itself to OR; we detail each stage and some applications:

  • Vulnerability is the potential for physical harm and social disruption.
    – Vulnerability does not typically lend itself to OR applications
  • Mitigation includes actions taken prior to the disaster to prevent or reduce the impact.
    – Checkpoint screening for security
    – Network design
    – Pre-locating medical facilities and response stations
  • Preparedness also includes actions taken prior to a disaster but this time, to aid in response and recovery.
    – Pre-positioning crews and supplies in advance of a disaster
    – Evacuation planning
    – Emergency crew scheduling
  • Emergency response includes actions during and after a disaster to protect and maintain systems, rescue and respond to casualties and survivors, and restore essential public services.
    – Urban search and rescue
    – Routing and distribution of supplies and commodities
    – Hospital evacuation
  • Recovery includes efforts to reestablish pre-disaster systems and services.
    – Debris clean up and removal
    – Roads, bridge and facility repair and restoration
    – Replanting and restoration of forests and wetlands affected by a natural disaster

The criteria used in disaster models differ slightly from those of a standard model. Rather than quality, cost, profit, and distance, we are now concerned with loss of life, morbidity, coverage, and delivery of critical commodities.

I would like to thank Prof. Laura Albert for delivering this masterclass. I really enjoyed learning about different OR models applied to the public sector.

STOR-i Masterclass: Professor Brendan Murphy
Mon, 09 Mar 2020

Model-Based Clustering and Classification

A few weeks ago, Professor Brendan Murphy visited Lancaster University to present a two-day masterclass to all STOR-i students on Model-Based Clustering and Classification. Brendan is Full Professor and Head of School in the School of Mathematics and Statistics at University College Dublin. His research interests include clustering, classification and latent variable modelling; he is particularly interested in applications from social sciences, food science, medicine and biology. Currently, he is the editor for Social Sciences and Government for the Annals of Applied Statistics and he has recently co-authored a research monograph on Model-Based Clustering and Classification.

Intro

Brendan kick-started the masterclass by providing an introduction to clustering. Cluster analysis aims to find meaningful groups in data: clusters whose members have something in common that they do not share with members of other groups. Clustering dates back to the beginning of language – at least – when objects were grouped according to common characteristics. For example, Aristotle classified animals into groups based on observations, in ‘History of Animals’ from the 4th century BC.

Hierarchical Clustering

In the 1950s, various hierarchical clustering methods were introduced. These aim to build a tree of clusters: you start with n observations divided into n clusters (every observation is its own, individual cluster), then you find the two ‘closest’ clusters and merge them so that there are now n-1 clusters, and you continue in this way until all observations are in a single cluster. In order to do this, you need a measure of distance between observations (dissimilarity) and a measure of distance between clusters (linkage). The choice of these measures can heavily influence the results. Hierarchical clustering doesn’t always perform well even though it is commonly used.
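The merging procedure described above can be sketched in a few lines. This is a minimal illustration only, assuming one-dimensional data, absolute difference as the dissimilarity and single linkage (the distance between the closest pair of members); it stops once a requested number of clusters remains, rather than building the full tree. The function name and data are my own.

```python
def single_linkage(points, n_clusters):
    """Agglomerative clustering of 1-D points with single linkage."""
    clusters = [[p] for p in points]  # every observation starts alone
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return [sorted(c) for c in clusters]


groups = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

Swapping `min` for `max` in the linkage line would give complete linkage instead, which is one way to see how the choice of linkage can change the result.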

K-means Clustering

Another method of clustering was developed in the late 1950s: k-means clustering. Here, we describe each cluster by the average of the observations within it. This is an iterative algorithm, repeated until convergence, with two steps:

  • Allocation: assign each observation to the closest cluster
  • Update: recompute the cluster summaries (i.e. the means)
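The two steps can be written out directly. Below is a minimal sketch for one-dimensional data, assuming the starting centres are supplied by hand; the function name and toy data are my own, not taken from the masterclass.

```python
def kmeans(points, centres, max_iter=100):
    """k-means for 1-D data, starting from the given centres."""
    centres = list(centres)
    labels = []
    for _ in range(max_iter):
        # Allocation: assign each observation to the closest centre.
        labels = [
            min(range(len(centres)), key=lambda k: (x - centres[k]) ** 2)
            for x in points
        ]
        # Update: recompute each centre as the mean of its members.
        new_centres = []
        for k in range(len(centres)):
            members = [x for x, lab in zip(points, labels) if lab == k]
            new_centres.append(sum(members) / len(members) if members else centres[k])
        if new_centres == centres:  # converged: no centre moved
            break
        centres = new_centres
    return centres, labels


centres, labels = kmeans([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centres=[0.0, 5.0])
```

On this toy data the centres settle at 2 and 11 after a few iterations. In general the result depends on the starting centres, so in practice the algorithm is usually run from several random starts.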

Brendan demonstrated k-means clustering in action by clustering the colours of the pixels in an image of Alexandra Square, Lancaster University. We start with a single cluster (k=1) and the results look pretty grey; as the number of clusters increases, the photograph becomes more identifiable. Even with 2 clusters, buildings, shadows and people are all visible since light and dark areas have been separated. By the time we hit 10 clusters, the image is starting to look similar to the original and for 100 clusters, the image is indistinguishable from the original.

Alexandra Square image clustered with k = 1, 2, 3, 10, 20 and 100

Model-Based Clustering

The first successful model-based clustering method was also developed in the 1950s by Paul Lazarsfeld for multivariate discrete data. The model he proposed is now known as the Latent Class Model – he used the term ‘latent’ for unknown cluster allocations.

The dominant model for model-based clustering of continuous data was developed in 1963 by John Wolfe; this is known as the Gaussian Mixture Model.

Model-based clustering assumes that observations arise from a finite mixture model and that each observation has a probability of coming from each group, g; these probabilities are called the mixing proportions. The data within each group is modelled and we can combine this model, with the mixing proportions, to define an overall model for the data. Many methods of estimating these models are available; Brendan focussed on the EM algorithm.

A Gaussian mixture model models the observations in each cluster with a multivariate Gaussian distribution. Therefore the clusters correspond to Gaussian densities and have elliptical shapes. We use the EM algorithm to fit these Gaussian mixture models. In the example shown, the clusters were fitted in just 7 iterations of the algorithm.
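As a separate illustration (not the example from the masterclass), here is a minimal sketch of the EM algorithm for a Gaussian mixture. It assumes one-dimensional data and hand-picked starting values, and it uses a small variance floor that I have added for numerical safety; the 'mclust' package handles the general multivariate case.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)


def em_gmm(data, means, variances, weights, n_iter=50):
    """EM for a 1-D Gaussian mixture, starting from the given parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    n_groups = len(means)
    for _ in range(n_iter):
        # E-step: posterior probability that each observation came from each group.
        resp = []
        for x in data:
            dens = [
                weights[g] * normal_pdf(x, means[g], variances[g])
                for g in range(n_groups)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing proportions, means and variances.
        for g in range(n_groups):
            n_g = sum(r[g] for r in resp)
            weights[g] = n_g / len(data)
            means[g] = sum(r[g] * x for r, x in zip(resp, data)) / n_g
            var = sum(r[g] * (x - means[g]) ** 2 for r, x in zip(resp, data)) / n_g
            variances[g] = max(var, 1e-6)  # assumed floor to avoid a collapsed component
    return means, variances, weights


data = [0.8, 1.0, 1.1, 1.3, 4.7, 5.0, 5.2, 5.4]
means, variances, weights = em_gmm(data, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Unlike k-means, each observation keeps a probability of belonging to every group, and the fitted mixing proportions and variances describe the size and spread of each cluster.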

Further Reading

Brendan recommended some further reading:

  • Geoffrey McLachlan and Kaye Basford
    Mixture Models: Inference and Applications to Clustering
  • Collins, Linda M and Stephanie Lanza
    Latent Class and Latent Transition Analysis
  • Paul McNicholas
    Mixture Model-Based Classification
  • Charles Bouveyron, Gilles Celeux, Brendan Murphy and Adrian Raftery
    Model-Based Clustering and Classification for Data Science

Brendan is an author of the ‘mclust’ package in R. This is used for model-based clustering, classification and density estimation based on finite Gaussian mixture modelling fitted via the EM algorithm. This package had 1.5 million downloads in 2019!

This masterclass was my first and I really enjoyed learning about clustering and classification with Professor Brendan Murphy. I found the history, methods and applications really interesting and I am looking forward to reading further into the topic.

STOR-i Annual Conference 2020: An Overview
Tue, 28 Jan 2020

At the beginning of January, STOR-i CDT hosted its ninth annual conference, where key researchers from Statistics, Operational Research and Industry presented a range of interesting talks. Additionally, STOR-i PhD students showcased their work during an evening poster session. This event provided opportunities to network with individuals from a variety of institutions and industries.

Alexandre opened the conference by presenting his results from various analytical projects concerned with supporting operations, scheduling and pricing in air transportation. Alexandre is an Assistant Professor of Operations Research and Statistics at the MIT Sloan School of Management. His research applies to areas in transportation with the aim of promoting efficient scheduling, operations and pricing practices.

Henry Moss, a third-year STOR-i PhD student, then proposed MUMBO (Multi-task Max-value Bayesian Optimisation), which is used to perform efficient optimisation (maximising or minimising a function subject to constraints) by evaluating low-cost functions related to our target function. Henry’s research lies at the intersection of Statistics and Operational Research, with a focus on Bayesian Optimisation.

The morning session concluded with a talk from Valeria, an Associate Professor at the Department of Biostatistics, part of the Oslo Centre for Biostatistics and Epidemiology at the University of Oslo. Valeria presented some recent advances in a statistical model that works well in handling uncertainty in ranking and different kinds of data: the Mallows Rank Model. Her research spans several areas of Mathematics and Statistics:

  • Functional data analysis with applications in physiology and biostatistics,
  • Machine learning for describing people mobility,
  • Statistical genomics of cancer and high-dimensional data models.

After the lunch break, Miguel discussed some lesser-known instances of his research in the theory, algorithms and applications of mathematical optimisation. Miguel holds the Chair of Operational Research at the School of Mathematics, University of Edinburgh. His research focuses on the application of optimisation to problems in power systems management and smart grid design.

Georgia, another third-year STOR-i PhD student, proposed her procedures for generating valid linear inequalities that are added to optimisation problems in order to reduce the required solution time and improve solvers’ performance. Georgia also works with Morgan Stanley on a variety of large-scale optimisation problems.

Next, Richard, the Howard Levene Professor of Statistics at Columbia University, presented some of his research on time series of counts. He focussed on relaxing the assumption about the probability mass functions relating the observations to a state variable. Typically, this is chosen to be a Poisson or Negative Binomial distribution, but Richard detailed how to relax this to a more general form and the consequences of this change.

Tom Flowerdew gave the penultimate talk of the day about applying a statistical learning model to fraud detection processes. Tom is a STOR-i alumnus who now works for Featurespace, a world-leader in Adaptive Behavioural Analytics. They work for banks and other financial institutions by scoring transactions based on their risk of fraud. Tom explained some of the challenges with this and how statistical models are employed to tackle these problems successfully.

To close the first day of the conference, Christine presented two applications of optimisation where some elements of the system are subject to uncertainty. The examples discussed were concerned with the passenger transport industry and healthcare. Christine presented methods that account for the randomness of both inputs (e.g. demand for plane tickets or number of patients requiring a bed on a hospital ward) and outputs of a system (e.g. plane tickets sold or number of patients on a ward) when optimising another element (e.g. maximising revenue or minimising waiting time). Christine is Associate Professor of Operational Research in Mathematical Sciences and Director of CORMSIS (Centre for Operational Research, Management Science and Information Systems) at the University of Southampton.

Veronica, a Reader in Statistics at Brunel University London, started the second day by explaining how possible dependencies in firms’ risk or default should be accounted for in commonly used credit risk models. Veronica explored how transaction data can be used for such models and the advantages that this may bring in terms of predictive power. She then proposed a model to capture dependencies, as well as an algorithm to manage the high-dimensional data and some computational challenges.

Dolores then proposed data science models that strike a balance between accuracy and interpretability, so that they provide explanations on the task to the user who interacts with the models. Dolores is a Professor in Operations Research at Copenhagen Business School; her areas of expertise include Supply Chain Management, Data Mining and Revenue Management.

Ciara, another STOR-i alumnus and a postdoctoral researcher at Universitat Pompeu Fabra in Barcelona, presented next. Ciara’s research interests include online learning, multi-armed bandits and reinforcement learning. In this talk, Ciara focussed on multi-armed bandits, where at each time step a player selects an action and receives some reward from selecting it; the aim is to maximise total reward. It is commonly assumed that this reward is constant, but Ciara proposed algorithms that perform well when the reward is not constant and depends on the history of the player’s actions.
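To illustrate the basic bandit setting only, here is a minimal sketch of the standard epsilon-greedy strategy on arms whose reward probabilities stay fixed over time. This is a textbook baseline, not one of the algorithms Ciara proposed; the arm probabilities, step count and seed are made-up values.

```python
import random


def epsilon_greedy(reward_fns, n_steps, epsilon=0.1, seed=0):
    """Play a bandit for n_steps, exploring at random with probability epsilon."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms       # number of times each arm was played
    estimates = [0.0] * n_arms  # running mean reward of each arm
    total = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return counts, estimates, total


# Hypothetical arms paying 1 with fixed probabilities 0.2, 0.5 and 0.8.
arms = [lambda rng, p=p: 1.0 if rng.random() < p else 0.0 for p in (0.2, 0.5, 0.8)]
counts, estimates, total = epsilon_greedy(arms, n_steps=2000)
```

Because the estimates are long-run averages, a strategy like this adapts slowly when rewards change with the player's past actions, which is the harder setting the talk addressed.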

Brendan Murphy concluded the conference by proposing a new method for model-based clustering of continuous data. Brendan is Full Professor and Head of School in the School of Mathematics and Statistics at University College Dublin. He has research interests in clustering, classification and latent variable modelling with applications from social sciences, food science, medicine and biology.

The STOR-i conference then closed with a lunch buffet, which provided another opportunity to network with external attendees. I really enjoyed listening to all of the talks and learning about many interesting applications of topics covered during the MRes so far. It was a great occasion to meet professors from other universities, industry professionals and STOR-i alumni, as well as to hear from current students in both the presentations and the poster session. I intend to write further posts in the coming weeks that will discuss some of the talks in more detail.

Life as a STOR-i MRes Student: Michaelmas Term
Tue, 14 Jan 2020

Introducing my ‘Life as a STOR-i MRes Student’ blogging thread, starting with an overview of Michaelmas term. I decided to complete a blogging thread on my MRes year because it will provide me with an opportunity to reflect on my time as a student at STOR-i as well as inform others about the course. I started the STOR-i MRes programme in October 2019; I have already learnt many new skills and topics and thoroughly enjoyed myself during the term.

In the first week of term, we started with an introductory day to meet some of the staff and students, as well as to learn what STOR-i is all about and what the programme entails. This was followed by a two-day excursion to the Lake District; this was a great opportunity to build relationships with my peers as well as staff and other students. The Away Day included many team-building activities, from designing a team logo and printing it onto a t-shirt to canoeing across Lake Windermere in an orienteering challenge. By the time we came back to Lancaster to start the term properly, we had really got to know each other and were able to support each other through the rest of the term.

STOR-i Away Day 2019

The first half of Michaelmas term consisted of four modules to introduce us to the basic knowledge required for the rest of the year. This included Inference and Modelling, Stochastic Simulation, Deterministic Optimisation and Probability and Stochastic Processes. Within each topic, we had lectures, weekly exercises and workshops to assist with our understanding. Alongside these modules, we attended lectures regarding training for research and industry.

In the second half of term, we completed contemporary topic sprints on a weekly basis. This involved attending a lecture on a Monday morning to gain an overview of the topic and our task; we would then research the topic in teams for 4 days and present it back to an expert in the field later in the week. This was a great way to build on our teamwork, research and presentation skills. The topics were:

  • Statistical Learning for Decision led by Professor David Leslie,
  • Modelling Paradigms for Complex and Novel Data forms led by Professor Idris Eckley,
  • Computational Statistics led by Professor Paul Fearnhead,
  • Optimising under Uncertainty led by Professor Adam Letchford.

Following the sprints, we chose one topic to explore further and write a report on our findings. I chose Optimisation under Uncertainty and wrote my report on the Stochastic Knapsack Problem with Random Weights. This is an optimisation problem where we have a set of items with different, unknown weights but known reward. We want to choose a set of items to go in the knapsack (which has some maximum capacity) so that the total reward of the knapsack is maximised. Whilst writing this report, I read a lot of the relevant literature and broadened my knowledge on the topic as a whole.
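To give a flavour of the problem, here is a minimal sketch of one possible formulation: choose the highest-reward set of items such that the probability of exceeding the capacity stays below a tolerance. The chance-constraint formulation, the items, their discrete weight distributions and the tolerance are all assumptions made for illustration, and the instance is small enough to solve by exact enumeration.

```python
from itertools import combinations, product

# Hypothetical items: name -> (known reward, [(possible weight, probability), ...]).
items = {
    "a": (10, [(2, 0.5), (4, 0.5)]),
    "b": (7, [(1, 0.5), (3, 0.5)]),
    "c": (4, [(1, 1.0)]),
}
capacity = 5


def overflow_probability(chosen):
    """Exact probability that the chosen items exceed the capacity."""
    prob = 0.0
    # Enumerate every joint weight scenario, assuming independent weights.
    for scenario in product(*(items[i][1] for i in chosen)):
        weight = sum(w for w, _ in scenario)
        p = 1.0
        for _, q in scenario:
            p *= q
        if weight > capacity:
            prob += p
    return prob


def best_subset(max_overflow):
    """Highest-reward subset whose overflow probability is at most max_overflow."""
    best, best_reward = (), 0
    for k in range(1, len(items) + 1):
        for chosen in combinations(sorted(items), k):
            reward = sum(items[i][0] for i in chosen)
            if overflow_probability(chosen) <= max_overflow and reward > best_reward:
                best, best_reward = chosen, reward
    return best, best_reward


best, best_reward = best_subset(max_overflow=0.25)
```

Tightening the tolerance to zero switches the best choice from items a and b to items a and c in this toy instance, which shows the trade-off between reward and the risk of overfilling the knapsack.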

Outside of my studies, I attended many of the events that STOR-i hosted, including a cheese and wine evening as well as the STOR-i Christmas Meal. This allowed me to meet more students and relax outside of studying. Additionally, I started playing netball for Lancaster University Graduate College; this has been a great way to meet other postgraduate students from different disciplines. I have thoroughly enjoyed the MRes programme so far and look forward to seeing what the Lent term brings.

STOR-i Christmas Meal
Netball Christmas Meal

Whilst Michaelmas term was busy and challenging, I have thoroughly enjoyed every moment in learning new topics and making new friends.
