STOR-i Conference 2020: Alexandre Jacquillat on Airline Operations, Scheduling and Pricing

Tue, 31 Mar 2020

For this week's blog post I wanted to branch out of the statistics field and into operational research. To do so, I am going to focus on the talk given by Alexandre Jacquillat, who opened the 2020 STOR-i Annual Conference back in early January. For more details on the STOR-i Conference, and an in-depth look at Tom Flowerdew's presentation on fraud detection, please see my previous blog post.

Alexandre Jacquillat is an assistant professor of OR and Statistics at the MIT Sloan School of Management. His research focuses on applications in transportation systems to promote more efficient scheduling, operations and pricing using predictive and prescriptive analytics. This was the discussion point of his talk given at the 2020 STOR-i Conference, with a particular focus on the airline sector.

Alexandre Jacquillat at the STOR-i Annual Conference 2020

The work Jacquillat is doing is particularly vital as the transportation sector is transforming: new technologies such as electric cars and ride sharing are emerging, and demand is rising while capacity remains limited. His work aims to meet this rise in demand whilst offsetting the costs of congestion.

In particular, the airline sector is a rapidly growing industry with limited infrastructure. Most major airports are currently running at or above capacity, which causes delays in departures and landings, incurs costs to the airlines and wastes valuable resources. The challenges surrounding airline operations, scheduling and pricing are a very current topic in the OR field, with the ultimate goal of improving efficiency and profitability in the industry.

I’ll be breaking this blog post down into the three sections discussed in Jacquillat’s talk: operations, scheduling and pricing within the airline sector.

Operations

The operations within airports are limited in capacity: as stated above, most airports are operating at or above capacity, which can cause severe delays. In order to balance capacity and demand, and to reduce delays and holding, Air Traffic Flow and Capacity Management (ATFCM) initiatives are implemented. One such initiative is the ground delay program, in which planes are held at their departure airports when a delay is expected, which is cheaper and less environmentally damaging than circling in the air while waiting to land.

Jacquillat proposes modelling this as an optimization problem, in which the objective function is the cost of aircraft delay plus the cost of passenger delays, subject to flight operating constraints, airport capacity constraints and passenger accommodation constraints. This aims to balance flight-centric vs. passenger-centric delays. It is important to consider both delay types, as there is not a direct correspondence between them: many passengers are on multi-flight itineraries, where one small flight delay could cause them to miss multiple onward flights and so produce a large passenger delay.
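To give a feel for this trade-off, here is a tiny, entirely hypothetical ground delay program solved by brute force. The flight names, passenger counts, costs and capacity below are made up for illustration; this is a sketch of the flavour of the problem, not Jacquillat's actual model.

```python
from itertools import product

# Toy ground delay program (illustrative only -- not Jacquillat's model).
# Three flights all request to land in period 0, but the airport can accept
# only one arrival per period, so we assign each flight a ground delay.
flights = ["F1", "F2", "F3"]
passengers = {"F1": 150, "F2": 80, "F3": 200}  # passengers on board (made up)
aircraft_cost = 10.0   # cost per period of aircraft delay (hypothetical units)
pax_cost = 0.5         # cost per passenger-period of passenger delay

best_cost, best_plan = float("inf"), None
for delays in product(range(len(flights)), repeat=len(flights)):
    # arrival period of each flight = requested period (0) + ground delay;
    # the runway can handle at most one arrival per period
    if any(delays.count(t) > 1 for t in delays):
        continue
    cost = sum(d * (aircraft_cost + pax_cost * passengers[f])
               for f, d in zip(flights, delays))
    if cost < best_cost:
        best_cost, best_plan = cost, dict(zip(flights, delays))

print(best_plan, best_cost)
```

Note how the passenger-centric term pushes the fullest flight (F3) to the front of the queue, even though all three aircraft incur the same per-period aircraft cost.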

Scheduling

Airlines are allocated time slots for departures and arrivals through a request process. Some time slots are more in demand than others, and this demand is often far greater than the capacity. As a result, some airlines are not allocated their requested slot, and a process needs to be implemented to ensure fairness in which alternative slot they receive. There is then a slot conference at which allocated slots can be traded and changed.

This forms a complex slot allocation problem in which connection times and the regularity of slots need to be considered, along with decisions regarding priorities: historic slots (the same airline allocated the same slot as before), new entrants, and changes to historic slots.

Jacquillat again proposed an optimization problem to allocate slots. The objective function minimizes slot displacement: the difference between the slot times requested and those allocated. This objective is subject to the constraints:

  • Flight connections: the time to make a connecting flight is neither below nor above the allowable thresholds, i.e. there is enough time to make the connection but not too long a wait between flights.
  • Runway capacity: the allocation does not exceed the total number of flights that can depart from or arrive at the airport during each time slot.
  • Terminal capacity: the number of passengers waiting in the terminals to depart, or arriving, does not exceed the allowed safety limits.
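A hypothetical, scaled-down version of this problem (keeping only the runway capacity constraint, with made-up request times) can be brute-forced over all assignments:

```python
from itertools import permutations

# Hypothetical, scaled-down slot allocation: each airline requests a departure
# time, each available slot can hold one flight, and we minimize the total
# displacement |allocated - requested|. Connection-time and terminal-capacity
# constraints are omitted here for brevity.
requests = {"A": 9, "B": 10, "C": 14}   # requested slot times (hours, made up)
slots = [9, 11, 12]                     # available slots, capacity 1 each

def total_displacement(alloc):
    return sum(abs(a - r) for a, r in zip(alloc, requests.values()))

best = min(permutations(slots), key=total_displacement)
allocation = dict(zip(requests, best))
print(allocation, total_displacement(best))
```

Real instances involve hundreds of airlines and thousands of slots, so enumerating permutations is hopeless; this is exactly why heuristics such as the one below are needed.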

In order to solve this problem, Jacquillat suggested breaking the requests into subsets using an "Adaptive Destroy and Repair" approach, which provides relatively fast, high-quality solutions. For more information about this method I would suggest reading the paper "A Large-Scale Neighborhood Search Approach to Airport Slot Allocation", whose details I'll leave in the references below.
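For intuition, a destroy-and-repair heuristic repeatedly releases part of the current solution and greedily rebuilds it, keeping any change that does not worsen the objective. The loop below is only a sketch of this general large-neighbourhood-search idea on a made-up displacement instance, not the adaptive algorithm from the paper:

```python
import random

# Sketch of a destroy-and-repair loop (generic idea only, made-up data --
# not the adaptive algorithm of Ribeiro et al.).
random.seed(0)
requests = {"A": 9, "B": 9, "C": 10, "D": 12, "E": 12}  # requested times
slots = [9, 10, 11, 12, 13]                             # capacity 1 each

def cost(assign):
    return sum(abs(assign[f] - requests[f]) for f in requests)

def repair(assign, freed_flights, free_slots):
    # Greedy repair: give each freed flight the closest remaining free slot.
    for f in freed_flights:
        s = min(free_slots, key=lambda t: abs(t - requests[f]))
        assign[f] = s
        free_slots.remove(s)
    return assign

# start from a deliberately poor feasible assignment
current = dict(zip(requests, reversed(slots)))
for _ in range(200):
    # destroy: release a random pair of flights and their slots
    removed = random.sample(list(requests), 2)
    partial = {f: s for f, s in current.items() if f not in removed}
    freed = [s for f, s in current.items() if f in removed]
    candidate = repair(dict(partial), removed, freed)
    if cost(candidate) <= cost(current):   # accept non-worsening moves
        current = candidate

print(current, cost(current))
```

Because only a small part of the solution is rebuilt at each step, each iteration is cheap, which is what makes this style of search viable at the scale of real slot allocation instances.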

During the conference, Jacquillat also presented an integrated approach to scheduling and traffic flow management that takes into account slot requests and airport capacity, as well as passenger and aircraft itineraries. To tackle this problem he proposed a two-stage stochastic integer programming model.

Pricing

The next thing to discuss was how to price airline tickets relative to competitors. In general, the price of a ticket increases closer to the time of the flight; however, a lead-in fare (the starting price) is made public. Popular flights move quickly up the price ladder, whereas less popular flights stay at the lead-in fare for longer.

Jacquillat described a multi-control-group experimental design conducted with a global airline to test a new practice for airline pricing under competition. This is his current work, which is in the process of being published.

I thoroughly enjoyed Alexandre Jacquillat’s presentation, which gave an insight into several solutions within a highly relevant application. I look forward to getting more exposure to OR topics and their applications in the future and in particular at the 2021 STOR-i conference. I’ve listed below some references and further reading into the topic. Thanks again Alexandre!

References & Further Reading

  • Ribeiro, N., Jacquillat, A. and Antunes, A. P. (2019). A Large-Scale Neighborhood Search Approach to Airport Slot Allocation. Transportation Science, 53(6).
  • Ribeiro, N., Jacquillat, A., Antunes, A. P., Odoni, A. and Pita, J. (2018). An optimization approach for airport slot allocation under IATA guidelines. Transportation Research Part B: Methodological.
Multi-Armed Bandits: Thompson Sampling

Mon, 27 Jan 2020

For the STOR608 course here at STOR-i we covered several research areas as part of the topic sprints, as discussed in my first blog post, "My First Term as a STOR-i MRes Student". My favorite of these was the sprint led by David Leslie, titled Statistical Learning for Decision. For this I focused my research on the use of Thompson Sampling to tackle the multi-armed bandit problem, which will be the topic of this blog post.

Multi-armed bandits provide a framework for solving sequential decision problems in which an agent learns about some uncertain quantity as time progresses in order to make more informed choices. The classic example is a row of slot machines ("one-armed bandits"), where a player must select one of K arms to pull.

A K-armed bandit is set up as follows: the player has K arms to choose from (known as actions), each of which yields a random reward. The rewards from each arm are independent and identically distributed (IID) with an unknown mean that the player learns through experimentation.

In order to experiment, the player requires a policy regarding which action to take next. Formally a policy is an algorithm that chooses the next arm to play using information on previous plays and the rewards they obtained (Auer et al., 2002).

When making the decision on which arm to pull next, the policy must weigh up the benefits of exploration vs. exploitation. Exploitation refers to choosing an arm that is known to yield a desirable reward in the short term, whereas exploration is the search for higher payouts among different arms, which can optimize long-term reward.

In order to quantify a policy’s success, a common measure known as regret is used. The regret of a policy over \(T\) rounds is defined as the difference between the expected reward if the optimal arm is played in all \(T\) rounds and the sum of the rewards observed over the \(T\) rounds.
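In symbols, writing \(\mu_k\) for the expected reward of arm \(k\) and \(\mu^* = \max_k \mu_k\) for the mean of the optimal arm, the regret after \(T\) rounds is

\[ R(T) = T\mu^* - \mathbb{E}\left[\sum_{t=1}^{T} r_t\right], \]

where \(r_t\) is the reward received at round \(t\). A good policy is one whose regret grows slowly in \(T\).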

Thompson sampling (TS) is one of the policies used for tackling multi-armed bandit problems. For simplicity we will consider a case where the rewards are binary i.e. they take the values 0 (for failure) or 1 (for success), however TS can be extended to rewards with any general distribution.

Consider a multi-armed bandit with \(k\in\{1,\ldots,K\}\) arms each with an unknown probability of a successful reward \(\theta_k\). Our goal is to maximize the number of successes over a set time period.

We begin by defining a prior belief on our success probabilities, which we set to be Beta distributed. Each time we gain an observation we update this belief by updating the parameters of the Beta distribution, and use this updated distribution as the prior for the next arm pull.

At each time step, the Thompson Sampling algorithm decides which arm to pull as follows: sample a value from each arm's current posterior distribution, pull the arm whose sampled value is largest, then observe the reward and update that arm's posterior.
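To make this concrete, here is a minimal sketch of Thompson Sampling for Bernoulli rewards with Beta priors. The three true success probabilities are made up for illustration and would, of course, be unknown to the player:

```python
import random

# Minimal Thompson Sampling sketch: Bernoulli rewards, Beta(1, 1) priors.
random.seed(42)
true_theta = [0.3, 0.5, 0.7]   # hypothetical success probabilities (unknown)
K, T = len(true_theta), 2000
alpha = [1.0] * K              # Beta prior parameters: successes + 1
beta = [1.0] * K               # Beta prior parameters: failures + 1

pulls = [0] * K
for _ in range(T):
    # 1. sample a plausible success probability for each arm from its posterior
    samples = [random.betavariate(alpha[k], beta[k]) for k in range(K)]
    # 2. pull the arm with the largest sampled value
    k = samples.index(max(samples))
    # 3. observe a Bernoulli reward and update that arm's posterior
    reward = 1 if random.random() < true_theta[k] else 0
    alpha[k] += reward
    beta[k] += 1 - reward
    pulls[k] += 1

print(pulls)   # most pulls should concentrate on the best arm (index 2)
```

Notice that randomness does the exploring for us: an arm that has been pulled rarely has a wide posterior, so it occasionally produces a large sample and gets pulled, while well-explored good arms are chosen most of the time.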

Although here we have specified a Beta distribution, a prior of any kind can be used. In this case, however, it is a convenient choice because the Beta distribution is conjugate to the Bernoulli distribution, and hence the posterior is also Beta distributed. This is why we only need to update the parameters at each time step. If the prior chosen were not conjugate, the posterior would no longer have this simple closed form, and we would need to recompute or approximate it afresh at each time step.
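Concretely, the conjugate update is: if our current belief about arm \(k\) is \(\theta_k \sim \text{Beta}(\alpha_k, \beta_k)\) and we observe a reward \(r \in \{0, 1\}\) from that arm, then the posterior is

\[ \theta_k \mid r \sim \text{Beta}(\alpha_k + r,\; \beta_k + 1 - r), \]

so a success simply increments \(\alpha_k\) and a failure increments \(\beta_k\).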

The prior chosen plays a large role in the effectiveness of the Thompson Sampling algorithm, and thus care must be taken over its specification. Ideally it should be chosen to cover all plausible values of the reward success parameter.

In the slot machine case, the player often has no intuition about the values of the success parameters. This situation is usually described using an uninformative Beta(1,1) prior, as it places uniform probability across the entire [0,1] range. This promotes exploration to learn which arm is optimal.

In the opposite case where we have some knowledge of the true values of the success parameter, we may center the priors about these values in order to reduce regret. Here, we require less exploration to identify the optimal arm and so can exploit this to maximize reward.

Although TS tends to be efficient at minimizing regret, there are occasional outlier cases where the regret is much higher than expected. This may happen if we begin by pulling a sub-optimal arm and receive a successful reward: we are then likely to exploit this arm further, falsely believing it to be optimal. Alternatively, we may begin by selecting the optimal arm and observe a failure, and then exploit a sub-optimal arm under the false belief that it will give a higher reward.

To conclude, Thompson Sampling is a simple but efficient algorithm for solving stochastic multi-armed bandit problems, successfully balancing exploitation and exploration to maximize reward and minimize regret.

References & Further Reading

Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256.

Russo, D., Van Roy, B., Kazerouni, A., Osband, I. and Wen, Z. (2017). A Tutorial on Thompson Sampling. arXiv preprint arXiv:1707.02038.
