Dependence in Extremes (30 April 2021)

In my last post, I briefly discussed the two standard approaches to modelling extreme events: block maxima and threshold models. Both were introduced under the assumption that observations are independent and identically distributed. This assumption of temporal independence is unrealistic, as most extreme events occur over several consecutive observations. This may make you question the appropriateness of the two models I have mentioned previously, and rightly so!

Stationarity

Stationarity is a more realistic assumption than independence for many physical processes. It allows variables to be mutually dependent while the stochastic properties of the series remain the same throughout time; so, the distribution of X_1 is the same as that of X_{41}. When extreme observations of a process exhibit short-range dependence, i.e. several consecutive observations are classified as extreme, they are said to form a cluster. In order to fit the models previously mentioned, we need some way to obtain observations from clusters which we can then deem independent. This is called declustering.


A more precise definition is as follows:

Declustering corresponds to a filtering of the dependent observations to obtain a set of threshold excesses that are approximately independent.


Inference for clusters of extreme values of a time series usually requires the identification of independent clusters of exceedances over a high threshold. There are many different methods used to “decluster” a stationary series. The choice of declustering scheme can have a significant effect on estimates of within-cluster characteristics and return levels.

A common approach is to identify independent clusters of exceedances above a high threshold, to evaluate the characteristic of interest for each cluster and to form estimates from these values. The two most common methods used to identify clusters are blocks and runs declustering. Runs declustering, for example, assumes that exceedances belong to the same cluster if they are separated by fewer than a certain number of observations below the threshold, known as the run length. A potential problem then arises in the selection of this run-length parameter, as the choice is somewhat arbitrary.

Peaks over Threshold

The standard approach to declustering is called Peaks over Threshold (POT). Once a definition of clusters has been decided, the maximum excess in each cluster is recorded and these cluster maxima are then assumed to be independent. The GPD can then be fitted to these independent cluster maxima.
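To make this concrete, here is a minimal sketch in Python of the whole pipeline: runs declustering followed by a GPD fit to the cluster maxima. The simulated series, threshold and run length are purely illustrative choices rather than values recommended by any particular study, and scipy's genpareto shape parameter plays the role of \xi here.

```python
import numpy as np
from scipy.stats import genpareto

def runs_decluster(x, u, r):
    """Runs declustering: exceedances of u separated by fewer than r
    consecutive sub-threshold observations are grouped into one cluster.
    Returns a list of index arrays, one per cluster."""
    clusters, current = [], []
    for i in np.where(x > u)[0]:
        if current and i - current[-1] > r:   # gap of at least r non-exceedances
            clusters.append(np.array(current))
            current = []
        current.append(i)
    if current:
        clusters.append(np.array(current))
    return clusters

# Illustrative serially dependent series standing in for 3-hourly sea surges
rng = np.random.default_rng(1)
noise = rng.normal(size=5000)
x = 0.7 * np.concatenate(([0.0], noise[:-1])) + noise   # simple moving-average dependence

u = np.quantile(x, 0.95)                                 # illustrative threshold choice
clusters = runs_decluster(x, u, r=4)                     # illustrative run length
cluster_maxima = np.array([x[c].max() for c in clusters])

# Fit the GPD to the cluster-maximum excesses, treated as independent
xi, _, sigma = genpareto.fit(cluster_maxima - u, floc=0)
print(f"{len(clusters)} clusters, xi = {xi:.3f}, scale = {sigma:.3f}")
```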

However, there are some problems with this approach.

  • It is sensitive to the choices made in the definition of a cluster, i.e. threshold and run-length selection.
  • Again, as with the block maxima approach, there is a significant wastage of information by only considering the maximum of each cluster in the modelling procedure.
  • This method can also incur significant bias in the estimates of parameters and subsequently, in the return level estimates.
  • Cluster characteristics cannot be investigated when working with this filtered set of independent extremes.

Return levels

Return level estimation is very important when modelling extreme data. The n-observation return level is the level we expect to be exceeded on average once every n observations. These return level estimates are used in the design specifications of preventative measures such as sea walls, and so significant bias in these estimates could cause serious problems. Damage can occur that could have been prevented, such as the flood damage at Newlyn, pictured below.

Flood damage at Newlyn. Photo: itv.com
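For context, one standard formulation of the m-observation return level under a fitted GPD is x_m = u + (\tilde{\sigma}/\xi)\left[(m\zeta_u)^{\xi} - 1\right], where \zeta_u is the probability of exceeding the threshold u. Below is a minimal sketch of that calculation, with entirely made-up parameter values and ignoring the adjustment (e.g. via the extremal index) that clustered data would usually require.

```python
def gpd_return_level(m, u, sigma, xi, zeta_u):
    """Level exceeded on average once every m observations (xi != 0),
    where zeta_u is the proportion of observations exceeding the threshold u.
    Assumes approximately independent exceedances."""
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)

# A notional "100-year" level for 3-hourly data (8 observations per day),
# using made-up parameter values
m_100yr = 100 * 365 * 8
print(gpd_return_level(m=m_100yr, u=3.0, sigma=0.5, xi=0.1, zeta_u=0.02))
```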

An Interesting Approach

One method that really caught my eye, which aims to reduce this bias to negligible levels, is a declustering scheme developed by Lee Fawcett and David Walshaw. The authors take a rather unusual approach to dealing with the dependence present in the series: they simply ignore it!

Initially, Fawcett and Walshaw propose continuing with the incorrect assumption of independence between exceedances of a high threshold. They use all threshold exceedances in the modelling procedure, dealing with the problem of waste in other methods. This approach of using all exceedances also manages to significantly reduce the bias in parameter and return level estimates.

The only problem with this approach is that due to the incorrect assumption of independence, standard errors associated with model parameters are underestimated. To deal with this, the authors use a method developed by R.L. Smith, which allows them to inflate standard errors to more realistic levels using a relatively simple adjustment.

Thus, the authors manage to deal with the problem of dependence by initially ignoring it and accounting for it afterwards. This approach works well where the goal is to improve return level estimation.

The authors also developed another approach in which they again used all threshold exceedances but, on top of that, explicitly modelled the temporal dependence structure with the goal of improving estimation of cluster characteristics. This approach adds a large degree of complexity to the modelling process and so is really only justified when cluster characteristics are of interest.

Conclusion

There are many approaches to dealing with dependence in time series extremes, some more complicated than others. I think the goal of the researcher should dictate the types of methods used. If the goal is to improve return level estimation, then the simpler approach of ignoring dependence and accounting for it later seems like a good idea. However, if the aim is to gain insight into the dependence structure of a process and look at within-cluster characteristics, then explicit modelling of dependence might be needed. I will reference another interesting approach below which is more concerned with cluster characteristics. The approach developed by Christopher A. T. Ferro and Johan Segers relies on a particular cluster characteristic called the extremal index and is based on the limiting distribution of times between exceedances.

As I’ve said, there are many other methods out there with a variety of goals which help to deal with the added complexity caused by the removal of the unrealistic assumption of temporal independence in extreme observations. I have only briefly outlined one approach which looked to improve on the standard POT method. I hope you’ve enjoyed reading this post and gotten a taste for how dependence is dealt with in an EVT setting.

Thanks for reading! References below and as always, feel free to leave a comment or contact me through the contact form if you’d like to discuss any part of this.

– Ferro & Segers

– Fawcett & Walshaw

– Fawcett & Walshaw

Extremes (12 April 2021)

In this post, I’d like to talk about a unique discipline in statistics called Extreme Value Theory (EVT). Throughout my undergraduate degree, it seemed that all courses in statistics were concerned with modelling the “usual”. For the most part, this is true, as statisticians in many disciplines are typically concerned with the behaviour of data on average. What makes extremes so unique is that it looks to model the unusual. EVT provides families of distributions which help us to gain insights into the rarest of events such as floods, earthquakes, heatwaves and more. It can use historical data to provide a framework for estimating the most extreme anticipated forces that may impact upon a designed structure. Clearly, this becomes very important when designing preventative measures against such events. In this post, I’m going to give a brief introduction to EVT.

The Problem

Of course, extreme events don’t happen often, so what’s often needed is an estimate of events more extreme than any that have already occurred. This involves prediction of unobserved levels based on observed levels. As an example of the need for this extrapolation, suppose a new sea wall is required in Newlyn, Cornwall to protect against the kind of extremely high sea levels that have caused flooding there before. This wall may need to protect against any extreme sea levels which may occur in, say, the next 100 years. However, we may only have access to 10 years’ worth of historical data for the area. Thus, the problem is to estimate the sea levels which may occur in the next 100 years based on the last 10 years of data. EVT provides families of models which allow for such extrapolation.

Classical Extremes

Continuing with the sea-levels example, suppose we have X_1, X_2, \ldots, a sequence of 3-hourly sea-surge heights at Newlyn. We assume X_1,\ldots,X_n are independent and identically distributed random variables and let M_n = \max \{X_1,\ldots,X_n\} be the maximum sea surge over n observations. These are called block maxima, and the family used to model them is the Generalized Extreme Value (GEV) distribution, which (if you’re interested) is given by:

G(z) = \exp\left\{-\left[1+ \xi \left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}

defined on \{z: 1+\xi(z-\mu)/\sigma > 0\}, -\infty < \mu < \infty, \sigma > 0 and -\infty < \xi < \infty.
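As a rough illustration of fitting this family in practice, here is a sketch using scipy's genextreme on simulated data standing in for annual maxima of 3-hourly surges; note that scipy parametrises the GEV shape as c = -\xi, and none of the numbers below come from real Newlyn data.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
surge = rng.gumbel(loc=2.0, scale=0.4, size=(50, 2920))  # 50 "years" of 3-hourly values
annual_maxima = surge.max(axis=1)                        # one block maximum per year

c, mu, sigma = genextreme.fit(annual_maxima)             # scipy's shape c corresponds to -xi
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, xi = {-c:.3f}")
```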

Now, there are a few problems with this approach. Since we’re looking at blocks of n observations, say monthly or annual maxima, some blocks may contain a larger number of extreme observations than others. Some of these extra extreme observations could actually be larger than the maximum of another block, but since they are not the maximum of the block in which they lie, they are not included in the modelling procedure. Clearly, there is a significant wastage of data here.

There are clear problems with defining extreme events as the largest observations which occur in a block. Thus, we need a more flexible way of defining extreme events.

Threshold Models

Threshold models provide this flexibility. We now denote events as extreme if they exceed some high threshold u. Exceedances of a high threshold are then said to follow another distribution known as the Generalized Pareto Distribution (GPD) which (for those interested) is defined as follows:

For large enough u, the distribution function of (X-u) conditional on X > u is approximately:

H(y) = 1 - \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}

defined on \{y: y > 0 \hspace{0.3cm}\text{and}\hspace{0.3cm} (1 + \xi y/\tilde{\sigma}) > 0\}, where \tilde{\sigma} = \sigma + \xi(u-\mu).
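A matching sketch for the threshold approach, again on simulated stand-in data; the 98% quantile is just an illustrative threshold choice, and scipy's genpareto shape parameter plays the role of \xi.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
surge = rng.gumbel(loc=2.0, scale=0.4, size=50 * 2920)   # simulated 3-hourly values

u = np.quantile(surge, 0.98)                             # an illustrative high threshold
excesses = surge[surge > u] - u                          # all excesses, not just block maxima

xi, _, sigma_tilde = genpareto.fit(excesses, floc=0)     # scipy's shape c plays the role of xi
print(f"{excesses.size} exceedances used, xi = {xi:.3f}, sigma_tilde = {sigma_tilde:.3f}")
```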

This clearly provides a much more flexible definition of extreme events, as well as reducing the wastage of data.

What’s next?

So far, with both of these models, we have been assuming an underlying sequence of independent observations. However, when thinking of extreme events such as high sea levels or storms, it’s clear that these wouldn’t occur in single observations. For example, in the case of the 3-hourly sea-surge measurements at Newlyn, if an extreme event occurred, i.e. the sea level was very high, we would not expect the sea level to return to normal after a single observation. It’s more likely to be high over a number of observations. Thus, the assumption of independence between observations is unlikely to be valid. In fact, for most types of data where EVT is applied, independence in time is unrealistic.

In the next post, I will discuss the use of stationary sequences to approximate this short-term dependence between observations in time series extremes.

I hope you enjoyed this brief introduction to the two main approaches to modelling in EVT. Thanks for reading!

Data Farming (30 March 2021)

This post is a follow-on from one of my earlier blog posts titled “Efficient Experimental Design”. If you haven’t read it, you can find it here. In the earlier post, I discussed the benefits of efficient experimental design and outlined some simple examples. In this post, I’ll be discussing the concept of data farming and the benefits of taking a data farming approach when designing an experiment.

Introduction

Data farming is a descriptive metaphor that captures the notion of generating data with the intention of maximizing the information gained from a simulation model. Data mining may be a more familiar term when thinking of “big data”. Susan Sanchez uses a very interesting metaphor to compare these two concepts.

If you think of miners in the real world, they search for valuable ore buried in the earth but have no control over what is there or where it lies. As they work, they gain information about the geology of the earth and can use this to improve future efforts. Data mining follows the same idea.
Now think about real-world farmers, who nurture land to maximize their overall yield. They manipulate the land using farming techniques in order to increase their overall gain, and experiments can be conducted to assess whether these techniques are effective. Data farmers follow a similar process: they manipulate simulation models to maximize their overall (information) yield and “grow” their data in this way to facilitate the identification of useful characteristics of a model.

The term “data farming” is also used in non-simulation contexts, still as a metaphor for a method of dealing with big data. Data farming in an industrial setting has been described as a means of “enhancing data on hand and determining the most relevant data that need to be collected” (Sanchez 2020). It is also widely used in healthcare settings, where it has been stated that the goal of data farmers should be “to examine how best to use the tools available in our electronic systems to increase the volume of actionable data that are readily available”.

So, clearly, data farming is used in a variety of contexts and can be described in many different ways, but all descriptions seem to allude to the same goal: to “grow” the data available to provide more useful insights. In this post, I will mainly discuss data farming in a simulation context as a natural follow-on from my post on simulation experiment design.

A bit more detail…

The basic experimental design concepts discussed in my earlier post can be very useful but greater insights can be gained when, rather than restricting the experiment to small designs, a large-scale, data farming approach is applied where space-filling designs such as Latin Hypercubes (LH) are used from the outset.

A space-filling design, in simple terms, is one which has points everywhere in the experimental region with as few gaps as possible. The LH achieves this, but an explanation of its inner workings is probably beyond the scope of this post. The important point is that the LH, as well as many other designs, exhibits good space-filling properties. If you’d like to learn more about designs of experiments such as the LH, you can find descriptions of many designs in the paper by Susan Sanchez referenced at the end of the post.
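As a sketch of what generating such a design might look like, scipy's quasi-Monte Carlo module provides a Latin Hypercube sampler; the number of factors, number of design points and factor ranges below are arbitrary illustrative choices.

```python
from scipy.stats import qmc  # available in scipy >= 1.7

k = 10                                   # number of input factors
sampler = qmc.LatinHypercube(d=k, seed=0)
unit_design = sampler.random(n=100)      # 100 design points in [0, 1]^k

# Rescale each column to the range of variation chosen for that factor
lower, upper = [0.0] * k, [10.0] * k     # illustrative factor ranges
design = qmc.scale(unit_design, lower, upper)
print(design.shape)                      # (100, 10)
```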

In the context of a simulation optimization (SO) experiment, where the method looks for some global optimum under a set of conditions using techniques such as stochastic gradient descent or stochastic trust-region methods, choices must be made about a number of factors: the number of samples, the direction and size of each step, and the number of repetitions of the procedure. One potential drawback in this context is the computational cost, as SO methods can take a very long time to converge to an optimum, especially if superfluous input factors are included.

Rather than trying to optimize the solution, a data farming approach fits some metamodel to the response surface and uses this to guide the optimization attempts. The resulting dataset has then been “grown” to encompass the behaviour of the response(s) over a range of factors of interest. Metamodels can provide assessments of which inputs are key drivers of a simulation and allow us to ignore superfluous factors. They can also show whether non-linear or interaction effects exist and may reveal other characteristics of the response surface. Large space-filling designs can also provide diagnostics such as lack-of-fit assessments. These models can also be used to identify undesirable solutions and the reasons for the poor performance, which aids the understanding of the robustness of the system.
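As a toy illustration of that screening idea, the sketch below fits a simple linear metamodel (main effects plus two-way interactions) to outputs from a hypothetical simulate function over a Latin Hypercube design and ranks terms by estimated effect size. In practice, metamodels such as Gaussian processes or regression trees are common choices; this is only meant to show the idea.

```python
import numpy as np
from itertools import combinations
from scipy.stats import qmc

rng = np.random.default_rng(1)
k = 10
design = qmc.scale(qmc.LatinHypercube(d=k, seed=0).random(n=100), [0.0] * k, [10.0] * k)

def simulate(x):
    """Stand-in for an expensive simulation run with two active factors."""
    return 3 * x[0] - 2 * x[3] + 0.5 * x[0] * x[3] + rng.normal(scale=0.1)

y = np.array([simulate(row) for row in design])

# Linear metamodel with an intercept, main effects and all two-way interactions
terms = [("1", np.ones(len(design)))]
terms += [(f"x{j}", design[:, j]) for j in range(k)]
terms += [(f"x{i}:x{j}", design[:, i] * design[:, j]) for i, j in combinations(range(k), 2)]
names, cols = zip(*terms)
beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)

# Rank terms by estimated coefficient size to screen for key drivers
for idx in np.argsort(-np.abs(beta))[:5]:
    print(names[idx], round(beta[idx], 2))
```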

Clearly, one of the major benefits here is that not only do we get a desirable solution, we gain further information about the general behaviour of the simulation model and better understand why this particular solution works well. Another benefit of having large amounts of data from a designed data farming experiment is that this limits the chances of spurious findings which can cause problems when working with observational big data.

There are a number of ways in which a simulation analyst can incorporate data farming concepts into a study to gain extra insight from the time-consuming task of building and validating a simulation model. All of these design methods require some extra effort in advance of the experiment to put together the “computational nuts and bolts needed to automate the data farming process”, and timeliness is, of course, important in this context. Efficient DOE is a necessity once we decide to explore more than a handful of factors, but if factor levels must be changed manually, then the analyst’s time becomes more of a concern than computation time. However, setting up a data farming environment from the outset could be very worthwhile, allowing us to automate the run generation process and grow data without the worry of input errors.

The “nuts and bolts” of a data farming process, in summary, involve the following steps:

  • Identify all input requirements for the model.
  • Choose a suitable design for the system and appropriate range of variation for the factors.
  • For each design point, modify the base design factor values to the current design point; execute the model with these settings; extract suitable output measures (if needed); collate the design point specification with the output measures and append to prior run results (a sketch of this loop is given after the list).
  • Repeat previous step for the desired number of replications.
  • Use statistical tools to analyze output.
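A minimal sketch of that loop (steps three and four) is below; run_model, the factor names and the design matrix are all hypothetical placeholders for whatever your simulation and design actually are.

```python
import numpy as np
import pandas as pd

def run_model(**settings):
    """Placeholder for executing the simulation with the given factor settings."""
    return {"throughput": sum(settings.values()) + np.random.normal()}

factor_names = [f"x{j}" for j in range(3)]
design = np.random.default_rng(0).uniform(0, 10, size=(20, 3))  # stand-in design matrix
n_replications = 5

results = []
for point_id, factor_values in enumerate(design):
    settings = dict(zip(factor_names, factor_values))
    for rep in range(n_replications):                 # step 4: replicate each design point
        output = run_model(**settings)                # step 3: execute and extract outputs
        results.append({"design_point": point_id, "rep": rep, **settings, **output})

results_df = pd.DataFrame(results)                    # collated results, ready for analysis (step 5)
print(results_df.head())
```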

Conclusion

The “nuts and bolts” of setting up a data farming experiment may require some additional time and effort, but this extra effort is reduced when modelling platforms are built with data farming in mind. I may have taken the view of a simulation analyst here, but both the simulation and data analytics communities stand to benefit from gaining additional insight and growing understanding in a timely manner.

I hope you enjoyed this post and, as usual, feel free to leave a comment or contact me through the contact form here if you’d like to discuss. If you’d like to read further into DOE or data farming, some references are below. Thanks for reading!

– S. M. Sanchez & H. Wan

– S. M. Sanchez

– S. M. Sanchez & P. J. Sanchez

The Monty Hall Problem (12 March 2021)

The Monty Hall problem came up briefly in a research talk my MRes cohort attended in January. This apparently paradoxical problem causes debate and confusion, even among mathematicians, and is widely misunderstood. Even with a statistical background, it’s sometimes difficult to solve the problem intuitively. During the research talk, the speaker asked us (a group of MRes students in a statistical discipline) “which is the better strategy, switch or stay?” and, even though most (if not all) of us would have come across the problem before, none of us were able to give a definite answer with proper reasoning behind it. A few of us gave the correct answer (switch) from memory, but the intuition as to why this is the correct solution was still lacking. So, following that talk, I decided to go on a little search for intuition in the hope that the next time this problem comes up, there’s no confusion in my mind.

The Problem

The Monty Hall problem is based on a scenario from the show Let’s Make a Deal where the host (Monty) shows you three doors and tells you that behind two of the doors await hungry goats, while behind the third door sits a brand new car. If you choose correctly, you go home with a brand new car, and otherwise, you get stuck with a goat. Monty asks you to choose a door. After you make your choice, Monty opens a different door and shows you a goat. Now, with only two doors left, you’re given the chance to switch. What should you do?

Figure 1: Three doors, which do you choose?

For someone who hasn’t seen the problem before, a common response, even among mathematicians, is to say that there is no difference between switching and not switching. Two doors left, each has a probability of 1/2 of having the car, right? This response is understandable: we knew there was a goat behind one of the other two doors anyway, so we don’t have any new information, right? Thus, intuitively, it’s easy to see why someone might argue that each remaining door has a 50% chance. This is incorrect. The correct choice is actually to switch, and with this strategy, the odds of finding the car double! Let me explain why.

Explanation

A common misunderstanding here is to assume that, after you’ve made your initial choice, Monty opens up a door at random. What actually happens is that Monty, knowing what’s behind every door, opens up a door to show you a goat. Now, there are a couple of ways of looking at the problem once you take this into account. First, let’s start by assuming we play this game a large number of times and we now want to choose the best strategy to use throughout all of the games. Clearly, staying with your initial choice will work 1/3 of the time, since the probability of the car being behind that door is unaffected by the new information. There are only two strategies to use, so if staying will only work 1/3 of the time, then switching must work 2/3 of the time.

Now, this may seem like weak logic and you may still argue each of the doors left has an equal chance of having the car behind it. But the key here is that Monty knows what’s behind all of the doors!

So let’s look at this again. Once you’ve made your initial choice, say door 1, there is a 1/3 chance that the car lies behind door 1 and a 2/3 chance that it lies behind either of the other two doors. Now, Monty opens, for example, door 2 and shows you a goat. At this stage, door 1 still has a 1/3 chance but, with door 2 having been opened, door 3 now holds the 2/3 probability of having the car. So, we might say that the probability has now been “concentrated” on the only other door left, leaving it with double the probability of having the car! So, clearly, we should always switch, as this will allow us to win two thirds of the time.

Figure 2: Stay or Switch?
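If the argument still feels slippery, a quick Monte Carlo simulation is a convincing check; the sketch below is my own illustration, not part of the original talk, and the door-opening rule simply encodes the fact that Monty always reveals goats.

```python
import random

def play(switch, n_doors=3, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(n_doors)
        choice = random.randrange(n_doors)
        # Monty opens every unchosen door except one, always revealing goats,
        # so exactly one unchosen door remains closed
        remaining = car if choice != car else next(d for d in range(n_doors) if d != choice)
        final = remaining if switch else choice
        wins += (final == car)
    return wins / trials

print(f"stay:   {play(switch=False):.3f}")   # ~ 1/3
print(f"switch: {play(switch=True):.3f}")    # ~ 2/3
```

Setting n_doors=100 in the same function reproduces the hundred-door version discussed next, where switching wins roughly 99% of the time.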

Larger Scale

To make this a bit clearer, let’s imagine we, instead, have one hundred doors from which to choose:

Figure 3: 100 doors, which do you choose?

Suppose you still make the same choice of door 1. Now, there’s a probability of 99/100 that the car lies behind one of the other doors.

Following this, Monty shows you 98 doors with goats behind them. So, what would you do in this scenario? Immediately, this makes more intuitive sense. Since Monty knows which door holds the car and has shown you 98 doors without it, you should, of course, switch to the one door he avoided opening. Following the same process, door 1, which you initially chose, has a 1/100 chance of having the car while the one other door that’s left unopened, say door 97, has a 99/100 chance!

Figure 4: Goats everywhere!!!

Conclusion

The conclusion from this post is simply, ALWAYS SWITCH.

Switching is clearly the better strategy. As demonstrated above, if you played the game a large number of times, the switching strategy would win approximately 67% of the time whereas sticking with your initial choice would win only 33% of the time. So, the former strategy doubles your chances of winning a nice new car!

Hopefully, this post has cleared up this debate-inducing problem. Feel free to leave a comment or reach out to me through the contact form here if you’d like to discuss it.

Thanks for reading!

Efficient Experimental Design (26 February 2021)

In this blog post, I’d like to talk about the importance of experimental design in simulation studies. Simulation studies are widely used in modern scientific research as they attempt to model real-life situations. Models can be very complex and can include a huge range of factors and sources of uncertainty. Design of Experiments (DOE) may seem like an obvious step when constructing a simulation experiment, but the extra effort involved in constructing an efficient design can be off-putting and thus, sometimes, the benefits of DOE are not utilised. DOE, in essence, is about building your simulation experiment in the most suitable way possible for the situation.

Motivation

To demonstrate the importance of DOE, we will look at the developments in supercomputer capability over the last decade or so. In June 2008, a supercomputer called the “Roadrunner” was unveiled. With a cost of $133 million, this machine was capable of a petaflop (a quadrillion operations per second). Five years later, China took the position of world leader with the “Tianhe 2” supercomputer, which had a capability of over 33 petaflops. Today, the “Fugaku” supercomputer built in Japan holds the title with a capability of over 440 petaflops. Even with these massive capabilities, efficient DOE is still extremely important. Suppose we want to run a simulation with 100 factors, each with two levels (low and high) of interest. Looking at all possible combinations of the 100 factors and assuming that each of the 2^{100} runs consisted of a single machine operation, a single replication would take over 40 million years on the “Roadrunner”, over 1.1 million years on the “Tianhe 2” and, even with the massive capability of the current world leader, over 90 millennia.
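The arithmetic behind those figures is easy to check; the snippet below uses nominal speeds of 1, 33 and 440 petaflops and assumes one floating-point operation per run, as in the text.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
runs = 2 ** 100                                   # one operation per design point, as assumed above

for name, petaflops in [("Roadrunner", 1), ("Tianhe 2", 33), ("Fugaku", 440)]:
    ops_per_second = petaflops * 1e15
    years = runs / ops_per_second / SECONDS_PER_YEAR
    print(f"{name}: {years:,.0f} years")
# Roadrunner ~ 4.0e7 years, Tianhe 2 ~ 1.2e6 years, Fugaku ~ 9.1e4 years
```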

Clearly, no one has 90,000 years to wait for the results of a simulation experiment! With some simple screening designs, the computation time can move from millennia on a supercomputer to days or even hours on an “everyday” computer.

Example

Factorial designs are a simple starting point when looking at DOE. They explore all possible combinations of the factor levels. The simplest design (mentioned above) is the 2^k factorial design which requires only two levels for each factor. In DOE, the idea is to construct a design matrix which represents all of the combinations of factors being assessed in the experiment. In the design matrix, every column corresponds to a factor and the entries within a column correspond to levels of the factor. Each row of the matrix represents a combination of factor levels called a design point.

For example, let’s look at the 2^3 case, where we are assessing 3 factors, each with 2 levels: “low” (denoted by -1) and “high” (denoted by +1). The first column of the design matrix alternates between -1 and +1, the second column alternates between -1 and +1 in groups of 2 and the third in groups of 4 (with higher numbers of factors, this would continue in the same manner by powers of 2). This equates to sampling at the corners of a hypercube, as below. The columns in the matrix correspond to X_1, X_2 and X_3 respectively. The factorial design can also look at interactions between these factors; an interaction is added to the design matrix by simply multiplying the corresponding columns of the individual factors in the interaction.

2^3 factorial design
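A design matrix following exactly this construction can be built in a few lines; the helper below is an illustrative sketch rather than any standard library routine.

```python
import numpy as np

def full_factorial(k):
    """2^k design matrix: column j alternates between -1 and +1 in runs of 2**j."""
    n = 2 ** k
    return np.column_stack([
        np.tile(np.repeat([-1, 1], 2 ** j), n // 2 ** (j + 1)) for j in range(k)
    ])

X = full_factorial(3)            # 8 design points (rows), 3 factors (columns)
x1x2 = X[:, 0] * X[:, 1]         # column for the X_1:X_2 interaction
print(np.column_stack([X, x1x2]))
```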

As demonstrated above, this design can be computationally expensive when the number of factors is large. Finer grids which incorporate more levels for each factor also have the same problem.

One variation of this design that can be useful is the fractional factorial design. This approach samples at a chosen fraction of the corner points of the hypercube. Below is a graphical depiction of a 2^{3-1} fractional factorial design. In this case, three factors are examined again, each at two levels, but the number of design points (rows in the matrix) is only 4 instead of 8. Two points lie on each of the left and right faces of the cube, with each face having one instance of X_2 and X_3 at each level, allowing us to isolate the effect of X_1. Averaging results for the top and bottom faces isolates the effect of X_2, and the front and back faces allow us to look at X_3.

2^{3-1} fractional factorial design
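One standard way to construct such a half-fraction is to “generate” the third column as the product of the first two, so that X_3 is aliased with the X_1 X_2 interaction; whether this gives exactly the fraction pictured depends on which half is chosen, so the sketch below is illustrative.

```python
import numpy as np

base = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]])   # full 2^2 design in X_1 and X_2
x3 = base[:, 0] * base[:, 1]                             # generate X_3 as the product X_1 * X_2
half_fraction = np.column_stack([base, x3])
print(half_fraction)                                     # 4 of the 8 corners of the cube
```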

The gains in efficiency are massive! Going back to the supercomputer example, running the 2^{100} full factorial design would take over 40 million years on the “Roadrunner”. However, running a 2^{100-85} fractional factorial design with only 32768 design points on the same machine would take less than a second!

Conclusion

This has only scratched the surface of experimental design but some of the benefits of efficient DOE are evident. There are many more examples of potential designs and much more insight to be gained when an experiment is designed in the most suitable way for the data and research questions we want to answer. If you’re interested in reading more about DOE, the paper below details many more examples of designs and their strengths/weaknesses. Thanks for reading!

– S. M. Sanchez & H. Wan

The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant (2 February 2021)

This short paper caught my eye recently when scouring the internet for something interesting to (attempt to) explain clearly in my first blog post. When I initially read the title, I was a bit shocked as thoughts rushed through my mind such as “All of those modules where I learned about p-values and statistical significance never mentioned this fairly crucial fact!”. After a few breaths, I began to read it and, of course, realised the paper is not discounting this widely used method of determining the validity of a variable in a model; it is simply highlighting a common error often made when using this method for comparisons.

Introduction

This common statistical error comes about when comparisons are summarised by declarations of statistical significance and results are sharply distinguished between “significant” and “not significant”. The reason this matters is that changes in statistical significance are not themselves statistically significant: the significance of a quantity can change substantially as a result of a small (non-significant) change in some statistical quantity such as a mean or regression coefficient.

Quick Example

As a simple example, say we have run two independent studies in different areas to determine the number of days/nights people had spent inside in the last month compared to the same month in 2019, i.e. looking at the effect of lockdown/Covid-19 on the number of days/nights a person spends inside. Say we obtained effect estimates of 27 in study 1 and 12 in study 2, with respective standard errors of 12.5 and 12. The first study would be statistically significant while the second would not. A naive, but tempting, conclusion would be to declare that there is a large difference between the two studies. Unfortunately, this difference is certainly not statistically significant, with an estimated difference of 15 and standard error of \sqrt{12.5^2 + 12^2} = 17.3.
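The arithmetic is quick to verify; the snippet below just reproduces the numbers in this example using a normal approximation for the z-statistics.

```python
from math import sqrt
from scipy.stats import norm

est1, se1 = 27.0, 12.5
est2, se2 = 12.0, 12.0

for label, est, se in [("study 1", est1, se1), ("study 2", est2, se2)]:
    z = est / se
    print(f"{label}: z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.3f}")

diff = est1 - est2
se_diff = sqrt(se1 ** 2 + se2 ** 2)                     # ~ 17.3
print(f"difference: z = {diff / se_diff:.2f}, p = {2 * norm.sf(diff / se_diff):.3f}")
```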

In the paper, they also explain how it can be problematic to compare estimates with different levels of information. Say there was another independent study conducted with a far larger sample size, and the effect estimate obtained was 2.7 with a standard error of 1.2. This study would attain a very similar significance level to study 1, yet the two effect estimates differ by an order of magnitude (a difference of 24.3, with standard error \sqrt{12.5^2 + 1.2^2} \approx 12.6). If we focussed just on significance, we might say studies 1 and 3 replicate each other, but looking at the effect estimates, clearly this is not true.

This is dangerous as “significance” often aids decision making and conclusions could be made based on the first study while disregarding the second, when actually the two don’t differ significantly from one another. As the paper explains, one way of interpreting this lack of statistical significance is that further information might change the conclusion/decision.

Conclusion

In essence, the paper urges caution when interpreting significance. Comparing statistical significance levels is not a good idea; one should look at the significance of the difference and not the difference in significance.

I hope you found this post interesting. If you’d like to read the full paper, see the link below and feel free to leave a comment (even if just to say you never want to hear the word “significance” again!).

– A. Gelman & H. Stern

Hello world! (9 November 2020)

Welcome to my blog! This is my first post, or rather a placeholder for my first post.

In the blog, I’ll include posts about academic interests, research I’m conducting and papers of interest as well as non-academic interests and experiences throughout my time at STOR-i and beyond.

I hope you enjoy!
