Clinical Trials – Libby Daniells (STOR-i Student)

Coronavirus: A summary of some of the findings so far
Wed, 15 Apr 2020

As of the time of writing (14/4/2020), a total of 11,329 people in the UK and 120,863 worldwide are known to have died from the novel Covid-19 virus. This is an unprecedented global pandemic that has reached almost all corners of the globe. The virus, which emerged in Wuhan, China in late 2019, attacks the lungs, causing pneumonia-like symptoms. Many epidemiologists are attempting to model the outbreak and the impact of various intervention plans. Every day on Twitter I see new papers and statistics on the pandemic emerge; the aim of this blog post is to summarize some of the papers I have read and analyse some of the statistics that have been released.

All statistical models rely on assumptions, and because of this they will never be 100% accurate. They make assumptions about how many people will become infected, how many cases will require hospitalization, and whether this will exceed NHS intensive care capacity. Weiss (2020) suggests the most basic model to use is the SIR model – in which the population is split into three subgroups: susceptible people (those vulnerable to catching the disease), infected people, and removed people (those who have recovered and gained immunity, or who have died) – but with an added category for carriers. This extra category is required because a person can carry the disease for up to 14 days without showing any physical symptoms, meaning people can pass the disease on to the vulnerable without knowing they were even sick. This is why social distancing measures are deemed so vital in preventing the spread of the virus. In reality, much more complex models are required to describe the viral outbreak; however, regardless of complexity, no model will ever fully describe what happens in real life. Models are used simply as an "informed prediction" based on the data and the population; they are used to make decisions about which interventions to put in place and when they can be lifted.
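The SIR-with-carriers idea above can be sketched as a small compartmental simulation. This is a minimal illustration using a simple Euler scheme; all parameter values (transmission rate, 14-day carrier period, removal rate) are illustrative choices of mine, not estimates from Weiss (2020) or fitted to real data.

```python
# Minimal SIR-type model with an extra carrier compartment:
# S -> C (carrier, pre-symptomatic) -> I (infected) -> R (removed).
# All parameters are illustrative, not fitted to Covid-19 data.

def simulate(beta=0.3, sigma=1/14, gamma=0.1, days=200, dt=0.1):
    """beta: transmission rate; sigma: rate at which carriers develop
    symptoms (~14-day carrier period); gamma: removal rate."""
    s, c, i, r = 1.0 - 1e-4, 1e-4, 0.0, 0.0  # fractions of the population
    history = []
    for _ in range(int(days / dt)):
        inf_flow = beta * s * (c + i) * dt   # S -> C: carriers also transmit
        symp_flow = sigma * c * dt           # C -> I
        rem_flow = gamma * i * dt            # I -> R
        s -= inf_flow
        c += inf_flow - symp_flow
        i += symp_flow - rem_flow
        r += rem_flow
        history.append((s, c, i, r))
    return history

hist = simulate()
peak_infected = max(i for _, _, i, _ in hist)
```

Because carriers transmit before showing symptoms, reducing `beta` (i.e. social distancing) is the only lever in this toy model that slows the spread before cases become visible.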

As this is a new virus, modelling its spread is challenging: assumptions need to be made for key model parameters, but only a small amount of data is available (Enserink & Kupferschmidt (2020)). One such parameter is the number of new infections caused by one infected person when no intervention measures are in place, along with the time frame in which these infections occur. It takes time and a large quantity of data for these parameters to be estimated accurately.

Coronavirus and the UK: The Numbers

Modelling by epidemiologists at Imperial College London was used to determine the intervention approach the UK government implemented. It was believed that if a hard lock-down was implemented, as in China, Italy and Spain, then the infection rate would spike once the intervention was lifted (Enserink & Kupferschmidt (2020)). Because of these beliefs, less severe social distancing restrictions were implemented at first, to flatten the peak of infections and ensure the demand for intensive care beds did not exceed hospital capacity. However, taking new data into account, a revised model was created and it was decided that stricter lock-down measures were required to save the NHS from being overwhelmed. These new measures were announced on 23rd March 2020. Due to the nature of the disease, it could take up to a month to determine whether they are sufficient.

Some of the data used to model the virus outbreak was obtained from the "BBC Pandemic project" (Klepac et al. (2020)), which collected data from over 36,000 UK citizens in order to create age-specific population contact matrices. These matrices were used to assess how reducing social contact would reduce the spread of the virus. The details of this paper are listed at the bottom of this page and I highly recommend reading it. The study worked through an app downloaded by participants. The app recorded their approximate location hourly for a day, at the end of which users recorded all their social contacts, providing information on each. This study does come with the flaw of self-reporting and the misinformation that comes alongside that; however, it does allow for a massive sample size that more accurately portrays the UK population.

Below are three graphs that depict: 1. The cumulative number of cases, 2. The cumulative number of deaths and 3. The daily number of deaths, all of which relate to the UK only. According to Roser et al. (2020), under current death rates it would take 7 days for the total number of confirmed deaths to double. This is one of the worst growth rates in the world, behind only the US and Belgium (both of which have a 6-day doubling time). The first two graphs suggest exponential growth; however, we expect this to level off and reach some sort of peak in the coming weeks.
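A doubling time like those quoted follows directly from assuming short-term exponential growth. A small sketch (the counts below are made up for illustration, not the article's data):

```python
import math

def doubling_time(count_then, count_now, days_between):
    """Days for a cumulative total to double, assuming exponential growth."""
    growth = math.log(count_now / count_then) / days_between  # per-day rate
    return math.log(2) / growth

# Illustrative figures: a total growing from 6,000 to 11,000 over 6 days
# doubles roughly every 6.9 days.
t = doubling_time(6000, 11000, 6)
```

A sanity check: a count that exactly doubles over 5 days gives a doubling time of 5 days.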

Figure: Cumulative number of cases of Covid-19 in the UK.
Figure: Cumulative number of deaths from Covid-19 in the UK.
Figure: Total number of deaths from Covid-19 in the UK per day.

This data has to be taken with a pinch of salt, as it may not all be up to date: for example, there is a lag between testing and results being obtained. In addition, according to Richardson and Spiegelhalter (2020), just over 317,000 people have been tested in the UK to date, compared with the 1.3 million tests carried out in Germany. This may mean that a greater number of people have had the virus without it being confirmed. There are also the aforementioned carriers, who will not yet know they were infected as they do not present symptoms. So is the growth driven by an increase in cases or an increase in testing? The answer is likely both.

It is also thought that the number of deaths is much higher than is being reported. This is because the released figures only include those who have died in hospital having tested positive for Coronavirus, and there is often a delay of a few days or more before a death is recorded as being caused by Covid-19.

Singapore Case-Control Study

In this section of the blog, I'm going to summarize one of many studies currently being carried out around the globe into the Coronavirus pandemic. The study I will focus on, conducted by Sun et al. (2020), investigated risk factors for the virus using a case-control study in Singapore between 26th January and 16th February 2020. 54 cases of Coronavirus were compared with 734 controls. The data collected included demographics, co-morbidity factors, exposure risk, symptoms and vitals (including blood pressure, pulse and temperature). Predictors of the virus were split into four categories:

  • Exposure Risk
  • Demographic Variables
  • Clinical Findings
  • Clinical Test Results (some patients had full clinical test results, others only radiological tests)

From this they created four prediction models, whose variables were selected using stepwise AIC to create logistic regression models:

  • Model 1: all covariates from all 4 categories,
  • Model 2: demographic variables, clinical findings and all clinical test results,
  • Model 3: demographic variables, clinical findings and clinical test results excluding radiology,
  • Model 4: only demographic variables and clinical findings.
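The stepwise-AIC variable selection used to build these models can be sketched as a greedy forward search. The routine below is a generic sketch; the variable names and the toy AIC surface are hypothetical, chosen purely to show the mechanics (a real analysis would compute AIC from a fitted logistic regression).

```python
def forward_stepwise(candidates, aic):
    """Greedy forward selection: repeatedly add the variable that lowers
    AIC the most; stop when no addition improves it."""
    selected = []
    current = aic(selected)
    improved = True
    while improved:
        improved = False
        best_var, best_aic = None, current
        for v in candidates:
            if v in selected:
                continue
            a = aic(selected + [v])
            if a < best_aic:
                best_var, best_aic = v, a
        if best_var is not None:
            selected.append(best_var)
            current = best_aic
            improved = True
    return selected, current

# Hypothetical AIC surface: 'exposure' and 'fever' genuinely improve fit,
# 'age' does not cover its 2-point-per-parameter penalty.
true_gain = {"exposure": 30.0, "fever": 12.0, "age": 0.0}

def toy_aic(feats):
    # AIC = -2*loglik + 2k: each variable costs 2 and may buy some fit
    return 100.0 - sum(true_gain[f] for f in feats) + 2 * len(feats)

sel, best = forward_stepwise(["exposure", "fever", "age"], toy_aic)
```

The search stops once the 2-point AIC penalty outweighs any remaining gain in fit, which is how stepwise AIC guards against overfitting.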

From this study, positive cases of Covid-19 were found to be older, on average, than the controls (p-value less than 0.0001), but they were no more likely to have any of the co-morbidity factors. (This is an unusual finding, as the UK government listed conditions such as diabetes, asthma and heart disease as making a person more vulnerable to the disease; I would have expected this to show up in the co-morbidity results.) However, the exposure factor was deemed significant, with 59.3% of cases having had contact with someone with the virus or having recently traveled to Wuhan, compared with only 17.2% of controls. Cases were also more likely to have a fever (p-value of 0.003) and signs of pneumonia in radiology results (present in 42.6% of cases compared with 11.1% of controls).

From Model 1 it was found that exposure risk was the most significant predictor of a positive Covid-19 result. In the other three models, which exclude exposure, a high temperature was the most relevant clinical finding for predicting a positive result, except in Model 2, where gastrointestinal symptoms were deemed marginally more significant.

It was concluded that Model 1, which takes into account all risk factors, performs exceptionally well in predicting a positive Coronavirus status, and even in the absence of exposure status, Models 2 and 3 performed sufficiently. The evidence did, however, show a reduction in performance for Model 4, where basic clinical tests such as bloods were not used. For more information on this study I recommend reading the paper by Sun et al. (2020) listed in the references below.

Concluding Remarks

Although modelling is very useful for analyzing the spread of Coronavirus and for decision making around intervention practices, there is a lot these models cannot show us, such as the degree to which the public comply with social distancing measures, the introduction of a vaccine, and the trade-off between protecting the economy and reducing the death rate. As all models contain some degree of uncertainty, they must be scrutinised for pitfalls, and decisions should not be based solely on their findings.

References & Further Reading

  • Klepac, P., Kucharski, A., et al. (2020). Contacts in context: large-scale setting-specific social mixing matrices from the BBC Pandemic project. medRxiv
  • Enserink, M., Kupferschmidt, K. (2020). With COVID-19, modelling takes on life and death importance. Science (New York, N.Y.) 367(6485)
  • Richardson, S., Spiegelhalter, D. (2020). Coronavirus statistics: what can we trust and what should we ignore? The Observer
  • Roser, M., Ritchie, H., Ortiz-Ospina, E. (2020). Coronavirus Disease (COVID-19) – Statistics and Research, Our World In Data
  • Sun, Y., Koh, V., et al. (2020). Epidemiological and Clinical Predictors of COVID-19, Clinical Infectious Diseases: an official publication of the Infectious Diseases Society of America
  • Weiss, S. (2020). Why modelling can’t tell us when the UK’s lockdown will end, Wired
What is a Meta-Analysis? The benefits and challenges
Mon, 09 Mar 2020

My last blog post focused on how to analyse a single clinical trial with multiple treatment arms. But what if we want to consider results from multiple trials studying a similar treatment effect?

During my research for my MSci dissertation (see the last blog post to find out the basics of my research) I came across the concept of meta-analysis. The overall motivation for conducting meta-analyses is to draw more reliable and precise conclusions on the effect of a treatment. In this blog post I will outline for you both the benefits and costs of conducting a meta-analysis.

Meta-analysis is a statistical method used to merge the findings of single, independent studies that investigate the same or a similar clinical problem [Shorten (2013)]. Each of these studies should have been carried out using similar procedures, methods and conditions. Summary data from the individual trials are collected and a pooled estimate of the treatment effect is calculated (the raw patient-level data are not usually pooled!) to determine efficacy.
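One common way to compute such a pooled estimate is the fixed-effect (inverse-variance) approach: each study's effect is weighted by the inverse of its variance, so more precise studies count for more. A sketch with hypothetical study effects and variances:

```python
def pooled_estimate(effects, variances):
    """Fixed-effect (inverse-variance) pooled treatment effect.
    Returns the pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    est = sum(w * e for w, e in zip(weights, effects)) / total
    return est, 1.0 / total

# Three hypothetical studies: treatment effects (e.g. log odds ratios)
# with their within-study variances.
est, var = pooled_estimate([0.5, 0.3, 0.4], [0.04, 0.09, 0.02])
```

Note that the pooled variance `1/total` is smaller than any single study's variance, which is the precision gain the paragraph above describes.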

Effectiveness vs. Efficacy: ‘Efficacy can be defined as the performance of an intervention under ideal and controlled circumstance, whereas effectiveness refers to its performance under real-world conditions.’ [Singal, A. et al. (2014)].

If conducted correctly, efficacy conclusions from a meta-analysis should be more powerful due to the larger sample size created by considering several studies. Often this sample size is far greater than what could feasibly be achieved in a single clinical trial, which is constrained by funds and resources, including the availability of patients. This increased sample size also improves the precision of our estimate, in terms of how closely the trial results relate to effectiveness in the whole population [Moore (2012)].

Although meta-analysis can be a useful tool to increase sample size and hence statistical power, it does have significant associated methodological issues. The first of these is publication bias. This may be introduced because trials which show significant results in favor of a new treatment are more likely to be published than those which are inconclusive or favor the standard treatment. Another form of publication bias arises when researchers exclude studies that are not published in English [Walker et al. (2008)]. This exclusion of studies may lead to an over/under-estimate of the true treatment effect.

The issue of publication bias in a meta-analysis exploring the effects of breastfeeding on children's IQ scores was discussed by Ritchie (2017). A funnel plot of the original data set showed a tendency for larger studies to show a smaller treatment effect, indicating publication bias. The original study found that breastfed children had IQ scores that were, on average, 3.44 points higher than those of non-breastfed children. However, after adjusting for publication bias, a much lower estimate of 1.38 points was given. Although this is still a significant result, the example highlights the overestimation that can result from publication bias.

Funnel Plots: A funnel plot is a method used to assess the role of publication bias. It is a plot of sample size versus treatment effect. As the sample size increases, the effect size is likely to converge to the true effect size [Lee (2012)]. There will naturally be a scatter of points around this true effect size; however, under publication bias the meta-analysis may lack small studies with small effect sizes, leaving the funnel plot skewed.
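The inflation this causes can be illustrated with a small simulation: small studies get "published" only when significant, which hollows out one side of the funnel and inflates the average published effect. Everything here is illustrative and of my own construction; the 1.38 figure is borrowed from the Ritchie example above purely as a placeholder true effect, and the publication rule and standard-error model are made up.

```python
import random

random.seed(1)
TRUE_EFFECT = 1.38  # placeholder "true" effect (IQ points)

def mean_published_effect(n_studies=2000, publish_all=True):
    """Simulate studies of varying size; optionally censor small,
    non-significant ones, as publication bias does."""
    published = []
    for _ in range(n_studies):
        n = random.randint(20, 500)           # study sample size
        se = 8.0 / n ** 0.5                   # standard error shrinks with n
        est = random.gauss(TRUE_EFFECT, se)   # observed effect
        significant = est > 1.96 * se
        # under publication bias, only significant results or large studies
        # tend to reach print
        if publish_all or significant or n > 300:
            published.append(est)
    return sum(published) / len(published)

unbiased = mean_published_effect(publish_all=True)
biased = mean_published_effect(publish_all=False)
```

The biased mean sits well above the unbiased one, mirroring the 3.44-vs-1.38 gap discussed above, because the censored small studies are exactly the ones whose (noisy) estimates happened to come out low.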

Another key issue with meta-analysis is heterogeneity, defined as the variation in results between the studies included in the analysis. Investigators must consider the sources of this inconsistency: they may include differences in trial design, study population or inclusion/exclusion criteria between trials, as well as differences due to chance. High levels of heterogeneity compromise the justification for a meta-analysis, as grouping together studies whose results vary greatly gives a questionable pooled treatment effect, reducing our confidence in making recommendations about treatments. There are methods to handle heterogeneity, one of which is to fit a random effects model.
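Heterogeneity is commonly quantified with Cochran's Q and the \(I^2\) statistic (standard companions to the pooled estimate, though not discussed above): \(I^2\) estimates the share of the between-study variation that is not attributable to chance. A sketch with hypothetical study effects:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I^2 statistic for a fixed-effect pooled estimate."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2

# Four hypothetical studies with widely spread effects and equal variances:
# the wide spread produces a large I^2.
q, i2 = heterogeneity([0.1, 0.9, 0.5, -0.2], [0.01, 0.01, 0.01, 0.01])
```

An \(I^2\) this close to 1 is exactly the situation described above, where a pooled effect becomes hard to justify and a random effects model may be preferred.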

A meta-analysis considering strategies to prevent falls and fractures in hospitals and care homes [Oliver et al. (2007)] obtained strong evidence to suggest heterogeneity between studies. This variation was highlighted in forest plots which showed a very wide spread of results. As the investigators believed all trials were similar enough in design and all aimed to trial the same treatment, they felt it was justified to calculate a pooled treatment effect. However, this high variability brings into question the reliability of the estimate, complicating decisions regarding recommendation of treatments.

In summary, meta-analysis is a very useful tool for combining the results of studies in order to boost the precision of our conclusions. We do, however, need to proceed with caution: assess heterogeneity and check for publication bias.

References & Further Reading

  • Lee, W., Hotopf, M., (2012). Critical appraisal: Reviewing scientific evidence and reading academic papers. Core Psychiatry, 131-142.
  • Moore, Z. (2012). Meta-analysis in context. Journal of Clinical Nursing, 21(19):2798-2807.
  • Oliver, D., Connelly, J. B., Victor, C. R., Shaw, F. E., Whitehead, A., Genc, Y., Vanoli, A., Martin, F. C., and Gonsey, M. A. (2007). Strategies to prevent falls and fractures in hospitals and care homes and effect of cognitive impairment: systematic review and meta-analyses. BMJ, 334(7584).
  • Ritchie, S. J. (2017). Publication bias in a recent meta-analysis on breastfeeding and IQ. Acta Paediatrica, 106(2):345-345.
  • Singal, A., Higgins, P., Waljee, A. (2014). A primer on effectiveness and efficacy trials. Clinical and Translational Gastroenterology, 5(1).
  • Shorten, A. (2013). What is meta-analysis? Evidence Based Nursing, 16(1).
  • Walker, E., Hernandez, A., and Kattan, M. (2008). Meta-analysis: Its strengths and limitations. Cleveland Clinic Journal Of Medicine, 75(6):431-439.
Simultaneous Inference in Clinical Trials
Thu, 20 Feb 2020

As part of my undergraduate study I completed a dissertation titled “Simultaneous Inference in Clinical Trials”, supervised by Dr Fang Wan. In particular, I focused on the construction of simultaneous confidence intervals for treatment effects. In this post I want to share a brief overview of some of my findings.

Within clinical trials it is common to test multiple hypotheses simultaneously. This is referred to as multiple comparisons. An example of where this arises is in biomarker trials. Biomarkers are measurable indicators of biological conditions that can be used to identify target populations within trials.

Example – Biomarker trials have the following set up: as patients are enrolled onto the trial their biomarker status is identified; these results are used to stratify individuals into two groups: biomarker positive or biomarker negative. From here, patients are allocated treatments using randomization. In this case we consider a two-treatment scenario in which we are testing a new treatment (T) against a control (C). This control is usually either an existing treatment which we wish to show is inferior or equivalent to the new treatment, or it is a placebo, which is a drug identical to T but lacks the active agent. The stratification process is illustrated in the figure below. This design creates four subgroups within which we simultaneously estimate the size of the treatment effect (i.e. the result of a specific treatment within a subgroup).
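The stratify-then-randomize step above can be sketched as follows. This uses simple (unblocked) 50/50 randomisation and made-up patient counts purely for illustration; real trials typically use block randomisation to keep arm sizes balanced.

```python
import random

random.seed(0)

def stratified_allocation(biomarker_statuses):
    """Randomize each patient to treatment (T) or control (C) within their
    biomarker stratum, producing the four subgroups of the design:
    (+,T), (+,C), (-,T), (-,C)."""
    groups = {("+", "T"): [], ("+", "C"): [], ("-", "T"): [], ("-", "C"): []}
    for patient_id, status in enumerate(biomarker_statuses):
        arm = random.choice(["T", "C"])  # simple randomisation, for illustration
        groups[(status, arm)].append(patient_id)
    return groups

# Hypothetical trial: 40 biomarker-positive and 60 biomarker-negative patients.
statuses = ["+"] * 40 + ["-"] * 60
groups = stratified_allocation(statuses)
```

Each of the four lists is then a subgroup within which the treatment effect is estimated simultaneously.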

We will focus on trials with binary endpoints, i.e. each treatment outcome is deemed either a success or a failure. Therefore, our parameter of interest is the proportion of times the treatment was successful.

Before we delve into the issues associated with multiple comparisons we need definitions of the error rates that will play a significant role in simultaneous testing.

Type I Error: Occurs when we reject a true null hypothesis. The type I error rate is the probability of making a Type I Error. This is equivalent to the significance level which we often set to be \(\alpha\)=0.05.

Type II Error: Occurs when we accept a false null hypothesis. The type II error rate is the probability of making a Type II error.

Confidence Interval: A confidence interval (CI) is constructed to express the uncertainty surrounding a parameter estimate. An interval has confidence level \(1-\alpha\) in the sense that, if a large number of such intervals were constructed, \(100(1-\alpha)\)% of them would contain the true parameter value.

Family-Wise Error Rate (FWER): The probability of rejecting at least one true null hypothesis.

When conducting simultaneous hypothesis tests, we are often interested in ensuring that the significance level for the whole group of hypotheses is \(\alpha\), rather than just at the individual test level. To do so we need to control the FWER to be approximately \(\alpha\).

If we are testing \(k\) independent hypotheses simultaneously, the overall FWER is given by $$\text{FWER}\;=\;1-\mathbb{P}(\text{No true null hypothesis rejected})\;=\;1-(1-\alpha)^k.$$ Therefore, as the number of hypotheses increases, the error rate rapidly tends to 1, meaning the probability of making at least one type I error rises to an unacceptable level, as demonstrated in the following figure.
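Evaluating this formula for a few values of \(k\) shows how quickly the FWER escalates at \(\alpha = 0.05\):

```python
def fwer(alpha, k):
    """Family-wise error rate for k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k

# FWER at alpha = 0.05 for increasing numbers of simultaneous tests:
rates = {k: fwer(0.05, k) for k in (1, 5, 10, 20, 100)}
```

Even at 10 tests the chance of at least one false rejection is around 40%, and by 100 tests it is all but certain, which is what motivates the corrections below.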

I will now discuss two ways in which to correct for multiple testing.

Bonferroni Correction

When using Bonferroni correction, we set the significance level for an individual hypothesis test to \(\alpha/k\) where \(k\) is the total number of hypotheses being tested simultaneously. We reject the \(i^{th}\) hypothesis when the p-value is less than \(\alpha/k\).

With Bonferroni applied, the family-wise error rate is kept equal to, or below \(\alpha\). As the FWER can be below the desired significance level, we call it conservative. It becomes increasingly conservative as the number of hypotheses increases (as seen in the figure below).

If the independence assumption for the FWER is not met, Bonferroni could become extremely conservative. A family-wise error rate below 0.05 will lead to a greater number of null hypotheses being accepted despite being false. Thus, we’ve improved the type I error rate at the expense of the type II error rate.
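A quick numerical check of the Bonferroni properties described above, under the independence assumption:

```python
def bonferroni_fwer(alpha, k):
    """FWER when each of k independent tests is run at level alpha/k."""
    return 1 - (1 - alpha / k) ** k

# The gap between alpha and the achieved FWER is the "conservatism";
# it stays positive and widens as k grows.
gaps = [0.05 - bonferroni_fwer(0.05, k) for k in (2, 10, 50)]
```

So Bonferroni always lands just under \(\alpha\) for independent tests, and the shortfall grows with \(k\), which is the growing conservatism the figure illustrates.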

Sidak Correction

Like Bonferroni, Sidak correction involves adjusting the significance level of an individual test in order to control the FWER. This time we set the significance level for an individual hypothesis test to \(1-(1-\alpha)^{1/k}\).

When testing several independent hypotheses simultaneously the use of Sidak correction ensures that the FWER is exactly \(\alpha\). This is a more powerful method than Bonferroni as it is always less conservative. However, this is reliant on the fact that the hypotheses are independent. If any dependencies do arise Sidak can be overly liberal and produce a FWER greater than \(\alpha\), giving an unacceptably high probability of making a type I error.

Because of this reliance on the independence assumption, we tend to prefer the Bonferroni method. To add to this, despite Sidak having greater statistical power, the improvement over Bonferroni is minimal.
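Comparing the two per-test levels numerically confirms both claims: Sidak's threshold is only slightly larger than Bonferroni's, and under independence it recovers the FWER exactly.

```python
def sidak_level(alpha, k):
    """Per-test significance level under Sidak correction."""
    return 1 - (1 - alpha) ** (1 / k)

alpha, k = 0.05, 10
sidak = sidak_level(alpha, k)        # slightly above alpha/k
bonf = alpha / k                     # Bonferroni per-test level
fwer_sidak = 1 - (1 - sidak) ** k    # exactly alpha, up to rounding
```

For ten tests the two thresholds differ only in the fourth decimal place, which is why the power gain from Sidak is described above as minimal.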

Within my dissertation I used these methods and others to construct several different types of simultaneous confidence intervals, determining which was optimal under varying conditions. Although I won’t go into the ins and outs here, I will share that Bonferroni correction with a Wilson score interval gave optimal coverage and length properties, regardless of the sample size, the number of hypotheses being tested and the value of the parameter estimate.
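As a sketch of the kind of interval this refers to, here is the standard Wilson score interval for a binomial proportion, with an optional Bonferroni adjustment (level \(\alpha/k\)) for \(k\) simultaneous intervals. The counts are made up; this is not the dissertation's code.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, alpha=0.05, k=1):
    """Wilson score CI for a proportion; k > 1 applies a Bonferroni-adjusted
    level alpha/k, as when constructing k simultaneous intervals."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))  # two-sided normal quantile
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Hypothetical subgroup: 32 successes out of 54 patients.
lo, hi = wilson_interval(32, 54)            # single 95% interval
lo_b, hi_b = wilson_interval(32, 54, k=4)   # one of four simultaneous intervals
```

The Bonferroni-adjusted interval is wider, as expected: the price of simultaneous coverage across the four subgroups is a larger critical value for each individual interval.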
