Model Misuse: Applying hypothesis testing to simulated data from in-silico cardiac models

In the previous blog post there was an interesting link to a report by Yaron Hollander on the use and abuse of models in transport forecasting.  His description of the abuse of models applies to many sectors, including the life sciences, where it is arguably a bigger issue. Why? Other sectors have to some degree acknowledged the concept of structural uncertainty, which remains a taboo subject for most, but not all, modelers within the life sciences.  By acknowledging that there is a problem, modelers in those sectors have at least moved beyond denial, the first phase of an addiction problem.  This does not seem to be the case for most life sciences modelers.  A typical example can be seen in a recent article by Zhou et al. from the University of Oxford, which uses modelling and simulation to explore the mechanisms behind alternans (alternating long and short action potentials) in cardiac myocytes…

In the article, Zhou et al. claim that the mathematical/computational model used in the study is the “gold standard” and has been “extensively validated”.  Declaring a model to be the gold standard and extensively validated gives licence to use it for many questions it has never been tested against, which invites all sorts of misuse. Indeed, the type of model used by Zhou et al. can never truly be tested because of its scale: tens of variables and hundreds of parameters.  Such large models, which also include extensive non-linear functions, are almost impossible to test because they are so flexible. Thus, using such models for the type of analysis Zhou et al. conducted can be considered a classic example of model misuse. The authors applied the following analysis (more detail can be found in the article):

  • A population of models is created by generating 10000 parameter sets by perturbing a subset of model parameters
  • Of these a subset (~2500) are deemed acceptable according to some criteria
  • Each of these parameter sets is then used to explore the alternans phenomenon
  • Parameter sets are then grouped by how they answer the following questions:
    1. Does a parameter set produce alternans or not?
    2. Are the alternans eye or fork type?
  • Finally statistical tests are performed to ascertain whether the distributions of parameters are different between the groups created.
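
The steps listed above can be sketched in a few lines of Python. This is only an illustration of the workflow, not the method or model used by Zhou et al.: `toy_model`, the parameter names `g_ca` and `g_k`, the acceptance range and the alternans rule are all made up for the example, and a simple difference in group means stands in for the statistical tests.

```python
import random
from statistics import mean


def toy_model(params):
    """Hypothetical stand-in for a cardiac cell model (NOT a real model).

    Returns a made-up biomarker (ms) and a made-up alternans flag.
    """
    g_ca, g_k = params["g_ca"], params["g_k"]
    biomarker = 300.0 * g_ca / g_k      # invented "action potential duration"
    has_alternans = g_ca * g_k > 1.1    # invented classification rule
    return biomarker, has_alternans


def build_population(n_sets, seed=0):
    """Step 1: perturb a subset of parameters with random scaling factors."""
    rng = random.Random(seed)
    return [
        {"g_ca": rng.uniform(0.5, 1.5), "g_k": rng.uniform(0.5, 1.5)}
        for _ in range(n_sets)
    ]


def calibrate(population, low=250.0, high=350.0):
    """Step 2: keep parameter sets whose biomarker lies in an assumed range."""
    return [p for p in population if low <= toy_model(p)[0] <= high]


def group_by_alternans(accepted):
    """Steps 3 and 4: simulate each accepted set and group by the outcome."""
    groups = {True: [], False: []}
    for p in accepted:
        groups[toy_model(p)[1]].append(p)
    return groups


population = build_population(10000)
accepted = calibrate(population)
groups = group_by_alternans(accepted)

# Step 5 (simplified): compare parameter distributions between the groups.
for name in ("g_ca", "g_k"):
    diff = mean(p[name] for p in groups[True]) - mean(p[name] for p in groups[False])
    print(name, len(groups[True]), len(groups[False]), round(diff, 3))
```

In this toy setting any difference between the groups is a direct consequence of the rules written into `toy_model`, and the number of parameter sets is entirely the modeler's choice.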

In essence they are applying statistical tests to simulated data, a practice that has been argued against within ecology.  White et al. provide two reasons why statistical significance tests should not be used to interpret simulation results; the first is the most relevant here, as the second is to some degree a more philosophical debate.  The first reason revolves around statistical power: the probability that a test correctly rejects the null hypothesis when the alternative is true. One of the key components of a power calculation is sample size! In brief, by using such a large sample size (the number of simulations), Zhou et al. have powered their study to detect even the smallest of differences between groups.  Indeed, because Zhou et al. control the sample size, they also control the results of a statistical test; they could be accused of p-hacking. This brings the results reported by Zhou et al. into question.

In addition to the misuse of statistical hypothesis testing, there is another, more worrying issue with the first step of the approach: using large flexible models to explain variability in an experimentally measured dependent variable by varying a subset of model parameters.  An obvious question is which parameters should be varied in such large models, given how flexible they are. Furthermore, the bigger issue of structural uncertainty is still not addressed by such an approach.  What consequences could these issues have? They will lead to a high number of false positives and waste experimental resources on hypotheses that were not worthwhile.
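
The point about power and sample size can be made concrete with a rough calculation. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and unit variance, which is a textbook simplification and not the test used by Zhou et al.; the function name `two_sample_power` and the chosen effect size are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the difference in group means in standard-deviation
    units; both groups are assumed to have n_per_group observations.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - shift)
    lower_tail = _STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Same tiny effect (0.1 standard deviations), increasing number of simulations.
for n in (10, 100, 1000, 10000):
    print(n, round(two_sample_power(0.1, n), 3))
```

With the effect held fixed, the power climbs towards 1 as the number of simulations per group grows, so whether a test comes out "significant" depends on how many simulations the modeler chooses to run.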

Finally, on an even more cautionary note, if the type of approach described by Zhou et al. were used to develop biomarkers and to guide clinical trials, it would be likely to increase clinical trial failure rates rather than reduce them. In an era where people within the healthcare industry are looking at systems approaches, real care must be taken over which approaches are actually used within the industry. As modelers our duty is to remain questioning and skeptical.

Complexity versus simplicity in relating tumour size change to survival in oncology drug development

Every pharmaceutical company would like to be able to predict the survival benefit of a new cancer treatment compared to an existing treatment as early as possible in drug development.  This quest for the “holy grail” has led to tremendous efforts from the statistical modelling community to develop models that link variables related to change in disease state to survival times.  The main variable of interest, for obvious reasons, is tumour size measured via imaging.  The marker derived from imaging is called the Sum of Longest Diameters (SLD).  It is the sum of the longest diameters of the target lesions, which end up being large lesions that are easy to measure.  The marker is therefore not representative of the entire tumour burden within the patient.  However, the change in SLD within the first X weeks of treatment is used within drug development to decide whether or not to continue the development of a drug.  Changes in SLD have therefore been the focus of most, if not all, statistical models of survival.

There are two articles that currently analyse the relationship between changes in SLD and survival in quite different ways across multiple studies in non-small cell lung cancer.

The first approach, by the Pharmacometrics (pharmaco-statistical modelling) group within the FDA, was quite complex.  They used a combination of semi-parametric and parametric survival modelling techniques together with a mixed modelling approach to develop their final survival model.  The final model was able to fit all past data, but the authors had to generate different parameter sets for different sub-groups.  The technical ability required to generate these results is clearly beyond most scientists and requires specialist knowledge.  This approach can quite easily be described as complex.

The second approach, by the Biostatistics group within the FDA, involved simple plotting!  In the article the authors categorise on-treatment changes in SLD using a popular clinical approach to create drug response groups.  They then assess whether the ratio of drug response between the arms of clinical studies is related to the final outcome of the study.  The outcomes of interest were time to disease progression and survival.  The approach actually worked quite well!  A strong relationship was found between the ratio of drug response and the differences in disease progression.  Although not as strong, the relationship to survival was also quite promising.  This approach simply involved plotting data and can clearly be done by most, if not all, scientists once the definitions of the variables are understood.
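
To show how little machinery the response-ratio idea needs, here is a minimal sketch. It assumes a RECIST-style rule in which a patient counts as a responder when SLD falls by at least 30% from baseline; the article's exact categorisation is not reproduced here, and the function names and patient values are invented purely for illustration.

```python
def is_responder(baseline_sld_mm, on_treatment_sld_mm, threshold=-0.30):
    """Assumed rule: responder if SLD shrinks by at least 30% from baseline."""
    change = (on_treatment_sld_mm - baseline_sld_mm) / baseline_sld_mm
    return change <= threshold


def response_rate(patients):
    """Fraction of responders among (baseline, on-treatment) SLD pairs."""
    return sum(is_responder(b, t) for b, t in patients) / len(patients)


def response_ratio(treatment_arm, control_arm):
    """Ratio of response rates between arms (undefined if the control
    arm has no responders)."""
    return response_rate(treatment_arm) / response_rate(control_arm)


# Made-up (baseline SLD, on-treatment SLD) pairs in mm, NOT real trial data.
treatment = [(100, 60), (80, 50), (120, 110), (90, 95)]
control = [(100, 65), (80, 78), (120, 125), (90, 88)]

print(response_ratio(treatment, control))
```

One such ratio per study, plotted against that study's difference in progression or survival between arms, is all the plotting approach requires.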

The two approaches are clearly very different when it comes to complexity: one involved plotting while the other required degree-level statistical knowledge!  It could also be argued that the results of the plotting approach are far more useful for drug development than the statistical modelling approach as it clearly answers the question of interest.  These studies show how sometimes thinking about how to answer the question through visualisation and also taking simple approaches can be incredibly powerful.