
Mathematical models for ion-channel cardiac toxicity: David v Goliath

This blog entry will focus on a rather long-standing debate around model complexity and predictivity for a specific prediction problem in drug development. A typical drug project starts off with thousands of drugs for a given idea. All but one of these drugs is eventually weeded out through a series of experiments exploring safety and efficacy, with the final drug being the one that enters human trials. The question we will explore concerns a toxicity experiment performed rather early in this weeding-out process, which determines the drug’s effect on the cardiac system.

Many years of research have identified certain proteins, ion-channels, which, if affected by a drug, could lead to dire consequences for a patient. In simple terms, ion-channels allow ions, such as calcium, to flow in and out of a cell. Drugs can bind to ion-channels and disrupt their ability to function, thus affecting the flow of ions. The early experiment we are interested in essentially measures how many ions flow through an ion-channel as the amount of drug is increased. The cells used in these experiments are engineered to over-express the human protein of interest and so do not reflect a real cardiac cell. The experiment is largely automated, which allows one to screen thousands of drugs a year against certain ion-channels. The output of the system is an IC50 value: the concentration of drug needed to reduce the flow of ions through the ion-channel by 50 per cent.
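To make the IC50 concept concrete, here is a minimal sketch assuming a standard Hill-type concentration-response curve; the exact curve fitted by the screening platform is not specified here, so the Hill form and the numbers below are illustrative assumptions.

```python
import numpy as np

def fractional_block(conc, ic50, hill=1.0):
    """Fraction of ion-channel current blocked at a given drug concentration,
    assuming a standard Hill-type concentration-response curve."""
    return conc**hill / (conc**hill + ic50**hill)

# Hypothetical drug with an IC50 of 1 uM against the channel of interest
concentrations_uM = np.array([0.1, 0.3, 1.0, 3.0, 10.0])
print(fractional_block(concentrations_uM, ic50=1.0))
# At 1 uM (the IC50) the block is 0.5, i.e. ion flow is reduced by 50 per cent.
```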

A series of IC50 values is generated for each drug against a number of ion-channels (we are actually only interested in three). The reason a large screening effort is made is that we cannot test all the compounds in an animal model, nor can we take all of them into man! So we can’t measure the effect of these drugs in real cardiac systems, but we can measure their effect on certain ion-channel proteins which are expressed in the cardiac system we are interested in. The question is then: given a set of IC50 values against certain ion-channels for a particular drug, can we predict how this drug will affect a cardiac system?

As mentioned earlier, drug development involves performing a series of experiments over time. The screening experiment described above is one of many used to look at cardiac toxicity. The next experiment in the pipeline, which could occur one or maybe two years later, explores the remaining drugs in an intact cardiac system. This could be a single cardiac cell taken from a dog, a portion of the ventricular wall, or something else entirely. After that, even fewer compounds are taken into dog studies before entering human trials. So the prediction question could relate to any one of these cardiac systems. The inputs to the prediction problem are the set of IC50 values, three in the cases we will look at, whereas the output, which we want to predict, is a set of measurements from the cardiac systems described.

At this point some of you may be thinking: if we want to predict what will happen in a real cardiac system, why don’t we build a virtual version of the system using a large mathematical model (a biophysical model)? Indeed, people have done this. However, others (especially those who follow this blog) might also be thinking: I have three inputs and one output, and given that we screen lots of these compounds, surely the dynamics are not that difficult to figure out, so I can do something simpler and more cost-effective! Again, people have done this too. If I were to refer to the virtual system (consisting of >100 parameters) as Goliath and the simple model (3 parameters) as David, some of you can guess what the outcome is! A paper documenting the story in detail can be found here and the model used is available online here.  I will just give a brief summary of the findings in the main paper.
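Purely as an illustration of what a three-parameter model mapping three IC50 values to a cardiac readout might look like, here is a sketch; the functional form, the weights and the logistic link are assumptions made for this post, not the published model.

```python
import numpy as np

def fractional_block(conc, ic50, hill=1.0):
    # Hill-type block, as in the screening assay sketch above
    return conc**hill / (conc**hill + ic50**hill)

def simple_risk_score(conc, ic50s, weights):
    """Toy three-parameter predictor: one weight per ion-channel acting on the
    fractional block at a given drug exposure, passed through a logistic link."""
    blocks = np.array([fractional_block(conc, ic50) for ic50 in ic50s])
    return 1.0 / (1.0 + np.exp(-np.dot(weights, blocks)))

# Hypothetical drug: IC50s (uM) against three channels, evaluated at 1 uM exposure
print(simple_risk_score(conc=1.0, ic50s=[0.5, 5.0, 50.0], weights=[2.0, -1.0, 0.5]))
```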

The data sets explored in the article involve making predictions in both animal and human studies. Something noticeable about the biophysical models used in the original articles was that a different structural model was needed for each study. This was not the case for the simple model, which uses the same structure across all data sets. Given that the simple model gave the same, if not better, performance than the biophysical models, it raises a question: why does the biophysical modelling community need a different model for different studies? In fact, for two human studies, A and B, different human models were used. Why? The reason may be that the degree of confidence in those models by the people using them is actually quite low, hence the lack of consistency in the models used across studies. Another issue not discussed in the biophysical modelling literature is the reproducibility of the data used to build such models. Given the growing skepticism about the reproducibility of preclinical data in science, this adds further doubt to the suitability of such models for industrial use.

Given the points raised here (as well as a previous blog entry highlighting the misuse of these models by their own developers), can the biophysical modelling community deliver a modelling solution that is both trustworthy and reliable? This is an important question, as regulatory agencies are now considering using these biophysical models, together with some quite exciting new experimental techniques, to change the way the cardiac liability of a new drug is assessed.

Model Misuse: Applying hypothesis testing to simulated data from in-silico cardiac models

In the previous blog there was an interesting link to a report by Yaron Hollander on the use and abuse of models in transport forecasting. His description of the abuse of models applies to many sectors, including the life sciences, where it is arguably a bigger issue. Why? Other sectors have to some degree acknowledged the concept of structural uncertainty, which remains a taboo subject for most, though not all, modelers within the life sciences sector. By acknowledging there is a problem, modelers in other sectors have at least moved beyond the denial phase, the first phase of an addiction problem. This does not seem to be the case for most life sciences modelers. A typical example can be seen in a recent article by Zhou et al. from the University of Oxford, which uses modelling and simulation to explore the mechanisms behind a biological phenomenon in cardiac myocytes termed alternans (alternating long and short action potentials)…

In the article, Zhou et al. claim that the mathematical/computational model being used within the study is the “gold standard” and has been “extensively validated”. Declaring a model to be the gold standard and extensively validated gives a licence for it to be used to answer questions it has never been tested against, which leads to all sorts of misuse. Indeed, the type of model used by Zhou et al. can never truly be tested because of its scale: tens of variables and hundreds of parameters. Such large models, which also include extensive non-linear functions, are almost impossible to test because they are so flexible. Thus, using such models for the type of analysis Zhou et al. conducted can be considered a classic example of model misuse. The authors applied the following analysis (more detail can be found in the article; a schematic sketch of the workflow is given after the list):

  • A population of models is created by generating 10,000 parameter sets by perturbing a subset of model parameters
  • Of these, a subset (~2,500) is deemed acceptable according to some criteria
  • Each of these parameter sets is then used to explore the alternans phenomenon
  • Parameter sets are then grouped by how they answer the following questions:
    1. Does a parameter set produce alternans or not?
    2. Are the alternans eye-type or fork-type?
  • Finally, statistical tests are performed to ascertain whether the distributions of parameters differ between the groups created.
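For concreteness, here is a schematic sketch of that workflow with a deliberately trivial stand-in for the cardiac model; the acceptance criterion, the “alternans” rule and the parameter ranges are invented for illustration, and the final step is exactly the hypothesis test on simulated data questioned below.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def toy_model(params):
    """Stand-in for the cardiac model: returns a fake 'alternans magnitude'.
    The real study runs a full action-potential model for each parameter set."""
    g_a, g_b, _ = params
    return g_a - 0.5 * g_b + 0.1 * rng.normal()

# 1. Population of models: perturb a subset of parameters
population = rng.uniform(0.5, 1.5, size=(10_000, 3))

# 2. Keep only parameter sets satisfying some acceptance criterion
accepted = population[population[:, 2] < 1.2]

# 3-4. Classify each accepted set by whether it 'produces alternans'
alternans_mag = np.array([toy_model(p) for p in accepted])
produces = alternans_mag > 0.6

# 5. Hypothesis test on the parameter distributions of the two groups --
#    the step the blog argues is inappropriate for simulated data
t, p = stats.ttest_ind(accepted[produces, 0], accepted[~produces, 0])
print(f"n = {len(accepted)}, p-value = {p:.2e}")
```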

In essence they are applying statistical tests to simulated data, something that has been discussed within ecology as a practice to avoid. White et al. provide two reasons why statistical significance tests should not be used to interpret simulation results; the first is the most relevant here, as the second is more of a philosophical debate. The first reason revolves around power: the probability that a test correctly rejects the null hypothesis when the alternative is true. One of the key components of a power calculation is sample size! In brief, by using such a large sample size (the number of simulations), Zhou et al. have powered their study to detect the smallest of differences between groups. Indeed, Zhou et al. can control the sample size and thus control the results of a statistical test; they could be accused of p-hacking. This brings into question the results seen by Zhou et al.

In addition to the misuse of statistical hypothesis testing, there is another, more worrying issue with the first step of the approach: using large, flexible models to explain variability in a dependent variable, measured experimentally, by varying a subset of model parameters. An obvious question is which parameters should be varied in such large models, given how flexible they are? Furthermore, the bigger issue of structural uncertainty still hasn’t been addressed with such an approach. What consequences could these issues have? They will lead to a high number of false positives and waste experimental resources chasing hypotheses that were never worthwhile.
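The sample-size point can be illustrated in a few lines: hold a practically negligible difference between two simulated groups fixed and watch the p-value fall as the number of simulations grows. The effect size and group sizes below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two simulated groups whose true means differ by a practically irrelevant amount
effect = 0.05
for n in (100, 1_000, 10_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(effect, 1.0, n)
    _, p = stats.ttest_ind(a, b)
    print(f"n = {n:>6}: p = {p:.4f}")
# The underlying difference never changes, yet with enough simulated 'samples'
# it becomes statistically significant -- whoever chooses n chooses the verdict.
```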

Finally, on an even more cautionary note, if the type of approach described by Zhou et al. were used to develop biomarkers and to guide clinical trials, it would be likely to increase clinical trial failure rates rather than improve them. In an era where people within the healthcare industry are looking at systems approaches, real care must be taken over which approaches are actually used within the industry. As modelers, our duty is to remain questioning and skeptical.

Complexity versus simplicity in relating tumour size change to survival in oncology drug development

Every pharmaceutical company would like to be able to predict the survival benefit of a new cancer treatment compared to an existing treatment as early as possible in drug development. This quest for the “holy grail” has led to tremendous efforts from the statistical modelling community to develop models that link variables related to change in disease state to survival times. The main variable of interest, for obvious reasons, is tumour size measured via imaging. The marker derived from imaging is called the Sum of Longest Diameters (SLD): the sum of the longest diameters of a set of target lesions, which tend to be the large lesions that are easy to measure. The marker is therefore not representative of the entire tumour burden within the patient. Nonetheless, the change in SLD within the first X weeks of treatment is used within drug development to decide whether or not to continue developing a drug. Changes in SLD have therefore been the focus of most, if not all, statistical models of survival.
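As a small worked example of the marker (the lesion measurements and the early timepoint are hypothetical, and the usual percentage-change-from-baseline definition is assumed):

```python
# Sum of Longest Diameters (SLD): sum of the longest diameter of each target lesion (mm)
baseline_lesions_mm = [32.0, 18.0, 25.0]   # hypothetical target lesions at baseline
week_x_lesions_mm = [24.0, 15.0, 21.0]     # the same lesions at the early on-treatment scan

sld_baseline = sum(baseline_lesions_mm)
sld_week_x = sum(week_x_lesions_mm)

pct_change = 100.0 * (sld_week_x - sld_baseline) / sld_baseline
print(f"SLD {sld_baseline} mm -> {sld_week_x} mm ({pct_change:.1f}% change)")
```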

Two articles currently analyse the relationship between changes in SLD and survival, in quite different ways, across multiple studies in non-small cell lung cancer.

The first approach (http://www.ncbi.nlm.nih.gov/pubmed/19440187), by the Pharmacometrics (pharmaco-statistical modelling) group within the FDA, is quite involved. They used a combination of semi-parametric and parametric survival modelling techniques, together with a mixed modelling approach, to develop their final survival model. The final model was able to fit all past data, but the authors had to generate different parameter sets for different sub-groups. The technical ability required to generate these results is clearly beyond most scientists and requires specialist knowledge. This approach can quite easily be described as complex.
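To give a flavour of just the semi-parametric ingredient of such an approach (and only that; this is not a reconstruction of the FDA group's model), here is a sketch that fits a Cox proportional-hazards model relating an early SLD change to synthetic survival times, using the third-party lifelines package.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter  # semi-parametric (Cox proportional hazards) model

rng = np.random.default_rng(2)
n = 300

# Synthetic data: early percentage change in SLD and a survival time that depends on it
sld_change = rng.normal(-10.0, 20.0, n)                          # % change at the early scan
time = rng.exponential(scale=12.0 * np.exp(-0.02 * sld_change))  # months
event = (rng.uniform(size=n) < 0.8).astype(int)                  # ~20% censored

df = pd.DataFrame({"sld_change": sld_change, "time": time, "event": event})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()   # hazard ratio per unit change in early SLD
```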

The second approach (http://www.ncbi.nlm.nih.gov/pubmed/25667291), by the Biostatistics group within the FDA, involved a simple plotting approach! In the article the authors categorise on-treatment changes in SLD using a popular clinical approach to create drug-response groups. They then assess whether the ratio of drug response between the arms of clinical studies relates to the final outcome of the study. The outcomes of interest were time to disease progression and survival. The approach actually worked quite well! A strong relationship was found between the ratio of drug response and the differences in disease progression. Although not as strong, the relationship to survival was also quite promising. This approach simply involves plotting data and can clearly be done by most, if not all, scientists once the definitions of the variables are understood.
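A minimal sketch of that kind of plot is below; the responder cut-off, the column names and the study-level numbers are all placeholders invented for illustration, not data from the article.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder study-level data: per-arm responder rates (fraction of patients with
# >=30% SLD shrinkage) and the observed hazard ratio for progression between arms
studies = pd.DataFrame({
    "study": ["A", "B", "C", "D"],
    "resp_rate_drug": [0.35, 0.22, 0.41, 0.18],
    "resp_rate_control": [0.15, 0.18, 0.20, 0.17],
    "progression_hr": [0.55, 0.85, 0.50, 0.95],
})

studies["response_ratio"] = studies["resp_rate_drug"] / studies["resp_rate_control"]

plt.scatter(studies["response_ratio"], studies["progression_hr"])
for _, row in studies.iterrows():
    plt.annotate(row["study"], (row["response_ratio"], row["progression_hr"]))
plt.xlabel("Ratio of response rates (drug arm / control arm)")
plt.ylabel("Hazard ratio for disease progression")
plt.show()
```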

The two approaches are clearly very different when it comes to complexity: one involved plotting, while the other required degree-level statistical knowledge! It could also be argued that the results of the plotting approach are far more useful for drug development than those of the statistical modelling approach, as they clearly answer the question of interest. These studies show that thinking about how to answer the question through visualisation, and taking simple approaches, can be incredibly powerful.