
Oxford Quantitative Systems Pharmacology (QSP): Is there a case for model reduction?

The title of this blog-post refers to a meeting I attended recently, sponsored by the UK QSP network. Some of you may not be familiar with the term QSP: put simply, it describes the application of mathematical and computational tools to pharmacology. As the title of the blog-post suggests, the meeting was on the subject of model reduction.

The meeting was split into 4 sessions entitled:

  1. How to test if your model is too big?
  2. Model reduction techniques
  3. What are the benefits of building a non-identifiable model?
  4. Techniques for over-parameterised models

I was asked to present a talk in the first session, see here for the slide-set. The talk was based on a topic that has been mentioned on this blog a few times before, ion-channel cardiac toxicity prediction. The latest talk goes through how 3 models were assessed on their ability to discriminate between non-cardio-toxic and cardio-toxic drugs across 3 data-sets which are currently available. (A report providing more details is currently being put together and will be released shortly.) The 3 models used were a linear combination of block (simple model – blog-post here) and 2 large scale biophysical models of a cardiac cell, one termed the “gold-standard” (endorsed by FDA and other regulatory agencies/CiPA – [1]) and the other forming a key component of the “cardiac safety simulator” [2].

The results showed that the simple model does just as well and in certain data-sets out-performs the leading biophysical models of the heart, see slide 25.  Towards the end of the talk I discussed what drivers exist for producing such large models and should we invest further in them given the current evidence, see slides 24, 28 and 33. How does this example fit into all the sessions of the meeting?

In answer to the first session title: How to test if your model is too big?  The answer is straightforward: if the simpler/smaller model outperforms the larger model then the larger model is too big.  Regarding the second session on model reduction techniques – in this case there is no need for these. You could argue, from the results discussed here, that instead of pursuing model reduction techniques we may want to consider building smaller/simpler models to begin with. Onto the 3rd session, on the benefits of building a non-identifiable model, it’s clear that there was no benefit in this situation in developing a non-identifiable model. Finally regarding techniques for over-parameterised models – the lesson learned from the cardiac toxicity field is just don’t build these sorts of models for this question.

Some people at the meeting argued that the type of model depends on the question. This is true, but does the scale of the model depend on the question?

If we now go back to the title of the meeting, is there a case for model reduction? Within the field of ion-channel cardiac toxicity the response would be: Why bother reducing a large model when you can build a smaller and simpler model which shows equal/better performance?

Of course, as Green and Armstrong [3] point out within their (skeptical) framework (see slide 28), one reason for model reduction is that for researchers it is the best of both worlds: they can build a highly complex model, suitable for publication in a highly-cited journal, and then spend more time extracting a simpler version to support client plans. Maybe those journals should look more closely at their selection criteria.


[1] Colatsky et al. The Comprehensive in Vitro Proarrhythmia Assay (CiPA) initiative — Update on progress.  Journal of Pharmacological and Toxicological Methods. (2016) Volume 81 pages 15-20.

[2] Glinka A. and Polak S. QTc modification after risperidone administration – insight into the mechanism of action with use of the modeling and simulation at the population level approach.  Toxicol. Mech. Methods (2015) 25 (4) pages 279-286.

[3] Green K. C. and Armstrong J. S. Simple versus complex forecasting: The evidence. Journal of Business Research. (2015) 68 (8) pages 1678-1685.

Hierarchical oncology combination therapy: which level leads to success?

The traditional preclinical combination experiment in Oncology for two drugs A and B is as follows. A cancer cell-line is exposed to increasing concentrations of drug A alone, drug B alone and also various concentrations of the combination for a fixed amount of time. That is, we determine what effect drug A and B have as monotherapies, which subsequently helps us to understand what the combination effect is. There are many articles which describe how mathematical/computational models can be used to analyse such data and possibly predict the combination effect using information on monotherapy agents alone. Those models can be either based on mechanism at the pathway or phenotype level (see CellCycler for an example of the latter) or they could be machine learning approaches. We shall call combinations at this scale cellular as they are mainly focussed on analysing combination effects at that scale. What other scales are there?

We know that human cancers contain more than one type of cell-population so the next scale from the cellular level is the tissue level.  At this level we may have populations of cells with distinct genetic backgrounds either within one tumour or across multiple tumours within one patient. Here we may find for example that drug A kills cell type X and drug B doesn’t, but drug B kills cell-type Y and drug A doesn’t. So the combination can be viewed as a cell population enrichment strategy as it is still effective even though the two drugs do not interact in any way.

Traditional drug combination screens, as described above, are not designed to explore these types of combinations. There is another scale which is probably even less well known, the human population scale …

A typical human clinical combination trial in Oncology can involve combining new drug B with existing treatment A and comparing that to A only. A 3rd arm in this trial looking at drug B alone is unlikely. The reason for this is that if an existing treatment is known to have an effect then it’s unethical not to use it. Unless one knows what effect the new drug B has on its own, it is difficult to assess what the effect of the combination is. Indeed the combination may simply enrich the patient population. That is, if drug A shrinks tumours in patient population X and drug B doesn’t, but drug B shrinks tumours in patient population Y and drug A doesn’t, then if the trial contains both X and Y there is still a combination effect which is greater than drug A alone.
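The enrichment argument can be made concrete with a small worked example. The sketch below assumes two non-overlapping subgroups X and Y with made-up fractions (30% and 20% of the trial population); the function name and the numbers are purely illustrative and do not come from any trial.

```python
def response_rates(frac_x, frac_y):
    """Response rates when drug A works only in subgroup X and drug B only in subgroup Y.

    frac_x, frac_y: hypothetical fractions of the trial population in the
    non-overlapping subgroups X and Y (frac_x + frac_y <= 1).
    """
    if frac_x < 0 or frac_y < 0 or frac_x + frac_y > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    arm_a = frac_x             # A alone: only subgroup X responds
    arm_b = frac_y             # B alone: only subgroup Y responds (arm rarely run)
    arm_ab = frac_x + frac_y   # A + B: X and Y both respond, with no drug interaction
    return arm_a, arm_b, arm_ab


rate_a, rate_b, rate_ab = response_rates(0.30, 0.20)
print(f"A alone: {rate_a:.0%}, B alone: {rate_b:.0%}, A+B: {rate_ab:.0%}")
```

The combination arm beats A alone even though no patient benefits from receiving both drugs, which is why the missing B-only arm makes the result hard to interpret.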

Many people reading this blog are probably aware that when we see positive combination effects in the clinic, they could be due to this type of patient enrichment. At a meeting in Boston in April of this year, a presentation from Adam Palmer suggested that two thirds of marketed combinations in Oncology can be explained in this way, see second half (slide 27 onwards) of this presentation here. This includes current immunotherapy combinations.

We can now see why combinations in Oncology can be viewed as hierarchical. How appreciative the research community is of this is unknown.  Indeed one of the latest challenges from CRUK (Cancer Research UK), see here, suggests that even they may not be fully aware of it. That challenge merely focusses on the well-trodden path of the first level described here. Which level is the best to target? Is it easier to target the tissue and human population level than the cellular one? Only time will tell.

Misapplication of statistical tests to simulated data: Mathematical Oncologists join Cardiac Modellers

In a previous blog post we highlighted the pitfalls of applying null hypothesis testing to simulated data, see here. We showed that modellers applying null hypothesis testing to simulated data can control the p-value because they can control the sample size. Thus it’s not a great idea to analyse simulations using null hypothesis tests; instead modellers should focus on the size of the effect. This problem has been highlighted before by White et al., whose article is well worth a read, see here.
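To see why the sample size controls the p-value, here is a minimal sketch using only the Python standard library. It assumes two simulated groups drawn from normal distributions whose means differ by a small, fixed amount (0.1 standard deviations), and uses a simple z-test with a normal approximation in place of whatever test a given paper applies. The function name and all numbers are illustrative.

```python
import math
import random
import statistics


def simulate_comparison(n, true_diff=0.1, seed=1):
    """Draw n simulated values per group and compare the group means.

    Returns the observed difference in means and an approximate two-sided
    p-value from a z-test (normal approximation, reasonable for large n).
    """
    rng = random.Random(seed)
    group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    group_b = [rng.gauss(true_diff, 1.0) for _ in range(n)]
    diff = statistics.fmean(group_b) - statistics.fmean(group_a)
    std_err = math.sqrt(statistics.variance(group_a) / n
                        + statistics.variance(group_b) / n)
    z = diff / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return diff, p_value


for n in (20, 200, 2000, 20000):
    diff, p = simulate_comparison(n)
    print(f"n={n:>6}  difference in means={diff:+.3f}  p={p:.2g}")
```

The underlying effect is the same in every run, yet the modeller can make the p-value as small as desired simply by simulating more virtual samples. The difference in means settles near the assumed 0.1 and is the quantity worth reporting.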

Why are we blogging about this subject again? Since that last post, co-authors of the original article we discussed there have repeated the same misdemeanour (Liberos et al., 2016), and a group of mathematical oncologists based at Moffitt Cancer Center has joined them (Kim et al., 2016).

The article by Kim et al., preprint available here, describes a combined experimental and modelling approach that “predicts” new dosing schedules for combination therapies that can delay onset of resistance and thus increase patient survival.  They also show how their approach can be used to identify key stratification factors that can determine which patients are likely to do better than others. All of the results in the paper are based on applying statistical tests to simulated data.

The first part of the approach taken by Kim et al. involves calibrating a mathematical model to certain in-vitro experiments. These experiments basically measure the number of cells over a fixed observation time under 4 different conditions: control (no drug), AKT inhibitor, Chemotherapy and Combination (AKT/Chemotherapy). This was done for two different cell lines. The authors found a range of parameter values when trying to fit their model to the data. From this range they took forward a particular set, with no real justification given for that choice, to test the model’s ability to predict different in-vitro dosing schedules. Unsurprisingly the model predictions came true.

After “validating” their model against a set of in-vitro experiments the authors proceed to use the model to analyse retrospective clinical data: a study involving 24 patients. The authors acknowledge that the in-vitro system is clearly not the same as a human system. So to account for this difference they perform an optimisation to generate a humanised model. The optimisation is based on a genetic algorithm which searched the parameter space to find parameter sets that replicate the clinical results observed. Again, similar to the in-vitro situation, they found that there were multiple parameter sets that were able to replicate the observed clinical results. In fact they found a total of 3391 parameter sets.

Having now generated a distribution of parameters that describe patients within the clinical study they are interested in, the authors next set about generating stratification factors. For each parameter set the virtual patient exhibits one of four possible response categories. Therefore for each category a distribution of parameter values exists for the entire population. To assess the difference in the distribution of parameter values across the categories they perform a Student’s t-test to ascertain whether the differences are statistically significant. Since they can control the sample size, the authors can control the standard error and p-value; this is exactly the issue raised by White et al. An alternative approach would be to state the size of the effect, that is, the difference in means of the distributions. If the claim is that a given parameter can discriminate between two types of responses then a ROC AUC (Receiver Operating Characteristic Area Under Curve) value could be reported. Indeed a ROC AUC value would allow readers to ascertain the strength of a given parameter in discriminating between two response types.
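As a rough illustration of the alternative, the ROC AUC for a single parameter can be computed directly as the probability that a randomly chosen member of one response category has a higher parameter value than a randomly chosen member of the other (the Mann-Whitney form). The parameter values below are invented for illustration and are not taken from Kim et al.

```python
def roc_auc(negatives, positives):
    """Probability that a random positive scores higher than a random negative.

    Ties count as half. This is the Mann-Whitney form of the ROC AUC.
    """
    if not negatives or not positives:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical parameter values for virtual patients in two response categories.
non_responders = [0.8, 1.0, 1.1, 1.3, 1.4]
responders = [1.2, 1.5, 1.6, 1.9, 2.1]
print(round(roc_auc(non_responders, responders), 2))  # prints 0.92
```

Unlike a p-value, this number does not drift towards "significance" as more virtual patients are simulated: a value near 0.5 means the parameter does not discriminate, and a value near 1 means it discriminates well.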

The application of hypothesis testing to simulated data continues throughout the rest of the paper, culminating in applying a log-rank test to simulated survival data, where again they control the sample size. Furthermore, the authors choose an arbitrary cancer cell number which dictates when a patient dies. Therefore they have two ways of controlling the p-value.  In this final act the authors again abuse the use of null hypothesis testing to show that the schedule found by their modelling approach is better than that used in the actual clinical study.  Since the major results in the paper have all involved this type of manipulation, we believe they should be treated with extreme caution until better verified.


Liberos, A., Bueno-Orovio, A., Rodrigo, M., Ravens, U., Hernandez-Romero, I., Fernandez-Aviles, F., Guillem, M.S., Rodriguez, B., Climent, A.M., 2016. Balance between sodium and calcium currents underlying chronic atrial fibrillation termination: An in silico intersubject variability study. Heart Rhythm 0. doi:10.1016/j.hrthm.2016.08.028

White, J.W., Rassweiler, A., Samhouri, J.F., Stier, A.C., White, C., 2014. Ecologists should not use statistical significance tests to interpret simulation model results. Oikos 123, 385–388. doi:10.1111/j.1600-0706.2013.01073.x

Kim, E., Rebecca, V.W., Smalley, K.S.M., Anderson, A.R.A., 2016. Phase i trials in melanoma: A framework to translate preclinical findings to the clinic. Eur. J. Cancer 67, 213–222. doi:10.1016/j.ejca.2016.07.024


The changing skyline

Back in the early 2000s, I worked a couple of years as a senior scientist at the Institute for Systems Biology in Seattle. So it was nice to revisit the area for the recent Seventh American Conference on Pharmacometrics (ACoP7).

A lot has changed in Seattle in the last 15 years. The area around South Lake Union, near where I lived, has been turned into a major hub for biotechnology and the life sciences. Amazon is constructing a new campus featuring giant ‘biospheres’ which look like nothing I have ever seen.

Attending the conference, though, was like a blast from the past – because unlike the models used by architects to design their space-age buildings, the models used in pharmacology have barely moved on.

While there were many interesting and informative presentations and posters, most of these involved relatively simple models based on ordinary differential equations, very similar to the ones we were developing at the ISB years ago. The emphasis at the conference was on using models to graphically present relationships, such as the interaction between drugs when used in combination, and compute optimal doses. There was very little about more modern techniques such as machine learning or data analysis.

There was also little interest in producing models that are truly predictive. Many models were said to be predictive, but this just meant that they could reproduce some kind of known behaviour once the parameters were tweaked. A session on model complexity did not discuss the fact, for example, that complex models are often less predictive than simple models (a recurrent theme in this blog, see for example Complexity v Simplicity, the winner is?). Problems such as overfitting were also not discussed. The focus seemed to be on models that are descriptive of a system, rather than on forecasting techniques.

The reason for this appears to come down to institutional effects. For example, models that look familiar are more acceptable. Also, not everyone has the skills or incentives to question claims of predictability or accuracy, and there is a general acceptance that complex models are the way forward. This was shown by a presentation from an FDA regulator, which concentrated on models being seen as gold-standard rather than accurate (see our post on model misuse in cardiac models).

Pharmacometrics is clearly a very conservative area. However this conservatism means only that change is delayed, not that it won’t happen; and when it does happen it will probably be quick. The area of personalized medicine, for example, will only work if models can actually make reliable predictions.

As with Seattle, the skyline may change dramatically in a very short time.

Complexity v Simplicity, the winner is?

I recently published a letter with the above title in the journal Clinical Pharmacology and Therapeutics; unfortunately it’s behind a paywall so I will briefly take you through the key point raised. The letter describes a specific prediction problem around drug induced cardiac toxicity mentioned in a previous blog entry (Mathematical models for ion-channel cardiac toxicity: David v Goliath). In short, what we show in the letter is that a simple model using subtraction and addition (pre-school Mathematics) performs just as well for a given prediction problem as a multi-model approach using three large-scale models consisting of 100s of differential equations combined with a machine learning approach (University level Mathematics and Computation)! The addition/subtraction model gave a ROC AUC of 0.97, which is very similar to the multi-model/machine learning approach, which gave a ROC AUC of 0.96. More detail on the analysis can be found on slides 17 and 18 within this presentation, A simple model for ion-channel related cardiac toxicity, which was given at an NC3Rs meeting.

The result described in the letter and presentation adds weight to the view within that field that simple models perform just as well as complex approaches for a given prediction task.

When is a result significant?

The standard way of answering this question is to ask whether the effect could reasonably have happened by chance (the null hypothesis). If not, then the result is announced to be ‘significant’. The usual threshold for significance is that there is only a 5 percent chance of the results happening due to purely random effects.

This sounds sensible, and has the advantage of being easy to compute. Which is perhaps why statistical significance has been adopted as the default test in most fields of science. However, there is something a little confusing about the approach; because it asks whether adopting the opposite of the theory – the null hypothesis – would mean that the data is unlikely to be true. But what we want to know is whether a theory is true. And that isn’t the same thing.

As just one example, suppose we have lots of data and after extensive testing of various theories we discover one that passes the 5 percent significance test. Is it really 95 percent likely to be true? Not necessarily – because if we are trying out lots of ideas, then it is likely that we will find one that matches purely by chance.
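The arithmetic behind this is short. As a sketch, assume each theory is tested independently at the 5 percent level and that none of them is actually true (both are simplifying assumptions); the chance that at least one passes purely by luck is then one minus the chance that all of them fail.

```python
def chance_of_false_positive(num_theories, alpha=0.05):
    """Probability that at least one of several independent true-null tests passes."""
    return 1.0 - (1.0 - alpha) ** num_theories


for m in (1, 5, 14, 50):
    print(f"{m:>2} theories tested: "
          f"{chance_of_false_positive(m):.0%} chance of a spurious 'significant' result")
```

Under these assumptions, trying 14 ideas is already enough to make a spurious "significant" finding more likely than not.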

While there are ways of working around this within the framework of standard statistics, the problem usually gets glossed over in the vast majority of textbooks and articles. So for example it is typical to say that a result is ‘significant’ without any discussion of whether it is plausible in a more general sense (see our post on model misuse in cardiac modeling).

The effect is magnified by publication bias – try out multiple theories, find one that works, and publish. Which might explain why, according to a number of studies (see for example here and here), much scientific work proves impossible to replicate – a situation which scientist Robert Matthews calls a ‘scandal of stunning proportions’ (see his book Chancing It: The laws of chance – and what they mean for you).

The way of Bayes

An alternative approach is provided by Bayesian statistics. Instead of starting with the assumption that data is random and making weird significance tests on null hypotheses, it just tries to estimate the probability that a model is right (i.e. the thing we want to know) given the complete context. But it is harder to calculate for two reasons.

One is that, because it treats new data as updating our confidence in a theory, it also requires we have some prior estimate of that confidence, which of course may be hard to quantify – though the problem goes away as more data becomes available. (To see how the prior can affect the results, see the BayesianOpionionator web app.) Another problem is that the approach does not treat the theory as fixed, which means that we may have to evaluate probabilities over whole families of theories, or at least a range of parameter values. However this is less of an issue today since the simulations can be performed automatically using fast computers and specialised software.
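The claim that the prior matters less as data accumulates can be illustrated with the textbook Beta-Binomial update. This is a minimal sketch: the three priors and the 60 percent observed success rate are invented for illustration, and it is not a description of how the web app mentioned above works.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Mean of the Beta posterior after binomial data, starting from Beta(prior_a, prior_b)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


def prior_spread(n_trials, priors):
    """Gap between the most and least optimistic posterior means after n_trials."""
    successes = 6 * n_trials // 10   # hypothetical data: 60% observed success rate
    failures = n_trials - successes
    means = [posterior_mean(a, b, successes, failures) for a, b in priors.values()]
    return max(means) - min(means)


# Three illustrative priors, written as Beta(a, b) parameters.
PRIORS = {"skeptical": (1, 9), "neutral": (1, 1), "optimistic": (9, 1)}

for n in (10, 100, 1000):
    print(f"n={n:>4}: posterior means differ by at most {prior_spread(n, PRIORS):.3f}")
```

With ten observations the skeptic and the optimist still disagree substantially; with a thousand, their estimates are nearly indistinguishable.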

Perhaps the biggest impediment, though, is that when results are passed through the Bayesian filter, they often just don’t seem all that significant. But while that may be bad for publications, and media stories, it is surely good for science.

The exponential growth effect

A common critique of biologists, and scientists in general, concerns their occasionally overenthusiastic tendency to find patterns in nature – especially when the pattern is a straight line. It is certainly notable how, confronted with a cloud of noisy data, scientists often manage to draw a straight line through it and announce that the result is “statistically significant”.

Straight lines have many pleasing properties, both in architecture and in science. If a time series follows a straight line, for example, it is pretty easy to forecast how it should evolve in the near future – just assume that the line continues (note: doesn’t always work).

However this fondness for straightness doesn’t always hold; indeed there are cases where scientists prefer to opt for a more complicated solution. An example is the modelling of tumour growth in cancer biology.

Tumour growth is caused by the proliferation of dividing cells. For example if cells have a cell cycle length td, then the total number of cells will double every td hours, which according to theory should result in exponential growth. In the 1950s (see Collins et al., 1956) it was therefore decided that the growth rate could be measured using the cell doubling time.

In practice, however, it is found that tumours grow more slowly as time goes on, so this exponential curve needed to be modified. One variant is the Gompertz curve, which was originally derived as a model for human lifespans by the British actuary Benjamin Gompertz in 1825, but was adapted for modelling tumour growth in the 1960s (Laird, 1964). This curve gives a tapered growth rate, at the expense of extra parameters, and has remained highly popular as a means of modelling a variety of tumour types.

However, it has often been observed empirically that tumour diameters, as opposed to volumes, appear to grow in a roughly linear fashion. Indeed, this has been known since at least the 1930s. As Mayneord wrote in 1932: “The rather surprising fact emerges that the increase in long diameter of the implanted tumour follows a linear law.” Furthermore, he noted, there was “a simple explanation of the approximate linearity in terms of the structure of the sarcoma. On cutting open the tumour it is often apparent that not the whole of the mass is in a state of active growth, but only a thin capsule (sometimes not more than 1 cm thick) enclosing the necrotic centre of the tumour.”

Because only this outer layer contains dividing cells, the rate of increase for the volume depends on the growth rate (which is inversely proportional to the doubling time) multiplied by the volume of the outer layer. If the thickness of the growing layer is small compared to the total tumour radius, then it is easily seen that the radius grows at a constant rate which is equal to the growth rate multiplied by the thickness of the growing layer. The result is a linear growth in radius. This translates to cubic growth in volume, which of course grows more slowly than an exponential curve at longer times – just as the data suggests.
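The step from a thin growing layer to linear radius growth can be written out in a few lines. The notation here is an assumption introduced for illustration: $k = \ln 2 / t_d$ is the per-cell growth rate implied by a doubling time $t_d$, $\delta$ is the thickness of the growing layer, $r$ is the tumour radius with initial value $r_0$, and the tumour is treated as a sphere.

```latex
\begin{align*}
k &= \frac{\ln 2}{t_d}, \qquad V = \tfrac{4}{3}\pi r^3, \\
\frac{dV}{dt} &\approx k \cdot 4\pi r^2 \delta \quad (\delta \ll r), \\
\frac{dV}{dt} &= 4\pi r^2 \frac{dr}{dt}
  \;\Longrightarrow\; \frac{dr}{dt} \approx k\,\delta, \\
r(t) &\approx r_0 + k\,\delta\, t, \qquad
V(t) \approx \tfrac{4}{3}\pi \left(r_0 + k\,\delta\, t\right)^3 .
\end{align*}
```

The second line approximates the volume of the growing shell by its surface area times its thickness, which is where the thin-layer assumption enters. A shorter doubling time or a thicker growing layer both give a steeper straight line for the radius.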

In other words, rather than use a modified exponential curve to fit volume growth, it may be better to use a linear equation to model diameter. This idea that tumour growth is driven by an outer layer of proliferating cells, surrounding a quiescent or necrotic core, has been featured in a number of mathematical models (see e.g. Checkley et al., 2015, and our own CellCycler model).  The linear growth law can also be used to analyse tumour data, as in the draft paper: “Analysing within and between patient patient tumour heterogenity via imaging: Vemurafenib, Dabrafenib and Trametinib.” The linear growth equation will of course not be a perfect fit for the growth of all tumours (no simple model is), but it is based on a consistent and empirically verified model of tumour growth, and can be easily parameterised and fit to data.
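To show how little is needed to parameterise the linear law, here is a minimal least-squares sketch in plain Python. The measurements are synthetic and chosen only for illustration, and the function is a generic straight-line fit rather than the method used in the draft paper.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = intercept + slope * times."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time points are identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return intercept, slope


# Synthetic illustration only: days since first scan and tumour diameter in mm.
days = [0, 7, 14, 21, 28]
diameter_mm = [10.0, 11.9, 14.2, 15.8, 18.1]
d0, rate = fit_line(days, diameter_mm)
print(f"initial diameter ~ {d0:.1f} mm, growth ~ {rate:.2f} mm/day")
```

The fit has just two parameters, an initial diameter and a growth rate, both of which can be read directly off a plot of the data.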

So why hasn’t this linear growth law caught on more widely? The reason is that what scientists see in data often depends on their mental model of what is going on.

I first encountered this phenomenon in the late 1990s when doing my D.Phil. in the prediction of nonlinear systems, with applications to weather forecasting. The dominant theory at the time said that forecast error was due to sensitivity to initial condition, aka the butterfly effect. As I described in The Future of Everything, researchers insisted that forecast errors showed the exponential growth characteristic of chaos, even though plots showed they clearly grew with slightly negative curvature, which was characteristic of model error.

A similar effect in cancer biology has again changed the way scientists interpret data. Sometimes, a straight line really is the best solution.


Collins, V. P., Loeffler, R. K. & Tivey, H. Observations on growth rates of human tumors. The American journal of roentgenology, radium therapy, and nuclear medicine 76, 988-1000 (1956).

Laird A. K. Dynamics of tumor growth. Br J of Cancer 18 (3): 490–502 (1964).

W. V. Mayneord. On a Law of Growth of Jensen’s Rat Sarcoma. Am J Cancer 16, 841-846 (1932).

Stephen Checkley, Linda MacCallum, James Yates, Paul Jasper, Haobin Luo, John Tolsma, Claus Bendtsen. Bridging the gap between in vitro and in vivo: Dose and schedule predictions for the ATR inhibitor AZD6738. Scientific Reports, 5(3)13545 (2015).

Yorke, E. D., Fuks, Z., Norton, L., Whitmore, W. & Ling, C. C. Modeling the Development of Metastases from Primary and Locally Recurrent Tumors: Comparison with a Clinical Data Base for Prostatic Cancer. Cancer Research 53, 2987-2993 (1993).

Hitesh Mistry, David Orrell, and Raluca Eftimie. Analysing within and between patient patient tumour heterogenity via imaging: Vemurafenib, Dabrafenib and Trametinib. (Working paper)

The CellCycler

Tumour modelling has been an active field of research for some decades, and a number of approaches have been taken, ranging from simple models of an idealised spherical tumour, to highly complex models which attempt to account for everything from cellular chemistry to mechanical stresses. Some models use ordinary differential equations, while others use an agent-based approach to track individual cells.

A disadvantage of the more complex models is that they involve a large number of parameters, which can only be roughly estimated from available data. If the aim is to predict, rather than to describe, then this leads to the problem of overfitting: the model is very flexible and can be tuned to fit available data, but is less useful for predicting for example the effect of a new drug.

Indeed, there is a rarely acknowledged tension in mathematical modelling between realism, in the sense of including lots of apparently relevant features, and predictive accuracy. When it comes to the latter, simple models often out-perform complex models. Yet in most areas there is a strong tendency for researchers to develop increasingly intricate models. The reason appears to have less to do with science, than with institutional effects. As one survey of business models notes (and these points would apply equally to cancer modelling) complex models are preferred in large part because: “(1) researchers are rewarded for publishing in highly ranked journals, which favor complexity; (2) forecasters can use complex methods to provide forecasts that support decision-makers’ plans; and (3) forecasters’ clients may be reassured by incomprehensibility.”

Being immune to all such pressures (this is just a blog post after all!) we decided to develop the CellCycler – a parsimonious “toy” model of a cancer tumour that attempts to capture the basic growth and drug-response dynamics using only a minimal number of parameters and assumptions. The model uses circa 100 ordinary differential equations (ODEs) to simulate cells as they pass through the phases of the cell cycle; however the equations are simple and the model only uses parameters that can be observed or reasonably well approximated. It is available online as a Shiny app.

Screenshot of the Cells page of the CellCycler. The plot shows how a cell population is affected by two different drugs.

The CellCycler model divides the cell cycle into a number of discrete compartments, and is therefore similar in spirit to other models that for example treat each phase G1, S, G2, and mitosis as a separate compartment, with damaged cells being shunted to their own compartment (see for example the model by Checkley et al. here). Each compartment has its own set of ordinary differential equations which govern how its volume changes with time due to growth, apoptosis, or damage from drugs. There are additional compartments for damaged cells, which may be repaired or lost to apoptosis. Drugs are simulated using standard PK models, along with a simple description of phase-dependent drug action on cells. For the tumour growth, we use a linear model, based like the Checkley et al. paper on the assumption of a thin growing layer (see also our post on The exponential growth effect).

The advantages of compartmentalising

Dividing the cell cycle into separate compartments has an interesting and useful side effect, which is that it introduces a degree of uncertainty into the calculation. For example, if a drug causes damage and delays progress in a particular phase, then that drug will tend to synchronize the cell population in that state. However there is an obvious difference between cells that are affected when they are at the start of the phase, and those that are already near the end of the phase. If the compartments are too large, that precise information about the state of cells is lost.

The only way to restore precision would be to use a very large number of compartments. But in reality, individual cells will not all have exactly the same doubling time. We therefore want to have a degree of uncertainty. And this can be controlled by adjusting the number of compartments.

This effect is illustrated by the figure below, which shows how a perturbation at time zero in one compartment tends to blur out over time, for models with 25, 50, and 100 compartments, and a doubling time of 24 hours. In each case a perturbation is made to compartment 1 at the beginning of the cell cycle (the magnitude is scaled to the number of compartments so the total size of the perturbation is the same in terms of total volume). For the case with 50 compartments, the curve after one 24-hour doubling period is closely approximated by a normal distribution with standard deviation of 3.4 hours or about 14 percent. In general, the standard deviation can be shown to be approximately equal to the doubling time divided by the square root of N.

The solid lines show volume in compartment 1 following a perturbation to that compartment alone, after one cell doubling period of 24 hours. The cases shown are with N=25, 50, and 100 compartments. Dashed lines are the corresponding normal distributions.

A unique feature of the CellCycler is that it exploits this property as a way of adjusting the variability of doubling time in the cell population. The model can therefore provide a first-order approximation to the more complex heterogeneity that can be simulated using agent-based models. While we don’t usually have exact data on the spread of doubling times in the growing layer, the default level of 50 compartments gives what appears to be a reasonable degree of spread (about 14 percent). Using 25 compartments gives 20 percent, while using 100 compartments decreases this to 10 percent.
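The quoted percentages follow directly from the square-root rule. The short sketch below simply evaluates that rule for a 24-hour doubling time; the function name is illustrative and this is not code from the CellCycler itself.

```python
import math


def doubling_time_spread(doubling_time_h, n_compartments):
    """Approximate standard deviation of transit time through the cell cycle.

    Uses the rule quoted in the text: sd ~ doubling time / sqrt(N).
    Returns the spread in hours and as a fraction of the doubling time.
    """
    if n_compartments < 1:
        raise ValueError("need at least one compartment")
    sd_hours = doubling_time_h / math.sqrt(n_compartments)
    return sd_hours, sd_hours / doubling_time_h


for n in (25, 50, 100):
    sd_h, rel = doubling_time_spread(24.0, n)
    print(f"N={n:>3}: sd = {sd_h:.1f} h ({rel:.0%} of doubling time)")
```

Because the spread scales with one over the square root of N, halving it requires four times as many compartments, so the number of compartments acts as a coarse dial for heterogeneity at a modest computational cost.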

Using the CellCycler

The starting point for the Shiny web application is the Cells page, which is used to model the dynamics of a growing cell population. The key parameters are the average cell doubling time, and the fraction spent in each phase. The number of model compartments can be adjusted in the Advanced page: note that, along with doubling time spread, the choice also affects both the simulation time (more compartments is slower), and the discretisation of the cell cycle. For example with 50 compartments the proportional phase times will be rounded off to the nearest 1/50=0.02.
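As a small illustration of the discretisation effect, the sketch below rounds a set of phase fractions to the nearest 1/N. The phase fractions are invented for the example, and the rounding rule is an assumption rather than the app's actual implementation; in general, rounded compartment counts need not add up to N exactly.

```python
def discretise_phases(phase_fractions, n_compartments):
    """Assign a whole number of compartments to each phase by rounding
    its fraction of the cell cycle to the nearest 1/n_compartments."""
    return {
        phase: round(fraction * n_compartments)
        for phase, fraction in phase_fractions.items()
    }


if __name__ == "__main__":
    # Hypothetical phase fractions, chosen only for illustration.
    fractions = {"G1": 0.437, "S": 0.328, "G2": 0.181, "M": 0.054}
    n = 50
    counts = discretise_phases(fractions, n)
    for phase, count in counts.items():
        print(f"{phase}: requested {fractions[phase]:.3f}, "
              f"{count} compartments, effective fraction {count / n:.2f}")
    print("total compartments used:", sum(counts.values()))
```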

The next pages, PK1 and PK2, are used to parameterise the PK models and drug effects. The program has a choice of standard PK models, with adjustable parameters such as Dose/Volume.  In addition the phase of action (choices are G1, S, G2, M, or all), and rates for death, damage, and repair can be adjusted. Finally, the Tumor page (shown below) uses the model simulation to generate a plot of tumor radius, given an initial radius and growing layer. Plots can be overlaid with experimental data.

Screenshot of the Tumor page, showing tumor volume (black line) compared to control (grey). Cell death due to apoptosis by either drug (red and blue) and damage (green) are also shown.

We hope the CellCycler can be a useful tool for research or for exploring the dynamics of tumor growth. As mentioned above it is only a “toy” model of a tumor. However, all our models of complex organic systems – be they of a tumor, the economy, or the global climate system – are toys compared to the real things. And of course there is nothing to stop users from extending the model to incorporate additional effects, though whether this will lead to improved predictive accuracy is another question.

Try the CellCycler web app here.


Stephen Checkley, Linda MacCallum, James Yates, Paul Jasper, Haobin Luo, John Tolsma, Claus Bendtsen. “Bridging the gap between in vitro and in vivo: Dose and schedule predictions for the ATR inhibitor AZD6738,” Scientific Reports. 2015;5(3):13545.

Green, Kesten C. & Armstrong, J. Scott, 2015. “Simple versus complex forecasting: The evidence,” Journal of Business Research, Elsevier, vol. 68(8), pages 1678-1685.

Mathematical models for ion-channel cardiac toxicity: David v Goliath

This blog entry will focus on a rather long-standing debate around model complexity and predictivity for a specific prediction problem from drug development. A typical drug project starts off with thousands of candidate drugs for a certain idea. All but one of these are eventually weeded out through a series of experiments, which explore safety and efficacy, with the final drug being the one that enters human trials.  The question we will explore concerns a toxicity experiment performed rather early in the development (weeding-out) process, which determines the drug’s effect on the cardiac system.

Many years of research have identified certain proteins, ion-channels, which could lead to dire consequences for a patient if a drug were to affect them.  In simple terms, ion-channels allow ions, such as calcium, to flow in and out of a cell. Drugs can bind to ion-channels and disrupt their ability to function, thus affecting the flow of ions. The early experiment we are interested in basically measures how many ions flow across an ion-channel as the amount of drug increases.  The cells used in these experiments are engineered to over-express the human protein we are interested in and so do not reflect a real cardiac cell. The experiment is pretty much automated and so allows one to screen thousands of drugs a year against certain ion-channels.  The output of the system is an IC50 value, the amount of drug needed to reduce the flow of ions across the ion-channel by 50 percent.
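To make the meaning of an IC50 concrete, here is a minimal sketch that converts a drug concentration into a fraction of channel block using the Hill equation. The Hill form and a Hill coefficient of 1 are assumptions made for this example (the screening experiment described above may be analysed differently), and the function name is invented.

```python
def fraction_blocked(concentration, ic50, hill=1.0):
    """Fraction of ion flow blocked at a given drug concentration,
    assuming a Hill-type concentration-response curve.

    concentration and ic50 must be in the same units.
    """
    if concentration <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50 / concentration) ** hill)


if __name__ == "__main__":
    ic50 = 1.0  # hypothetical IC50, arbitrary concentration units
    for conc in (0.1, 1.0, 10.0):
        print(f"concentration={conc:5.1f}  block={fraction_blocked(conc, ic50):.2f}")
```

By construction the block is exactly 50 percent when the concentration equals the IC50, which is the definition given above.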

A series of IC50 values are generated for each drug against a number of ion-channels. (We are actually only interested in three.) The reason why a large screening effort is made is because we cannot test all the compounds in an animal model nor can we take all of them into man! So we can’t measure the effect of these drugs in real cardiac systems but we can measure their effect on certain ion-channel proteins which are expressed in the cardiac system we are interested in.  The question is then: given a set of IC50 values against certain ion-channels for a particular drug can we predict how this drug will affect a cardiac system?

As mentioned earlier, drug development involves performing a series of experiments over time. The screening experiment described above is one of many used to look at cardiac toxicity. The next experiment in the pipeline, which could occur one or maybe two years later, explores the remaining drugs in an intact cardiac system.  This could be a single cardiac cell taken from a dog, a portion of the ventricular wall, or something else entirely. After this, even fewer compounds are taken into dog studies before entering human trials. So the prediction question could be related to any one of these cardiac systems.  The inputs into the prediction problem are the set of IC50 values, three in the cases we will look at, whereas the outputs, which we want to predict, are certain measures from the cardiac systems described.

At this point some of you may be thinking, well if we want to predict what will happen in a real cardiac system then why don’t we build a virtual version of the system using a large mathematical model (biophysical model)? Indeed people have done this. However, others (especially those who follow this blog) might also be thinking, I have three inputs and one output and given we screen lots of these compounds surely the dynamics are not that difficult to figure out, such that I can do something simpler and more cost effective! Again people have done this too. If I were to refer to the virtual system (consists of >100 parameters) as Goliath and the simple model (3 parameters) as David some of you can guess what the outcome is! A paper documenting the story in detail can be found here and the model used is available online here.  I will just give a brief summary of the findings in the main paper.
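To give a feel for how small a three-input model can be, the sketch below combines the block of three ion-channels into a single score using a weighted sum. It is only an illustration of the idea of a linear combination of block: the channel labels, weights, Hill-type block function, and example IC50 values are all placeholders, not the fitted model from the paper linked above.

```python
def fraction_blocked(concentration, ic50, hill=1.0):
    """Fraction of ion flow blocked, assuming a Hill-type curve (an assumption)."""
    if concentration <= 0:
        return 0.0
    return 1.0 / (1.0 + (ic50 / concentration) ** hill)


def linear_block_score(concentration, ic50s, weights):
    """Weighted sum of the block of each channel at one drug concentration.

    ic50s and weights are dicts keyed by channel label; both are illustrative.
    """
    return sum(
        weights[channel] * fraction_blocked(concentration, ic50s[channel])
        for channel in ic50s
    )


if __name__ == "__main__":
    # Hypothetical IC50 values (same units as the concentration) for one drug.
    ic50s = {"channel_a": 1.0, "channel_b": 100.0, "channel_c": 100.0}
    # Placeholder weights; a real model would estimate these from data.
    weights = {"channel_a": 1.0, "channel_b": -1.0, "channel_c": -1.0}
    score = linear_block_score(1.0, ic50s, weights)
    print(f"score at concentration 1.0: {score:.3f}")
```

A model of this kind has only a handful of quantities to estimate, so every one of them can be examined against data, which is the contrast with the >100-parameter virtual system that the paragraph above draws.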

The data-sets explored in the article involve making predictions in both animal and human studies.  Something noticeable about the biophysical models used in the original articles was that a different structural model was needed for each study.  This was not the case for the simple model, which uses the same structure across all data-sets.  Given that the simple model gave the same if not better performance than the biophysical models, it raises a question: why does the biophysical modelling community need a different model for different studies? In fact for two human studies, A and B, different human models were used. Why?  The reason may be that the degree of confidence in those models among the people using them is actually quite low, hence the lack of consistency in the models used across the studies. Another issue not discussed anywhere in the biophysical modelling literature is the reproducibility of the data used to build such models. Given the growing skepticism about the reproducibility of preclinical data in science, this adds further doubt to the suitability of such models for industrial use.

Given the points raised here (as well as a previous blog entry highlighting the misuse of these models by their own developers) can the biophysical modelling community be trusted to deliver a modelling solution that is both trustworthy and reliable? This is an important question as regulatory agencies are now also considering using these biophysical models together with some quite exciting new experimental techniques to change the way people assess the cardiac liability of a new drug.

Model Misuse: Applying hypothesis testing to simulated data from in-silico cardiac models

In the previous blog there was an interesting link to a report by Yaron Hollander on the use and abuse of models in transport forecasting.  His description of the abuse of models applies to many sectors, including the life sciences, where it is arguably a bigger issue. Why? Other sectors have to some degree acknowledged the concept of structural uncertainty, which is a taboo subject for most, though not all, modelers within the life sciences sector.  By acknowledging there is a problem, modelers within the other sectors have at least moved beyond the denial phase, the first phase of an addiction problem.  This does not seem to be the case for most life sciences modelers.  A typical example of this can be seen in a recent article by Zhou et al. from the University of Oxford which explores the mechanisms, through use of modelling and simulation, behind certain biological phenomena in cardiac myocytes termed alternans (alternating long and short action potentials)…

In the article, Zhou et al. claim that the mathematical/computational model used within the study is the “gold standard” and has been “extensively validated”.  Declaring a model to be the gold standard and extensively validated gives licence for it to be used to answer many questions for which it has never been tested, which leads to all sorts of misuse. Indeed the type of model used by Zhou et al. can never truly be tested due to its scale: tens of variables and hundreds of parameters.  Such large models, which also include extensive non-linear functions, are almost impossible to test because they are so flexible. Thus, using such models for the type of analysis Zhou et al. conducted can be considered a classic example of model misuse. The authors applied the following analysis (more detail can be found in the article):

  • A population of models is created by generating 10000 parameter sets by perturbing a subset of model parameters
  • Of these a subset (~2500) are deemed acceptable according to some criteria
  • Each of these parameter sets is then used to explore the alternans phenomenon
  • Parameter sets are then grouped by how they answer the following questions:
    1. Does a parameter set produce alternans or not?
    2. Are the alternans eye or fork type?
  • Finally statistical tests are performed to ascertain whether the distributions of parameters are different between the groups created.

In essence they are applying statistical tests to simulated data, which has been discussed within ecology as something that should not be done.  White et al. provide two reasons why statistical significance tests should not be used to interpret simulation results, of which the first is most relevant here, as the second is to some degree more of a philosophical debate.  The first reason revolves around power calculations: power is the probability that a test correctly rejects the null hypothesis when the alternative is true. One of the key components of a power calculation is sample size! In brief, by using such a large sample size (that is, number of simulations), Zhou et al. have powered their study to be able to detect the smallest of differences between groups.  Indeed, because Zhou et al. can control the sample size they can also control the results of a statistical test; they could be accused of p-hacking. This brings the results seen by Zhou et al. into question. In addition to the misuse of statistical hypothesis testing there is another, more worrying, issue with the first step of the approach: using large flexible models to explain variability in a dependent variable, measured experimentally, by varying a subset of model parameters.  An obvious question is which parameters should be varied in such large models, given how flexible they are? Furthermore, the bigger issue of structural uncertainty still hasn’t been addressed by such an approach.  What consequences could these issues have? They are likely to lead to a high number of false positives and to waste experimental resources chasing hypotheses that were not worthwhile.
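The sample-size point can be made with a few lines of arithmetic. The sketch below assumes a two-sample z-test with equal group sizes and known unit variance, and supposes that the observed standardised difference between the two groups is fixed at a small value; it then shows how the p-value depends only on how many simulations are run. The effect size and group sizes are invented for illustration and are not taken from Zhou et al.

```python
import math


def two_sided_p_value(effect_size, n_per_group):
    """Two-sided p-value of a two-sample z-test (known unit variance, equal
    group sizes) when the observed standardised mean difference is effect_size."""
    z = effect_size * math.sqrt(n_per_group / 2.0)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    effect_size = 0.05  # a tiny, hypothetical difference between groups
    for n in (50, 500, 5000, 50000):
        p = two_sided_p_value(effect_size, n)
        print(f"simulations per group={n:6d}  p-value={p:.3g}")
```

The same tiny difference is nowhere near significant with 50 simulations per group and comfortably below 0.05 with 5000, so under these assumptions the outcome of the test is set by a number the modeler chooses.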

Finally on an even more cautionary note, if the type of approach, described by Zhou et al., were used to develop biomarkers and to guide clinical trials then this is likely to increase clinical trial failure rates rather than improve them. In an era where people within the healthcare industry are looking at systems approaches, real care must be taken as to what approaches are actually used within the industry. As modelers our duty is to remain questioning and skeptical.