Cover article (in Japanese) by David Orrell in Newsweek Japan on why economists can’t predict the future. Read an extract here.
Registration is open for a one-day data analysis workshop which will be a mix of talks and discussions. The talks will go through analysis methods and examples that will show how you can get more from your own data as well as how to leverage external data (open genomic and clinical data). Participants can also bring an old poster and discuss data analysis options with speakers for some on-the-day consulting. To register and find out more about the day follow this link.
The title of this blog-post refers to a meeting I attended recently, sponsored by the UK QSP network. Some of you may not be familiar with the term QSP (quantitative systems pharmacology); put simply, it describes the application of mathematical and computational tools to pharmacology. As the title suggests, the meeting was on the subject of model reduction.
The meeting was split into 4 sessions entitled:
I was asked to present a talk in the first session; see here for the slide-set. The talk was based on a topic that has been mentioned on this blog a few times before: ion-channel cardiac toxicity prediction. It went through how 3 models were assessed for their ability to discriminate between non-cardio-toxic and cardio-toxic drugs across the 3 data-sets which are currently available. (A report providing more details is currently being put together and will be released shortly.) The 3 models used were a linear combination of block (the simple model, blog-post here) and 2 large-scale biophysical models of a cardiac cell: one termed the “gold-standard” (endorsed by the FDA and other regulatory agencies through CiPA, see Colatsky et al.) and the other forming a key component of the “cardiac safety simulator” (see Glinka and Polak).
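To make the idea of a “linear combination of block” concrete, here is a minimal Python sketch. It assumes that the fractional block of each channel is computed from an IC50 and a free drug concentration using the standard Hill equation, and it uses hypothetical weights of +1 for hERG and -1 for CaV1.2 and NaV1.5. The channel list, weights, function names and compound values are illustrative assumptions, not the exact model presented in the talk.

```python
import numpy as np

# Hypothetical weights (assumption): +1 for the repolarising hERG current,
# -1 for the depolarising calcium and sodium currents.
WEIGHTS = {"hERG": 1.0, "CaV1.2": -1.0, "NaV1.5": -1.0}


def fractional_block(conc, ic50, hill=1.0):
    """Fraction of a channel's current blocked at a free concentration (Hill equation)."""
    conc = np.asarray(conc, dtype=float)
    return conc**hill / (conc**hill + ic50**hill)


def net_block_score(conc, ic50s, hills=None):
    """Weighted sum of fractional block across channels; larger values suggest higher risk."""
    hills = hills or {}
    return sum(
        weight * fractional_block(conc, ic50s[channel], hills.get(channel, 1.0))
        for channel, weight in WEIGHTS.items()
    )


# Hypothetical compounds at the same free concentration (1 uM); IC50s in uM.
herg_selective = {"hERG": 1.0, "CaV1.2": 1e4, "NaV1.5": 1e4}
balanced = {"hERG": 1.0, "CaV1.2": 1.0, "NaV1.5": 1e4}
print(net_block_score(1.0, herg_selective), net_block_score(1.0, balanced))
```

A compound that blocks hERG alone scores higher than one whose hERG block is offset by calcium-channel block; discrimination between toxic and non-toxic drugs would then come from a threshold on this single score.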
The results showed that the simple model does just as well as, and on certain data-sets out-performs, the leading biophysical models of the heart (see slide 25). Towards the end of the talk I discussed what drives the production of such large models and whether we should invest further in them given the current evidence (see slides 24, 28 and 33). How does this example fit into the sessions of the meeting?
In answer to the first session title, “How to test if your model is too big?”, the answer is straightforward: if the simpler/smaller model outperforms the larger model, then the larger model is too big. Regarding the second session on model reduction techniques, in this case there is no need for them. You could argue, from the results discussed here, that instead of pursuing model reduction techniques we may want to consider building smaller/simpler models to begin with. As for the third session, on the benefits of building a non-identifiable model, there was clearly no benefit in developing a non-identifiable model in this situation. Finally, regarding techniques for over-parameterised models, the lesson learned from the cardiac toxicity field is simply: don’t build these sorts of models for this question.
Some people at the meeting argued that the type of model depends on the question. This is true, but does the scale of the model depend on the question?
If we now go back to the title of the meeting, is there a case for model reduction? Within the field of ion-channel cardiac toxicity the response would be: Why bother reducing a large model when you can build a smaller and simpler model which shows equal/better performance?
Of course, within the (skeptical) framework set out by Green and Armstrong (see slide 28), one reason for model reduction is that it gives researchers the best of both worlds: they can build a highly complex model, suitable for publication in a highly-cited journal, and then spend more time extracting a simpler version to support client plans. Maybe those journals should look more closely at their selection criteria.
Colatsky et al. The Comprehensive in Vitro Proarrhythmia Assay (CiPA) initiative: Update on progress. Journal of Pharmacological and Toxicological Methods (2016) 81, pages 15-20.
Glinka A. and Polak S. QTc modification after risperidone administration: insight into the mechanism of action with use of the modeling and simulation at the population level approach. Toxicology Mechanisms and Methods (2015) 25 (4), pages 279-286.
Green K. C. and Armstrong J. S. Simple versus complex forecasting: The evidence. Journal of Business Research (2015) 68 (8), pages 1678-1685.
There is growing interest, within the pharmaceutical industry, in using a patient’s own cancer material to screen the effect of numerous treatments within an animal model: the patient-derived xenograft (PDX). The reason for shifting from standard xenografts, based on historical immortalised cell-lines, is that those models are considered to be very different to the patients’ tumours of today. PDX models are therefore considered more relevant, as they are “closer” to the patient population in which you are about to test your new treatment.
Typically only a handful of PDX models are used, but recently there has been a shift towards population PDX studies which mimic small-scale clinical trials. One of these studies, by Gao et al., also published the raw data as an Excel file in the supplementary information. This included not only the treatment-effect growth curves but also genomic data: DNA copy number, gene expression and mutation. Using this data it is possible to explore correlations between treatment response and genomic features.
We at Systems Forecasting are always appreciative of freely available data sets, and have designed an equally free and available PDXdata app to browse through this data.
The app can read Excel files in the same form as the Novartis file “nm.3954-S2.xlsx”. It converts volume measurements to diameter and computes a linear fit to each tumour growth time series. The user can then plot time series, organised by ID or by treatment, or examine statistics for the entire data set. The aim is to explore how well linear models can fit this type of data.
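The app’s own code is not shown here, so the following Python sketch only illustrates the two steps described above: converting volume to diameter and fitting a straight line to each growth curve. It assumes the tumour is roughly spherical (V = πd³/6); the app may use a different conversion, and the function names and measurements are hypothetical.

```python
import numpy as np


def volume_to_diameter(volume):
    """Convert tumour volume to diameter, assuming a sphere: V = (pi / 6) * d**3."""
    return np.cbrt(6.0 * np.asarray(volume, dtype=float) / np.pi)


def linear_fit(times, diameters):
    """Least-squares line d = intercept + slope * t, plus the residual standard error (sigma)."""
    t = np.asarray(times, dtype=float)
    d = np.asarray(diameters, dtype=float)
    slope, intercept = np.polyfit(t, d, 1)  # polyfit returns highest degree first
    residuals = d - (intercept + slope * t)
    dof = len(t) - 2  # two fitted parameters
    sigma = np.sqrt(np.sum(residuals**2) / dof) if dof > 0 else float("nan")
    return intercept, slope, sigma


days = [0, 7, 14, 21]
volumes = [100.0, 180.0, 300.0, 450.0]  # hypothetical measurements in mm^3
print(linear_fit(days, volume_to_diameter(volumes)))
```

Working in diameter rather than volume is what makes a straight-line fit plausible: a tumour whose diameter grows linearly has a volume that grows roughly cubically.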
The “Diameters” page shown in the figure below is used to plot time series. First read in the Excel data file; this may take a while, so a progress bar is included. Data can be grouped either by ID(s) or by treatment(s). Note that the Novartis data has several treatments for the same ID, and the data is filtered to include only those IDs with an untreated case. If there is only one treatment per ID, one can group by treatment and then plot the data for the IDs with that treatment; in this case the untreated fit is computed from the untreated IDs.
As shown in the next figure, the “Copy Number” and “RNA” tabs allow the user to plot the correlations between copy number or RNA and treatment efficacy, as measured by change in the slope of linear growth, for individual treatments (provided data is available for the selected treatment).
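As a rough illustration of what such a correlation plot summarises, the sketch below computes a Pearson correlation across PDX IDs between one genomic feature and the change in slope. Pearson correlation is an assumption (the app may summarise the relationship differently), and the values are hypothetical.

```python
import numpy as np


def feature_response_correlation(feature, delslope):
    """Pearson correlation across IDs between a genomic feature and the change in slope."""
    x = np.asarray(feature, dtype=float)
    y = np.asarray(delslope, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # drop IDs with missing values
    if keep.sum() < 3:
        return float("nan")
    return float(np.corrcoef(x[keep], y[keep])[0, 1])


# Hypothetical copy numbers and slope changes for five PDX IDs (one missing value)
copy_number = [2.0, 3.0, 4.0, np.nan, 6.0]
delslope = [-0.1, -0.2, -0.3, -0.5, -0.5]
print(feature_response_correlation(copy_number, delslope))
```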
Finally, the “Statistics” page plots a histogram of data derived from the linear models. These include intercept, slope, and sigma for the linear fit to each time series; the difference in slope between the treated and untreated cases (delslope); the growth from initial to final time of the linear fit to the untreated case (lingr); and the difference delgr=diamgr-lingr, which measures diameter loss due to drug.
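The derived statistics above can be written down in a few lines. This Python sketch assumes the treated and untreated series share the same time points, and it assumes diamgr is the observed change in treated diameter from first to last measurement; the text does not define diamgr, so treat that line as a guess rather than the app’s definition.

```python
import numpy as np


def treatment_statistics(t, d_treated, d_untreated):
    """Per-ID summary statistics comparing treated and untreated diameter series."""
    t = np.asarray(t, dtype=float)
    d_trt = np.asarray(d_treated, dtype=float)
    d_ctl = np.asarray(d_untreated, dtype=float)
    slope_trt, _ = np.polyfit(t, d_trt, 1)
    slope_ctl, _ = np.polyfit(t, d_ctl, 1)
    delslope = slope_trt - slope_ctl
    # Growth of the untreated linear fit from the initial to the final time point
    lingr = slope_ctl * (t[-1] - t[0])
    # Assumption: diamgr = observed treated diameter change, last minus first measurement
    diamgr = float(d_trt[-1] - d_trt[0])
    delgr = diamgr - lingr  # negative values indicate diameter lost due to the drug
    return {"delslope": delslope, "lingr": lingr, "diamgr": diamgr, "delgr": delgr}


days = np.arange(11.0)
print(treatment_statistics(days, 2 + 0.1 * days, 2 + 0.5 * days))
```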
This app is very much a work-in-progress, and at the moment is primarily a way to browse, view and plot the data. We will add more functionality as it becomes available.
The traditional preclinical combination experiment in Oncology for two drugs A and B is as follows. A cancer cell-line is exposed for a fixed amount of time to increasing concentrations of drug A alone, drug B alone and various concentrations of the combination. That is, we determine what effect drugs A and B have as monotherapies, which subsequently helps us to understand what the combination effect is. There are many articles which describe how mathematical/computational models can be used to analyse such data and possibly predict the combination effect using information on the monotherapy agents alone. Those models can be based on mechanism at the pathway or phenotype level (see CellCycler for an example of the latter), or they could be machine learning approaches. We shall call combinations at this scale cellular, as they are mainly focussed on analysing combination effects at that scale. What other scales are there?
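Many analyses of such cellular experiments compare the observed combination effect with a “no interaction” reference model. One widely used reference is Bliss independence (Loewe additivity is another); it is shown here only as an example and is not the CellCycler approach. This minimal Python sketch assumes effects are expressed as fractions of cells inhibited, between 0 and 1, and the numbers are hypothetical.

```python
import numpy as np


def bliss_expected(effect_a, effect_b):
    """Expected fractional effect of A+B if the drugs act independently (Bliss independence).
    Effects are fractions of cells inhibited, in [0, 1]."""
    ea = np.asarray(effect_a, dtype=float)
    eb = np.asarray(effect_b, dtype=float)
    return ea + eb - ea * eb


def bliss_excess(observed_ab, effect_a, effect_b):
    """Observed minus expected: > 0 suggests synergy, < 0 antagonism (under the Bliss model)."""
    return np.asarray(observed_ab, dtype=float) - bliss_expected(effect_a, effect_b)


# Hypothetical monotherapy and combination effects at one concentration pair
print(bliss_expected(0.5, 0.4))
print(bliss_excess(0.85, 0.5, 0.4))
```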
We know that human cancers contain more than one type of cell-population so the next scale from the cellular level is the tissue level. At this level we may have populations of cells with distinct genetic backgrounds either within one tumour or across multiple tumours within one patient. Here we may find for example that drug A kills cell type X and drug B doesn’t, but drug B kills cell-type Y and drug A doesn’t. So the combination can be viewed as a cell population enrichment strategy as it is still effective even though the two drugs do not interact in any way.
Traditional drug combination screens, as described above, are not designed to explore these types of combinations. There is another scale which is probably even less well known: the human population scale …
A typical human clinical combination trial in Oncology can involve combining new drug B with existing treatment A and comparing that to A only. A third arm looking at drug B alone is unlikely, because if an existing treatment is known to have an effect then it is unethical not to use it. Unless one knows what effect the new drug B has on its own, it is difficult to assess what the effect of the combination is. Indeed the combination may simply enrich the patient population. That is, if drug A shrinks tumours in patient population X and drug B doesn’t, but drug B shrinks tumours in patient population Y and drug A doesn’t, then if the trial contains both X and Y there is still a combination effect which is greater than drug A alone.
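The enrichment argument can be checked with a toy calculation. This Python sketch assumes a hypothetical trial that is half population X and half population Y, with made-up response probabilities, and assumes each patient responds to the combination if they respond to either drug, with no interaction between the drugs. It is an illustration only, not the analysis in Adam Palmer’s presentation.

```python
def arm_response_rate(subgroup_fractions, response_probs):
    """Trial-level response rate: subgroup response probabilities weighted by subgroup size."""
    return sum(f * p for f, p in zip(subgroup_fractions, response_probs))


def independent_combination(p_a, p_b):
    """Per-subgroup response probability to A+B when each patient responds to either drug
    independently and the drugs do not interact (assumption)."""
    return [1.0 - (1.0 - a) * (1.0 - b) for a, b in zip(p_a, p_b)]


# Hypothetical trial: half the patients in population X, half in population Y
fractions = [0.5, 0.5]
p_drug_a = [0.6, 0.0]  # A shrinks tumours in X only
p_drug_b = [0.0, 0.6]  # B shrinks tumours in Y only

rate_a = arm_response_rate(fractions, p_drug_a)
rate_ab = arm_response_rate(fractions, independent_combination(p_drug_a, p_drug_b))
print(rate_a, rate_ab)
```

The combination arm doubles the response rate of the A-only arm even though, by construction, neither drug does anything to the other’s effect.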
Many people reading this blog are probably aware that positive combination effects seen in the clinic could be due to this type of patient enrichment. At a meeting in Boston in April of this year, a presentation by Adam Palmer suggested that two thirds of marketed combinations in Oncology can be explained in this way; see the second half (slide 27 onwards) of this presentation here. This includes current immunotherapy combinations.
We can now see why combinations in Oncology can be viewed as hierarchical. How appreciative the research community is of this is unknown. Indeed one of the latest challenges from CRUK (Cancer Research UK), see here, suggests that even they may not be fully aware of it. That challenge merely focusses on the well-trodden path of the first level described here. Which level is the best to target? Is it easier to target the tissue and human population level than the cellular one? Only time will tell.
At a recent meeting at a medical health faculty, researchers were asked to nominate their favourite papers. One person instead of nominating a paper nominated a whole project website, The Reproducibility Project in Cancer Biology, see here. This person was someone who had left the field of systems biology to re-train as a biostatistician. In case you might be wondering it wasn’t me! In this blog-post we will take a look at the project, the motivation behind it and some of the emerging results.
The original paper which sets out the aims of the project can be found here. The initiative was a joint collaboration between the Center for Open Science and Science Exchange. The motivation behind it is likely to be quite obvious to many readers, but for those who are unfamiliar: there are many incentives for producing exciting new results, but far fewer for verifying old discoveries.
The main paper goes into some detail about the reasons why it is difficult to reproduce results. One of the key factors is openness, which is why this is the first reproducibility attempt with extensive documentation. The project chose cancer research mainly because of previous findings published by Bayer and Amgen, see here and here. In those previous reports the exact details of which replication studies were attempted were not published, hence the need for an open project.
The first part of a reproducibility project is to decide which articles to pick. The obvious choices are the ones that are cited the most and have had the most publicity. Indeed this is what the project did. They chose 50 of the most impactful articles in cancer biology published between 2010 and 2012. The experimental group used to conduct the replication studies was not actually a single group. The project utilised the Science Exchange, see here, which is a network that consists of over 900 contract research organisations (CROs). Thus they did not have to worry about finding the people with the right skills.
One clear advantage of using a CRO over an academic lab is that there is no reason for them to be biased either for or against a particular experiment, which may not be true of academic labs. The other main advantage is time and cost – scale up is more efficient. All the details of the experiments and power calculations of the original studies were placed on the Open Science Framework, see here. So how successful has the project been?
The first sets of results are out and as expected they are variable. If you would like to read the results in detail, go to this link here. The five projects were:
Two of the studies, (1) and (4), were largely successful, and one, (5), was not. The other two replication studies were found to be un-interpretable, as the animal cancer models showed odd behaviour: they either grew too fast or exhibited spontaneous tumour regressions!
One of the studies which was deemed un-interpretable has led to a clinical trial: development of an anti-CD47 antibody. These early results highlight that there is an issue around reproducing preclinical oncology experiments, but many already knew this. (Just to add, this is not about reproducing p-values but size and direction of effects.) The big question is how to improve the reproducibility of research; there are many opinions on this matter. Clearly one step is to reward replication studies, which is easier said than done in an environment where novel findings are the ones that lead to riches!
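Comparing the size and direction of effects, rather than p-values, can be as simple as computing a standardised effect size for the original study and for the replication. This Python sketch uses Cohen’s d with a pooled standard deviation, which is one common choice and not necessarily the measure the project used; the measurements are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * stdev(treated) ** 2 + (n2 - 1) * stdev(control) ** 2) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


def same_direction(d_original, d_replication):
    """True if both studies found an effect with the same sign."""
    return d_original * d_replication > 0


# Hypothetical measurements from an original study and its replication
original = cohens_d([5.1, 6.0, 5.8, 6.3], [4.0, 4.4, 3.9, 4.6])
replication = cohens_d([4.8, 5.0, 4.6, 5.2], [4.5, 4.9, 4.4, 4.7])
print(original, replication, same_direction(original, replication))
```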
Explore the deadly elegance of finance’s hidden powerhouse
The Money Formula takes you inside the engine room of the global economy to explore the little-understood world of quantitative finance, and shows how the future of our economy rests on the backs of this all-but-impenetrable industry. Written not from a post-crisis perspective – but from a preventative point of view – this book traces the development of financial derivatives from bonds to credit default swaps, and shows how mathematical formulas went beyond pricing to expand their use to the point where they dwarfed the real economy. You’ll learn how the deadly allure of their ice-cold beauty has misled generations of economists and investors, and how continued reliance on these formulas can either assist future economic development, or send the global economy into the financial equivalent of a cardiac arrest.
Rather than rehash tales of post-crisis fallout, this book focuses on preventing the next one. By exploring the heart of the shadow economy, you’ll be better prepared to ride the rough waves of finance into the turbulent future.
How do you create a quadrillion dollars out of nothing, blow it away and leave a hole so large that even years of “quantitative easing” can’t fill it – and then go back to doing the same thing? Even amidst global recovery, the financial system still has the potential to seize up at any moment. The Money Formula explores the how and why of financial disaster, and what must happen to prevent the next one.
PRAISE FOR THE MONEY FORMULA
“This book has humor, attitude, clarity, science and common sense; it pulls no punches and takes no prisoners.”
Nassim Nicholas Taleb, Scholar and former trader
“There are lots of people who’d prefer you didn’t read this book: financial advisors, pension fund managers, regulators and more than a few politicians. That’s because it makes plain their complicity in a trillion dollar scam that nearly destroyed the global financial system. Insiders Wilmott and Orrell explain how it was done, how to stop it happening again and why those with the power to act are so reluctant to wield it.”
Robert Matthews, Author of Chancing It: The Laws of Chance and How They Can Work for You
“Few contemporary developments are more important and more terrifying than the increasing power of the financial system in the global economy. This book makes it clear that this system is operated either by people who don’t know what they are doing or who are so greed-stricken that they don’t care. Risk is at dangerous levels. Can this be fixed? It can and this book full of healthy skepticism and high expertise shows how.”
Bryan Appleyard, Author and Sunday Times writer
“In a financial world that relies more and more on models that fewer and fewer people understand, this is an essential, deeply insightful as well as entertaining read.”
Joris Luyendijk, Author of Swimming with Sharks: My Journey into the World of the Bankers
“A fresh and lively explanation of modern quantitative finance, its perils and what we might do to protect against a repeat of disasters like 2008–09. This insightful, important and original critique of the financial system is also fun to read.”
Edward O. Thorp, Author of A Man for All Markets and New York Times bestseller Beat the Dealer
In a previous blog entry, see here, we discussed how survival analysis methods could be used to determine the profitability of P2P loans. The “trick” highlighted in that previous post was to focus on the profit/loss of a loan, which is what you actually care about, rather than on when and if a loan defaults. In doing so we showed that even loans that default are profitable if interest rates are high enough and the loan period short enough.
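A worked example shows why a defaulting loan can still make money. This Python sketch assumes a fully amortising loan with fixed monthly payments, and it ignores recoveries, fees and discounting; the loan size, rate and term are hypothetical and are not taken from the earlier post.

```python
def monthly_payment(principal, annual_rate, n_months):
    """Fixed payment for a fully amortising loan with monthly compounding."""
    r = annual_rate / 12.0
    if r == 0:
        return principal / n_months
    return principal * r / (1.0 - (1.0 + r) ** -n_months)


def profit_if_default_after(k_payments, principal, annual_rate, n_months):
    """Lender's cash profit if the borrower makes k payments and then defaults.
    Ignores recoveries, fees and discounting (simplifying assumptions)."""
    k = min(k_payments, n_months)
    return k * monthly_payment(principal, annual_rate, n_months) - principal


# Hypothetical 12-month loan of 1000 at 24% a year
for k in (6, 10, 11, 12):
    print(k, round(profit_if_default_after(k, 1000.0, 0.24, 12), 2))
```

With these numbers each payment is about 94.56, so the lender is already in profit if the borrower defaults after 11 of the 12 payments. A survival model for the time to default then supplies the probability of reaching that break-even point.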
Given that basic survival analysis methods shed light on betting strategies that could be profitable, are there more aggressive approaches that exist in the healthcare community that the financial world could take advantage of? The answer to that question is yes and it lies in using crowdsourcing as we shall now discuss.
Over recent years there has been an increase in prediction competitions in the healthcare sector. One set of organisers has aptly named these competitions DREAM challenges; follow this link to their website. Unlike on other prediction competition websites such as Kaggle (see here), the winning algorithms are made publicly available through the website and also published.
A recurring theme of these competitions, which simply moves from one disease area to the next, is survival. The most recent of these involved predicting the survival of prostate cancer patients who were given a certain therapy; the results were published here. Unfortunately the paper is behind a paywall, but the algorithm is downloadable from the DREAM challenge website.
The winning algorithm was basically an ensemble of Cox proportional hazards regression models; we briefly explained what these are in our previous blog entry. Those of you with a technical background will be thinking that this doesn’t sound like an overly complicated modelling approach. In fact it isn’t; what was sophisticated was how the winning entry partitioned the data for exploratory analyses and model building. The strategy appeared to be more important than the development of a new method. This observation resonates with the last blog entry on Big data versus big theory.
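The winning entry’s code is available from the DREAM site and is not reproduced here. As a rough, self-contained illustration of what “an ensemble of Cox proportional hazards models” means, this Python sketch fits a Cox model by Newton-Raphson on the partial likelihood, repeats the fit on random subsamples, and averages the linear predictors. The subsampling scheme, function names and simulated data are assumptions; in practice a survival library (for example lifelines in Python or the survival package in R) would do the fitting.

```python
import numpy as np


def fit_cox(times, events, X, n_iter=50, tol=1e-8):
    """Newton-Raphson fit of a Cox proportional hazards model.
    Uses the partial likelihood with Breslow handling of tied event times."""
    t = np.asarray(times, dtype=float)
    e = np.asarray(events, dtype=bool)
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        w = np.exp(X @ beta)  # relative hazard of each subject
        grad = np.zeros(p)
        hess = np.zeros((p, p))
        for i in np.flatnonzero(e):
            at_risk = t >= t[i]  # risk set at this event time
            wr, Xr = w[at_risk], X[at_risk]
            s0 = wr.sum()
            mean_x = (wr @ Xr) / s0  # hazard-weighted mean covariate in the risk set
            second = (Xr * wr[:, None]).T @ Xr / s0
            grad += X[i] - mean_x
            hess -= second - np.outer(mean_x, mean_x)
        step = np.linalg.solve(hess, grad)
        beta = beta - step  # Newton step (hess is negative definite)
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_cox_ensemble(times, events, X, n_models=5, frac=0.8, seed=0):
    """Fit several Cox models on random subsamples (an illustrative partitioning choice)."""
    rng = np.random.default_rng(seed)
    t, e, X = np.asarray(times), np.asarray(events), np.asarray(X, dtype=float)
    n = len(t)
    betas = []
    for _ in range(n_models):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        betas.append(fit_cox(t[idx], e[idx], X[idx]))
    return np.array(betas)


def ensemble_risk_score(betas, X):
    """Average linear predictor across ensemble members; higher means higher predicted hazard."""
    return (np.asarray(X, dtype=float) @ betas.T).mean(axis=1)


# Simulated data: one covariate with a true log-hazard ratio of 1, no censoring
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 1))
times = rng.exponential(scale=1.0 / np.exp(1.0 * x[:, 0]))
events = np.ones(300, dtype=bool)
betas = fit_cox_ensemble(times, events, x)
print(betas.ravel())
```

The point the post makes carries over: the model itself is standard, and most of the modelling choices live in how the data are partitioned before fitting.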
So what does all this have to do with the financial sector? Well competitions like the one described above can quite easily be applied to financial problems, as we blogged about previously, where survival analyses are currently being applied for example to P2P loan profitability. So the healthcare prediction arena is in fact a great place to search for the latest approaches for financial betting strategies.
The Winter 2017 edition of Foresight magazine includes my commentary on the article Changing the Paradigm for Business Forecasting by Michael Gilliland from SAS. Both are behind a paywall (though a longer version of Michael’s argument can be read on his SAS blog), but here is a brief summary.
According to Gilliland, business forecasting is currently dominated by an “offensive” paradigm, which is “characterized by a focus on models, methods, and organizational processes that seek to extract every last fraction of accuracy from our forecasts. More is thought to be better—more data, bigger computers, more complex models—and more elaborate collaborative processes.”
He argues that our “love affair with complexity” can lead to extra effort and cost, while actually reducing forecast accuracy. And while managers have often been seduced by the idea that “big data was going to solve all our forecasting problems”, research shows that even with complex models, forecast accuracy often fails to beat even a no-change forecasting model. His article therefore advocates a paradigm shift towards “defensive” forecasting, which focuses on simplifying the forecasting process, eliminating bad practices, and adding value.
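Checking whether a model beats a no-change forecast takes very little code. This Python sketch divides the mean absolute error of a model’s one-step-ahead forecasts by that of the no-change forecast; it is one simple way to make the comparison, not a method taken from Gilliland’s article, and the series and forecasts are hypothetical.

```python
import numpy as np


def mae(actual, forecast):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))))


def relative_mae_vs_naive(series, model_forecasts):
    """MAE of one-step-ahead model forecasts divided by MAE of the no-change forecast.
    model_forecasts[k] is the forecast made at time k for series[k + 1].
    A value of 1 or more means the model fails to beat the no-change forecast."""
    y = np.asarray(series, dtype=float)
    actual = y[1:]
    naive = y[:-1]  # no-change: next period equals this period
    return mae(actual, model_forecasts) / mae(actual, naive)


sales = [100, 104, 101, 107, 110, 108]    # hypothetical series
complex_model = [103, 99, 106, 105, 112]  # hypothetical forecasts for periods 2 to 6
print(relative_mae_vs_naive(sales, complex_model))
```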
My comment on this (in about 1200 words) is … I agree. But I would argue that the problem is less big data, or even complexity, than big theory.
Our current modelling paradigm is fundamentally reductionist – the idea is to reduce a system to its parts, figure out the laws that govern their interactions, build a giant simulation of the whole thing, and solve. The resulting models are highly complex, and their flexibility makes them good at fitting past data, but they tend to be unstable (or stable in the wrong way) and are poor at making predictions.
If however we recognise that complex systems have emergent properties that resist a reductionist approach, it makes more sense to build models that only attempt to capture some aspect of the system behaviour, instead of reproducing the whole thing.
An example of this approach, discussed earlier on this blog, relates to the question of predicting heart toxicity for new drug compounds, based on ion channel readings. One way to predict heart toxicity based on these test results is to employ teams of researchers to build an incredibly complicated mechanistic model of the heart, consisting of hundreds of differential equations, and use the ion channel readings as inputs. Or you can use a machine learning model. Or, most complicated of all, you can combine these in a multi-model approach. However, Hitesh Mistry found that a simple model, which simply adds or subtracts the ion channel readings (the only parameters are +1 and -1), performs just as well as the multi-model approach using three large-scale models plus a machine learning model (see Complexity v Simplicity, the winner is?).
Now, to obtain the simple model Mistry used some fairly sophisticated data analysis tools. But what counts is not the complexity of the methods, but the complexity of the final model. And in general, complexity-based models are often simpler than their reductionist counterparts.
I therefore strongly agree with Michael Gilliland that a “defensive” approach makes sense. But I think the paradigm shift he describes is part of, or related to, a move away from reductionist models, which we are realising don’t work very well for complex systems. With this new paradigm, models will be simpler, but they can also draw on a range of techniques that have been developed for the analysis of complex systems.
In a previous blog post we highlighted the pitfalls of applying null hypothesis testing to simulated data, see here. We showed that modellers applying null hypothesis testing to simulated data can control the p-value because they can control the sample size. Thus it’s not a great idea to analyse simulations using null hypothesis tests; instead, modellers should focus on the size of the effect. This problem has been highlighted before by White et al., which is well worth a read, see here.
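The dependence of the p-value on sample size is easy to demonstrate. This Python sketch holds a small effect fixed at 0.1 standard deviations and computes the two-sided p-value for increasing numbers of simulated subjects per group; for simplicity a z-test with known unit variance stands in for the t-test, which behaves the same way at these sample sizes.

```python
import math


def z_test_pvalue(effect_size, n_per_group):
    """Two-sided p-value for a difference in means of `effect_size` standard deviations
    between two equal-sized groups, with the variance assumed known (z-test)."""
    standard_error = math.sqrt(2.0 / n_per_group)
    z = effect_size / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same small effect becomes "significant" simply by simulating more virtual subjects.
for n in (10, 100, 1000, 10000):
    print(n, z_test_pvalue(0.1, n))
```

Nothing about the effect changes between rows; only the sample size, which the modeller chooses, moves the p-value from well above 0.05 to far below it.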
Why are we blogging about this subject again? Since that last post, co-authors of the original article we discussed there have repeated the same misdemeanour (Liberos et al., 2016), and a group of mathematical oncologists based at Moffitt Cancer Center has joined them (Kim et al., 2016).
The article by Kim et al., preprint available here, describes a combined experimental and modelling approach that “predicts” new dosing schedules for combination therapies that can delay onset of resistance and thus increase patient survival. They also show how their approach can be used to identify key stratification factors that can determine which patients are likely to do better than others. All of the results in the paper are based on applying statistical tests to simulated data.
The first part of the approach taken by Kim et al. involves calibrating a mathematical model to certain in-vitro experiments. These experiments basically measure the number of cells over a fixed observation time under 4 different conditions: control (no drug), AKT inhibitor, Chemotherapy and Combination (AKT/Chemotherapy). This was done for two different cell lines. The authors found a range of parameter values when fitting their model to the data. From this range they took forward a particular set, with no real justification for choosing that set, to test the model’s ability to predict different in-vitro dosing schedules. Unsurprisingly the model predictions came true.
After “validating” their model against a set of in-vitro experiments, the authors proceed to use the model to analyse retrospective clinical data from a study involving 24 patients. The authors acknowledge that the in-vitro system is clearly not the same as a human system. To account for this difference they use an optimisation method to generate a humanised model. The optimisation is based on a genetic algorithm which searched the parameter space to find parameter sets that replicate the clinical results observed. Again, similar to the in-vitro situation, they found that there were multiple parameter sets that were able to replicate the observed clinical results. In fact they found a total of 3391 parameter sets.
Having now generated a distribution of parameters that describe patients within the clinical study they are interested in, the authors next set about generating stratification factors. For each parameter set the virtual patient exhibits one of four possible response categories. Therefore for each category a distribution of parameter values exists for the entire population. To assess the difference in the distribution of parameter values across the categories, they perform a Student’s t-test to ascertain whether the differences are statistically significant. Since they can control the sample size, the authors can control the standard error and p-value; this is exactly the issue raised by White et al. An alternative approach would be to state the size of the effect, for example the difference in means of the distributions. If the claim is that a given parameter can discriminate between two types of responses, then a ROC AUC (Receiver Operating Characteristic Area Under Curve) value could be reported. Indeed a ROC AUC value would allow readers to ascertain the strength of a given parameter in discriminating between two response types.
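Both suggested alternatives are straightforward to compute. This Python sketch reports the difference in means and the ROC AUC, calculated via the Mann-Whitney formulation (the probability that a random value from one group exceeds one from the other, counting ties as half). The two response categories and their parameter values are hypothetical virtual-patient draws, not data from Kim et al.

```python
import numpy as np


def roc_auc(scores_a, scores_b):
    """ROC AUC for separating group A from group B with a single parameter:
    P(random A value > random B value), ties counted as one half.
    Equivalent to the Mann-Whitney U statistic divided by len(A) * len(B)."""
    a = np.asarray(scores_a, dtype=float)[:, None]
    b = np.asarray(scores_b, dtype=float)[None, :]
    return float(np.mean((a > b) + 0.5 * (a == b)))


def mean_difference(scores_a, scores_b):
    """Effect size on the parameter's own scale."""
    return float(np.mean(scores_a) - np.mean(scores_b))


# Hypothetical parameter values for virtual patients in two response categories
rng = np.random.default_rng(0)
responders = rng.normal(1.1, 1.0, size=1000)
non_responders = rng.normal(1.0, 1.0, size=1000)
print(mean_difference(responders, non_responders), roc_auc(responders, non_responders))
```

Unlike a p-value, the expected AUC does not move as more virtual patients are sampled: a parameter that barely separates the categories keeps an AUC close to 0.5 however large the simulation.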
The application of hypothesis testing to simulated data continues throughout the rest of the paper, culminating in applying a log-rank test to simulated survival data, where again they control the sample size. Furthermore, the authors choose an arbitrary cancer cell number which dictates when a patient dies. Therefore they have two ways of controlling the p-value. In this final act the authors again abuse the use of null hypothesis testing to show that the schedule found by their modelling approach is better than that used in the actual clinical study. Since the major results in the paper have all involved this type of manipulation, we believe they should be treated with extreme caution until better verified.
Liberos, A., Bueno-Orovio, A., Rodrigo, M., Ravens, U., Hernandez-Romero, I., Fernandez-Aviles, F., Guillem, M.S., Rodriguez, B., Climent, A.M., 2016. Balance between sodium and calcium currents underlying chronic atrial fibrillation termination: An in silico intersubject variability study. Heart Rhythm 0. doi:10.1016/j.hrthm.2016.08.028
White, J.W., Rassweiler, A., Samhouri, J.F., Stier, A.C., White, C., 2014. Ecologists should not use statistical significance tests to interpret simulation model results. Oikos 123, 385–388. doi:10.1111/j.1600-0706.2013.01073.x
Kim, E., Rebecca, V.W., Smalley, K.S.M., Anderson, A.R.A., 2016. Phase I trials in melanoma: A framework to translate preclinical findings to the clinic. Eur. J. Cancer 67, 213–222. doi:10.1016/j.ejca.2016.07.024