Registration is open for a one-day data analysis workshop which will be a mix of talks and discussions. The talks will go through analysis methods and examples that will show how you can get more from your own data as well as how to leverage external data (open genomic and clinical data). Participants can also bring an old poster and discuss data analysis options with speakers for some on-the-day consulting. To register and find out more about the day follow this link.

# Category Archives: Uncategorized

# PDXdata – the gene-drug patient derived xenograft (PDX) app

There is growing interest, within the pharmaceutical industry, in using a patients’ own cancer material to screen the effect of numerous treatments within an animal model. The reason for shifting from standard xenografts, based on historical immortalised cell-lines, is that those models are considered to be very different to the patients’ tumours of today. Thus PDX models are considered to be a more relevant model as they are “closer” to the patient population in which you are about to test your new treatment.

Typically only a handful of PDX models are used, but recently there has been a shift in focus to perform population PDX studies which mimic small scale clinical trials. One of these studies by Gao et al. also published, as an excel file in the supplementary information, the raw data which included not only the treatment effect growth curves, but also genomic data consisting of DNA copy number, gene expression and mutation. Using this data it is possible to explore correlations between treatment response and genomic features.

We at Systems Forecasting are always appreciative of freely available data sets, and have designed an equally free and available PDXdata app to browse through this data.

The app can be used to read excel files in the same form as the Novartis file “nm.3954-S2.xlsx”. It translates volume measurements to diameter, and computes a linear fit to each tumour growth time series. The user can then plot time series, organised by ID or by treatment, or examine statistics for the entire data set. The aim is to explore how well linear models can be used to fit this type of data.

The “Diameters” page shown in the figure below is used to plot time series for data. First read in the excel data file; this may take a while, so a progress bar is included. Data can be grouped either by ID(s) or by treatment(s). Note the Novartis data has several treatments for the same ID, and the data is filtered to include only those IDs with an untreated case. If there is only one treatment per ID, one can group by treatment and then plot the data for the IDs with that treatment. In this case the untreated fit is computed from the untreated IDs.

As shown in the next figure, the “Copy Number” and “RNA” tabs allow the user to plot the correlations between copy number or RNA and treatment efficacy, as measured by change in the slope of linear growth, for individual treatments (provided data is available for the selected treatment).

Finally, the “Statistics” page plots a histogram of data derived from the linear models. These include intercept, slope, and sigma for the linear fit to each time series; the difference in slope between the treated and untreated cases (delslope); the growth from initial to final time of the linear fit to the untreated case (lingr); and the difference delgr=diamgr-lingr, which measures diameter loss due to drug.

This app is very much a work-in-progress, and at the moment is primarily a way to browse, view and plot the data. We will add more functionality as it becomes available.

# Model abuse isn’t unique to transport forecasting …

By David Orrell

Yaron Hollander from the consultancy firm CT Think! published an interesting report on the use and abuse of models in transport forecasting. The report, which was summarised in Local Transport Today magazine, cited ten different problems, which apply not just to transport forecasting but to other areas of modelling as well:

*1. Referring to model outputs when discussing impacts that weren’t modelled*

*2. Presenting modellers’ assumptions as if they were forecasts*

*3. “Blurring the caveats” provided by modellers when copying model outputs from a technical report to a summary report*

*4. Using model outputs at a level of geographical detail that does not match the capabilities of the model or the data that were used to develop it*

*5. Reporting estimated outcomes and benefits with a high level of precision, without sufficient commentary on the level of accuracy*

*6. Presenting a large number of model runs or scenarios with limited interpretation of each run, as if this gives a good understanding of the impacts of the investment*

*7.Avoiding clear statements about how unsure we really are about the future pace of social and economic trends*

*8. Testing the sensitivity of the results to some inputs as if it helps us understand the sensitivity to all inputs*

*9. Discussing uncertainty in forecasts as if all it could do is change the scale of the impacts, ignoring possible impacts of a very different nature*

*10. Avoiding discussions about the history of the model itself, which sometimes goes many years back and includes features that the current owners do not understand*

I was invited along with several other people to give a response, which is included below. Although I didn’t mention computational biology as one of the areas affected, it certainly isn’t immune!

Here is the full response, which was published in LTT (paywall):

**Forget complexity, models should be simple**

The report by Yaron Hollander accurately identifies a number of different types of “model abuse” in transport forecasting. I would just add a couple of comments. One is that these problems are not unique to transport, but are common in many other areas of forecasting as well, as I found while researching my 2007 book *The Future of Everything: The Science of Prediction*. This is especially the case when the incentives of the forecasters are entwined with the outcome of the predictions.

An example from the early 1980s was a paper by Will Keepin and Brian Wynne which showed that a model used by nuclear scientists to predict future energy requirements vastly overestimated the need for nuclear power plants, as well as the number of nuclear scientists needed to design them. In finance, many of the models used to value complex derivatives are less about accuracy, than about justifying risky trades. This is why two leading quants, Paul Wilmott and Emanuel Derman, wrote their own Modelers’ Hippocratic Oath. Even apparently objective areas such as weather forecasting are not immune from model abuse. I would argue that techniques such as ensemble forecasting, which involves running many forecasts from perturbed initial conditions, are an example of Hollander’s point 8: “Testing the sensitivity of the results to some inputs as if it helps us understand the sensitivity to all inputs.”

The author notes that public consultation is a promising solution, however one of the attractive features of mathematical models, if defending them is the aim, is exactly the fact that they can only be understood by a relatively small number of experts (who often come from the same area). Mathematical equations can seem imposing to those outside the field, which grants a degree of immunity from external scrutiny. So the public needs access to experts who are willing to point out the flaws in models.

Mathematical modellers are always happy to build complex models of any system and attempt to make predictions. But we need more studies which attempt to answer a different forecasting question: based on past experience, and knowledge of a model’s strengths and weaknesses, are predictions based on the model likely to be accurate? The answer in many cases is “probably not” – which has implications for decision-makers. This does not of course mean that we should do away with modelling, only that we should concentrate on simple models, where the assumptions and parameters are well-understood, and be realistic about the uncertainty involved.

# When is a model a black box?

One of the issues which comes up frequently with mathematical modelling is the question of whether a model is a “black box”. A model based on machine learning, for example, is not something you can analyse just by peering under the hood. It is a black box even to its designers.

For this reason, many people feel more comfortable with mechanistic models which are based on causal descriptions of underlying processes. But these come with their problems too.

For example, a model of a growing tumour might incorporate a description of individual cells, their growth dynamics, their interactions with each other and the environment, their access to nutrients such as oxygen, response to drugs, and so on. A 3D model of a heart has to incorporate additional effects such as fluid dynamics, electrophysiology, and so on. In principal, all of these processes can be written out as mathematical equations, combined into a huge mathematical model, and solved. But that doesn’t make these models transparent.

One problem is that each component of the model – say an equation for the response of a cell to a particular stimulus – is usually based on approximations and is almost impossible to accurately test. In fact there is no reason to think that complex natural phenomena can be fit by simple equations at all – what works for something like gravity does not necessarily work in biology. So the fact that something has been written out as a plausible mechanistic process does not tell us much about its accuracy.

Another problem is that any such model will have a huge number of adjustable parameters. This makes the model very flexible: you can adjust the parameters to get the answer you want. Models are therefore very good at fitting past data, but they often do less well at predicting the future.

A complex mechanistic model is therefore a black box of another sort. Although we can look under its hood, and see all the working parts, that isn’t very useful, because these models are so huge – often with hundreds of equations and parameters – that it is impossible to spot errors or really understand how they work.

Of course, there is another kind of black box model, which is a model that is deliberately kept inside a black box – think for example of the trading algorithms used by hedge funds. Here the model may be quite simple, but it is kept secret for commercial reasons. The fact that it is a closely-guarded secret probably just means that it works.