
Survival prediction (P2P loan profitability) competitions

In a previous blog entry, see here, we discussed how survival analysis methods could be used to determine the profitability of P2P loans.  The “trick” highlighted in that post was to focus on the profit/loss of a loan – which is what you actually care about – rather than on when and whether a loan defaults.  In doing so we showed that even loans that default can be profitable, provided the interest rate is high enough and the period of the loan is short enough.
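To make that point concrete, here is a toy calculation (the rate, term and default months below are made-up numbers for illustration, not figures from the original analysis): an amortizing loan that defaults late enough in its term can still return more cash than was lent, while the same loan defaulting early produces a loss.

```python
def loan_profit(principal, annual_rate, term_months, default_month):
    """Cash received minus principal lent, assuming full annuity payments
    up to (but not including) default_month and no recovery afterwards."""
    r = annual_rate / 12.0
    payment = principal * r / (1 - (1 + r) ** -term_months)  # standard annuity formula
    months_paid = min(default_month - 1, term_months)
    return payment * months_paid - principal

# A 3-year loan at 20% APR that defaults in month 31 is still profitable...
print(loan_profit(1000, 0.20, 36, 31))   # roughly +115
# ...while the same loan defaulting in month 13 is a loss.
print(loan_profit(1000, 0.20, 36, 13))   # roughly -554
```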

Given that basic survival analysis methods shed light on betting strategies that could be profitable, are there more aggressive approaches in the healthcare community that the financial world could take advantage of? The answer is yes, and it lies in crowdsourcing, as we shall now discuss.

Over recent years there has been an increase in prediction competitions in the healthcare sector.  One set of organisers has aptly named these competitions DREAM challenges; follow this link to their website. Unlike other prediction competition websites such as Kaggle (here), the winning algorithms are made publicly available through the website and are also published.

A recurring theme of these competitions, one that simply moves from one disease area to the next, is survival. The most recent of these involved predicting the survival of prostate cancer patients given a certain therapy; the results were published here.  Unfortunately the paper is behind a paywall, but the algorithm can be downloaded from the DREAM challenge website.

The winning algorithm was basically an ensemble of Cox proportional hazards regression models; we briefly explained what these are in our previous blog entry.  Those of you reading this blog who have a technical background will be thinking that this doesn’t sound like an overly complicated modelling approach.  In fact it isn’t – what was sophisticated was how the winning entry partitioned the data for exploratory analyses and model building.  The strategy appeared to matter more than the development of a new method.  This observation resonates with the last blog entry on Big data versus big theory.
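As a rough illustration of what an ensemble of Cox models can look like in code, here is a minimal sketch using the Python lifelines library. The bootstrap-and-average scheme, the column names and the regularisation setting are assumptions made for the sake of the example – the actual winning entry, with its careful data partitioning, is the one available from the DREAM challenge website.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def ensemble_cox_risk(train, test, duration_col, event_col,
                      n_models=50, seed=0):
    """Fit Cox models on bootstrap resamples of the training data and
    return the average predicted risk (partial hazard) for the test set."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(n_models):
        # Resample the training data with replacement
        boot = train.sample(frac=1.0, replace=True,
                            random_state=int(rng.integers(1_000_000)))
        cph = CoxPHFitter(penalizer=0.1)  # mild regularisation (an assumption)
        cph.fit(boot, duration_col=duration_col, event_col=event_col)
        risks.append(cph.predict_partial_hazard(test))
    # Average the risk scores across the ensemble
    return pd.concat(risks, axis=1).mean(axis=1)
```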

So what does all this have to do with the financial sector? Well, competitions like the one described above could quite easily be run on financial problems where survival analysis is already being applied – P2P loan profitability, which we blogged about previously, being one example. So the healthcare prediction arena is in fact a great place to search for the latest approaches to financial betting strategies.

Big data versus big theory

The Winter 2017 edition of Foresight magazine includes my commentary on the article Changing the Paradigm for Business Forecasting by Michael Gilliland from SAS. Both are behind a paywall (though a longer version of Michael’s argument can be read on his SAS blog), but here is a brief summary.

According to Gilliland, business forecasting is currently dominated by an “offensive” paradigm, which is “characterized by a focus on models, methods, and organizational processes that seek to extract every last fraction of accuracy from our forecasts. More is thought to be better—more data, bigger computers, more complex models—and more elaborate collaborative processes.”

He argues that our “love affair with complexity” can lead to extra effort and cost while actually reducing forecast accuracy. And while managers have often been seduced by the idea that “big data was going to solve all our forecasting problems”, research shows that complex models often fail to beat even a no-change forecasting model. His article therefore advocates a paradigm shift towards “defensive” forecasting, which focuses on simplifying the forecasting process, eliminating bad practices, and adding value.
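As a toy illustration of that benchmark (the numbers below are made up, not taken from Gilliland’s article or the research he cites): before trusting an elaborate forecasting model, it is worth checking whether it actually beats a no-change forecast on held-out data.

```python
import numpy as np

def mae(actual, forecast):
    """Mean absolute error."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(forecast)))

def no_change_forecast(history, horizon):
    """Naive benchmark: the future looks like the last observed value."""
    return np.repeat(history[-1], horizon)

history = np.array([102, 98, 105, 110, 107, 111])   # made-up monthly series
actual_next = np.array([109, 112, 108])             # what actually happened
complex_forecast = np.array([118, 121, 125])        # hypothetical complex model

print(mae(actual_next, no_change_forecast(history, 3)))  # naive error: 2.0
print(mae(actual_next, complex_forecast))                # model error: ~11.7
```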

My comment on this (in about 1200 words) is … I agree. But I would argue that the problem is less big data, or even complexity, than big theory.

Our current modelling paradigm is fundamentally reductionist – the idea is to reduce a system to its parts, figure out the laws that govern their interactions, build a giant simulation of the whole thing, and solve. The resulting models are highly complex, and their flexibility makes them good at fitting past data, but they tend to be unstable (or stable in the wrong way) and are poor at making predictions.

If, however, we recognise that complex systems have emergent properties that resist a reductionist approach, it makes more sense to build models that attempt to capture only some aspect of the system behaviour, instead of reproducing the whole thing.

An example of this approach, discussed earlier on this blog, relates to the question of predicting heart toxicity for new drug compounds based on ion channel readings. One way to predict heart toxicity from these test results is to employ teams of researchers to build an incredibly complicated mechanistic model of the heart, consisting of hundreds of differential equations, and use the ion channel readings as inputs. Or you can use a machine learning model. Or, most complicated of all, you can combine these in a multi-model approach. However, Hitesh Mistry found that a simple model, which just adds or subtracts the ion channel readings – the only parameters are +1 and -1 – performs just as well as the multi-model approach using three large-scale models plus a machine learning model (see Complexity v Simplicity, the winner is?).
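To give a sense of how little machinery such a model involves, here is a sketch of a score of that form. The channel names, the choice of signs and the example readings are illustrative assumptions; the actual analysis is described in the post linked above.

```python
def simple_ion_channel_score(herg_block, cav_block, nav_block):
    """Combine three ion channel block readings with coefficients fixed at
    +1 or -1: hERG block raises the predicted risk, while calcium and
    sodium channel block offset it (signs chosen here for illustration)."""
    return (+1) * herg_block + (-1) * cav_block + (-1) * nav_block

# e.g. fractional block at some test concentration (made-up numbers)
print(simple_ion_channel_score(herg_block=0.60, cav_block=0.10, nav_block=0.05))  # 0.45
```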

Now, to obtain the simple model Mistry used some fairly sophisticated data analysis tools. But what counts is not the complexity of the methods, but the complexity of the final model. And in general, complexity-based models are often simpler than their reductionist counterparts.

I therefore strongly agree with Michael Gilliland that a “defensive” approach makes sense. But I think the paradigm shift he describes is part of, or related to, a move away from reductionist models, which we are realising don’t work very well for complex systems. With this new paradigm, models will be simpler, but they can also draw on a range of techniques that have been developed for the analysis of complex systems.