Category Archives: survival

“Personalising” radiotherapy dose using a short-term culture assay

The data-set we will be using for this post relates to the paper here. The article is from the group of Bruce Baguley, who has written some fantastic papers over the years on the cell-cycle times of cancer cells from patient samples. (The data can be found here, together with code relating to this post.)

In the data-set we have information on the culture cell-cycle times for each patient’s sample, whether they had radiotherapy or not, and the patient’s survival times. (Note, this study was done long before numerous people had written about sample size calculations for multivariable survival analyses.) In total we have 70 patients, all of whom had an event, and the median survival was approximately 8 months. Of the 70 patients, 37 had radiotherapy, 24 did not, and for 9 we have no information. Just before we launch into the survival analysis and get overly excited, it’s worth noting the following…

There are numerous prognostic factors which were not collected in this study; some are known now but weren’t known when this study was performed. This is an important point and should never be overlooked. Some of these known/unknown prognostic factors may well correlate with cell-cycle times, and some may not require a tissue sample at all, i.e. they could be really easy to measure. We shall come back to this point at the end of the post.

In the code provided you will see that we first build a survival model using radiotherapy as a covariate and find that there is a survival difference: those that had radiotherapy (black line in the figure below) lived longer than those that didn’t (red line in the figure below). (Let’s throw in a p-value to make certain readers happy: p<0.001.) So, we have a treatment effect.
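For readers who want to follow along, a minimal sketch of this first step in R is below, assuming a data frame dat with hypothetical columns time (months), event (1 = death) and radio (1 = radiotherapy); the variable names in the linked code may differ.

```r
library(survival)

# Treatment-only Cox model: hazard ratio and p-value for radiotherapy
fit_rt <- coxph(Surv(time, event) ~ radio, data = dat)
summary(fit_rt)

# Kaplan-Meier curves by treatment group (red = no radiotherapy, black = radiotherapy)
km <- survfit(Surv(time, event) ~ radio, data = dat)
plot(km, col = c("red", "black"),
     xlab = "Time (months)", ylab = "Survival probability")
```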

We next assess whether, in addition to the treatment effect, cell-cycle times also correlate with survival. So, we add that to the model and, lo and behold, it improves the correlation with survival over a model with just radiotherapy, based on the likelihood ratio test. What we really care about though is the interaction between treatment and cell-cycle times… hold your breath… there does seem to be an interaction (see code here) – everyone cheers with delight. (Note, you may want to use splines – the relationship between log(hazard) and a biomarker can be non-linear, and probably is; recall the Circulating Tumour Cell story here.)
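A sketch of how those nested comparisons might look, under the same hypothetical column names (cct for culture cell-cycle time); the spline variant uses ns() from the splines package as one way to relax the linearity assumption:

```r
library(survival)
library(splines)

fit_rt    <- coxph(Surv(time, event) ~ radio, data = dat)
fit_cct   <- coxph(Surv(time, event) ~ radio + cct, data = dat)
fit_inter <- coxph(Surv(time, event) ~ radio * cct, data = dat)

# Sequential likelihood ratio tests: does cct add to radio, and the interaction to both?
anova(fit_rt, fit_cct, fit_inter)

# One way to allow a non-linear relationship between log(hazard) and the biomarker
fit_ns <- coxph(Surv(time, event) ~ radio * ns(cct, df = 3), data = dat)
```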

A simple way of looking at an interaction is to plot survival probabilities, at a certain time-point, as a function of the biomarker, with and without radiotherapy; see below (red is no radiotherapy, black is with radiotherapy). (In the code you will also find a calibration plot showing how well the model describes the data at 6 months.)
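One way to produce such a plot, sketched under the same assumed names and re-using the interaction model from above:

```r
# Predicted 6-month survival across the cell-cycle range, with and without RT
cct_grid <- seq(min(dat$cct), max(dat$cct), length.out = 50)
newdat   <- rbind(data.frame(cct = cct_grid, radio = 0),
                  data.frame(cct = cct_grid, radio = 1))

sf <- summary(survfit(fit_inter, newdata = newdat), times = 6)

plot(cct_grid, sf$surv[1:50], type = "l", col = "red", ylim = c(0, 1),
     xlab = "Cell-cycle time", ylab = "6-month survival probability")
lines(cct_grid, sf$surv[51:100], col = "black")
# Dashed 95% confidence bands, as in the right-hand plot
lines(cct_grid, sf$lower[1:50],   col = "red",   lty = 2)
lines(cct_grid, sf$upper[1:50],   col = "red",   lty = 2)
lines(cct_grid, sf$lower[51:100], col = "black", lty = 2)
lines(cct_grid, sf$upper[51:100], col = "black", lty = 2)
```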

Two plots are displayed on purpose: the one on the left is the point estimate only and the one on the right includes the 95% confidence interval (dashed lines). What the plot shows is that the survival benefit of radiotherapy becomes less certain with increasing cell-cycle times. To some people this is what you would expect: the benefit of RT depends on cell-cycle times. (If you do a search for radiotherapy and cell-cycle you will start to understand the reasons why.) How does seeing the confidence interval affect your interpretation? What if I also mention that the cell-cycle times are themselves measured with low precision? These uncertainties may play an even bigger role when thinking about personalising dose…

Although information on the radiotherapy dose is not available, it is likely that all the patients received the same dose. (It’s a single-centre study.) Therefore, we could argue that the plot shown above is for a dose of 0 (red line) and another, unspecified dose (black line), so we have two “points” on a dose-response curve. That’s clearly not enough. That is, without more data we don’t really know what the benefit of radiotherapy over no therapy would be at, say, a lower or higher dose across the cell-cycle range.

If we could generate more data, what would be useful is a plot showing the gain in survival, at a fixed time-point of interest, for different doses and cell-cycle times, together with a corresponding plot for toxicities of interest. Then, as a patient, I could see what the cost is to me: what am I prepared to endure, and how uncertain are these estimates? I wonder, if we were to account for uncertainty, whether the predicted dose would become a predicted dose-range, and it may actually include the current standard dose for all patients, mightn’t it? Finally, …

There are numerous technical issues too: can we get good samples, and how do the characteristics of patients from whom we can’t compare to those from whom we can? How easy is the assay to run – could I do it in the Outer Hebrides? Maybe we also want to consider certain confounders – what would be a big one for cell-cycle time? Maybe tumour volume? Hmmm…

Circulating “Tumour” Cells and Magic Numbers

“Three, that’s the Magic Number

Yes, it is, it’s the Magic Number…”

The above two lines are the opening lyrics of De La Soul’s hit song from the 1980s, “The Magic Number”. The number three also happens to be an important number if you are counting Circulating Tumour Cells (CTCs) using the CellSearch kit in metastatic colorectal cancer (mCRC). (See the mCRC section, p18 onwards, in the marketing brochure here for more details, and also a publication here.) That number relates directly to a patient’s survival prognosis: if a patient has 3 or more CTCs prior to treatment, then their prognosis will be poorer than that of a patient with fewer than 3 CTCs.

You may be wondering, is the survival probability the same for someone who has 4 versus 100 CTCs? When the platform says 3 how accurate is it? Why did I put “Tumour” in quotation marks? In this blog-post we will briefly explore these questions.

Why the quotation marks around Tumour? Our first port of call will be the brochure (link) mentioned above. The first section of the brochure focuses on the Limitations, Expected Values and Performance Characteristics of the kit. If you take a look at Figure 1 on pVII you will see the distribution of CTC counts across numerous metastatic tumour types, benign disease and healthy volunteers. 10 of the 295 healthy volunteer samples contained a single CTC. This doesn’t mean the healthy volunteers have cancer; it simply highlights that the system may also pick up healthy epithelial cells. Therefore, it may be more appropriate to call these cells Circulating Epithelial Cells rather than Circulating Tumour Cells. Discussions I’ve had with scientists at conferences support this: it appears there is a mix of healthy and cancerous epithelial cells within a sample.

Next, how good is the system at counting the number of tumour cells when you know approximately how many were in the sample to begin with? The answer can be seen in Table 4 on pVII of the marketing brochure (link). What the table highlights is that, as you would expect, recovery is not 100% accurate: there is a modest difference between the expected number and the observed number in the samples.

In summary, there is a degree of noise in the enumeration process of CTCs. This noise may explain why, when scientists have searched for thresholds, the thresholds found have been quite low. It may be that you need to see 3, 4 or 5 cells to be sure you actually have one genuine CTC in your sample. So, it could be that the thresholds being used simply reflect whether or not there are any CTCs in the sample at all.
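As a back-of-the-envelope illustration (my numbers, not the manufacturer’s): if we assume background epithelial cells appear with Poisson noise at the rate hinted at by the healthy-volunteer data (10 positive samples out of 295), we can ask how often background alone would produce a given count:

```r
bg <- 10 / 295  # assumed background rate of non-tumour epithelial cells per sample

# Probability that background alone yields a count of at least k
k <- 1:5
setNames(signif(ppois(k - 1, lambda = bg, lower.tail = FALSE), 2), paste0("k>=", k))
# Under this crude assumption, counts of 2 or more are already very unlikely to be
# pure background, consistent with low thresholds mainly signalling the
# presence/absence of genuine CTCs rather than a biologically special number.
```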

Moving on to the final question: is a patient’s risk of death the same if they have a small number of CTCs versus a large number? In order to answer this, we need a data-set. A digitized data-set from a study in metastatic castrate-resistant prostate cancer (mCRPC) will be used; the original study can be read about here and the data-set can be found here, with some example code going through the analysis below. The cohort contains 156 patients and 94 deaths, with a median survival time of 21 months (95% confidence interval 16-24 months).

In mCRPC the Magic Number is 5: patients with fewer than 5 CTCs have a better prognosis than those with 5 or more. So, the question we are interested in is: is the prognosis of a patient with 5 CTCs different from that of someone with 100 CTCs?

Below is a figure of the distribution of CTC counts in this cohort of patients. You will notice that a large proportion of patients have counts of 0 or 1 (38/156 and 15/156 respectively), and that the range of values is generally quite wide.

The next plot we shall look at shows survival probability over time for groups of patients generated by splitting the distribution of CTC counts into 8 groups (at the 12.5th, 25th, 37.5th, 50th, 62.5th, 75th and 87.5th percentiles), see below.
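A sketch of how these groups and curves can be constructed in R, assuming hypothetical columns ctc, time and event; note that because so many patients have a count of 0, duplicate percentile breakpoints need to be removed:

```r
library(survival)

# Split CTC counts into (up to) 8 groups at the 12.5th, 25th, ..., 87.5th percentiles
breaks <- quantile(dat$ctc, probs = seq(0, 1, by = 0.125))
dat$ctc_grp <- cut(dat$ctc, breaks = unique(breaks), include.lowest = TRUE)

km <- survfit(Surv(time, event) ~ ctc_grp, data = dat)
plot(km, col = seq_along(levels(dat$ctc_grp)),
     xlab = "Time (months)", ylab = "Survival probability")
```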

The plot shows that as a patient’s CTC count increases their prognosis worsens. Imagine a patient with a CTC count of 5 (located in the dark-blue group) versus a patient with a CTC count of 100 (located in the grey group). The prognosis of these two groups is clearly different. Yet if we use the Magic Number 5, both patients will be told they have the same prognosis, which is clearly incorrect. Let’s explore this further and move away from categorising CTC counts…

An alternative way of visualising the data is to plot the log(Hazard Ratio) as a function of CTC counts according to the groups, see plot below.

We can see that the relationship is not quite linear, nor does there appear to be an obvious cut-point. In fact, the relationship looks rather sigmoidal, like a Hill function (or, to pharmacologists, an Emax model). Indeed, we can fit a Hill function, Hmax/(1+(CTC50/CTC)^h), to the data, as shown below. (Hmax, CTC50 and h are parameters that need to be estimated.)
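One way to sketch this fit (not necessarily the exact approach in the linked code): estimate a log hazard ratio per CTC group from a Cox model, pair each with a representative count, and fit the Hill function by non-linear least squares:

```r
library(survival)

fit_grp <- coxph(Surv(time, event) ~ ctc_grp, data = dat)
log_hr  <- c(0, coef(fit_grp))                    # first group is the reference
ctc_mid <- tapply(dat$ctc, dat$ctc_grp, median)   # representative count per group
ctc_mid <- pmax(ctc_mid, 0.5)                     # guard against a zero median

# Hill (Emax) function: log(HR) = Hmax / (1 + (CTC50 / CTC)^h)
hill <- nls(log_hr ~ Hmax / (1 + (CTC50 / ctc_mid)^h),
            start = list(Hmax = 2, CTC50 = 10, h = 1))
summary(hill)
```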

So how does this compare with using the Magic Number 5? In the code, see here, you will find a comparison of model likelihoods, along with many other discrimination indices, which highlights, unsurprisingly, that the sigmoid model describes the data better than the magic number 5.
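As a simplified stand-in for that comparison (a penalised spline playing the role of the sigmoid here), one can compare a dichotomised Cox model with a continuous one directly:

```r
library(survival)

fit_magic  <- coxph(Surv(time, event) ~ I(ctc >= 5), data = dat)
fit_smooth <- coxph(Surv(time, event) ~ pspline(log(ctc + 1)), data = dat)

AIC(fit_magic, fit_smooth)          # lower is better
concordance(fit_magic)$concordance  # discrimination of each model
concordance(fit_smooth)$concordance
```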

This brief analysis shows that using a Magic Number approach to analysing the correlation between CTC counts and prognosis is clearly not in the patient’s favour. Imagine if this were your data; let’s stop dichotomania!

Radiomics meet Action-Potential-Omics and Recidivism-Omics

In previous blog-posts we have discussed how simple models can perform just as well as, if not better than, more complex ones when attempting to predict the cardiac liability of a new drug; see here, and for the latest article on the matter, here. One of the examples we discussed involved taking a signal, the action potential, deriving 100s of features from it and placing them into a machine learning algorithm to predict drug toxicity. This approach gave very impressive performance. However, we found that we could get the same results by simply adding and subtracting 3 numbers!  It seems there are other examples of this nature…

A recent paper sent to me was on the topic of recidivism, see here. The paper explored how well a machine learning algorithm which uses >100 features performed compared to the general public at predicting re-offending risk.  What they found is that the general public was just as good.  They also found that the performance of the machine learning algorithm could be easily matched by a two variable model!

Let’s move back to the life sciences and take a look at an emerging field called radiomics. This field is in its infancy compared to the two already discussed above. In simple terms, radiomics involves extracting information from an image of a tumour. The obvious parameter to extract is the size of the tumour, measured as its volume, a parameter termed Gross Tumour Volume; see here for a more detailed description. In addition, as in the cardiac story, you can derive many more parameters from the imaging signal. And again, as in the cardiac story, you can apply machine learning techniques to the large data-set created to predict events of interest, such as patient survival.

The obvious question to ask is: what do you gain over the original parameter, Gross Tumour Volume? Well, it appears the answer is very little; see supplementary table 1 from this article here for a first example. Within the table the authors calculate the concordance index for each model. (A concordance index of 0.5 is random chance whereas a value of 1 implies perfect association; the closer to 1 the better.) The table includes p-values as well as the concordance indices; let’s ignore the p-values and focus on the size of the effect. What the table shows is that tumour volume is as good as the radiomics model in 2 of the 3 data-sets, Lung2 and H&N1, and in the 3rd, H&N2, TNM is as good as radiomics:

Data-set   TNM (Tumour Staging)   Volume   Radiomics   TNM + Radiomics   Volume + Radiomics
Lung2      0.60                   0.63     0.65        0.64              0.65
H&N1       0.69                   0.68     0.69        0.70              0.69
H&N2       0.66                   0.65     0.69        0.69              0.68

They then went on and combined the radiomics model with the other two, but did not compare the combination of TNM and tumour volume, a two-variable model, to all the other options. The question we should ask is: why didn’t they? Also, is there more evidence on this topic?

A more recent paper, see here, from within the field assessed the difference in prognostic capability between radiomics, genomics and the “clinical model”. This time tumour volume was not explored. Why not, especially given that it looked so promising in the earlier study? The “clinical model” in this case consisted of two variables, TNM and histology; given that we collect so much more than this, is that really a fair representation of a “clinical model”? The key result was that radiomics only improved on the “clinical model” once a genomic model was also included, see Figure 5 in the paper. Even then the size of the improvement was very small. I wonder what the performance of a simple model involving TNM and tumour volume would have looked like, don’t you?

Radiomics, meet Recidivism-Omics and Action-Potential-Omics: you have more in common with them than you realise, i.e. simplicity may beat complexity yet again!

Is this the golden age for open patient level oncology data?

Over the last few years there has been a growth in databases that house individual patient data from clinical trials in oncology. In this blog-post we will take a look at two of these databases, ProjectDataSphere and ClinicalStudyDataRequest, and discuss our own experiences of using them for research.

ProjectDataSphere houses the control arms of many phase III clinical trials. It has been used to run prediction competitions, which we have discussed in a previous blog-post, see here. Gaining access to this database is rather straightforward: a user simply fills in a form and within 24-48 hours access is granted. You can then download the data-sets together with a data dictionary to help you decipher the variable codes and start your research project. This all sounds too easy, so what’s the catch?

The main issue is understanding the coding of the variables; once you’ve deciphered what they mean it’s pretty straightforward to begin a project. It does help if you have experience working with such data-sets. An example of a project that can be conducted with such data can be found here. In brief, the paper explores both a biased and an unbiased correlation of tumour growth rate with survival, a topic we have blogged about before, see here.

If you want to access all arms of a clinical trial then ClinicalStudyDataRequest is for you. This is a very large database that spans many disease areas. However, access to the data is not as straightforward as with ProjectDataSphere. A user must submit an analysis plan stating the research objectives, methods, relevant data-sets etc. Once the plan has been approved, which in our experience can take 1-2 months, access is granted to the data-sets. This access, though, is far more restrictive than ProjectDataSphere’s: the user connects to a server where the data are stored and has to perform all analyses using the software provided, which is R with a limited number of libraries. Furthermore, there is a restriction on how long you can have access to the data. Therefore it really is a good idea to have every aspect of your analysis planned, and to ensure you have the time to complete it.

An example of a project we have undertaken using this database can be found here. In brief the paper describes how a model of tumour growth can be used to analyse the decay and growth rates of tumours under the action of three drugs that have a similar mechanism of action. A blog-post discussing the motivation behind the tumour growth model can be found here.

There are of course many databases other than the two discussed here with an oncology focus, e.g. SEER, TCGA, YODA etc. The growth in such databases clearly suggests that this may well be the golden age for patient level oncology data. Hopefully this growth in open data will also lead to a growth in knowledge.

Survival prediction (P2P loan profitability) competitions

In a previous blog entry, see here, we discussed how survival analysis methods could be used to determine the profitability of P2P loans. The “trick” highlighted in that post was to focus on the profit/loss of a loan – which is what you actually care about – rather than on when and whether a loan defaults. In doing so we showed that even loans that default can be profitable, if the interest rate is high enough and the loan period short enough.
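A toy illustration with made-up numbers makes the point: consider a 12-month loan of 1000 at 24% annual interest with equal monthly instalments, and assume nothing is recovered after default.

```r
principal <- 1000
rate_m    <- 0.24 / 12                                   # monthly interest rate
n         <- 12
payment   <- principal * rate_m / (1 - (1 + rate_m)^-n)  # standard annuity formula

# Lender's profit if the borrower defaults after k payments (zero recovery assumed)
profit <- payment * (1:n) - principal
round(profit)
# The loan turns profitable once enough instalments have been paid, even though it
# still ends in default; higher rates and shorter terms move that point earlier.
```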

Given that basic survival analysis methods shed light on betting strategies that could be profitable, are there more aggressive approaches in the healthcare community that the financial world could take advantage of? The answer to that question is yes, and it lies in using crowdsourcing, as we shall now discuss.

Over recent years there has been an increase in prediction competitions in the healthcare sector. One set of organisers have aptly named these competitions DREAM challenges; follow this link to their website. In contrast to other prediction competition websites, such as Kaggle (here), the winning algorithms are made publicly available through the website and also published.

A recurring theme of these competitions, one that simply moves from one disease area to the next, is survival. The most recent of these involved predicting the survival of prostate cancer patients who were given a certain therapy; the results were published here. Unfortunately the paper is behind a paywall, but the algorithm is downloadable from the DREAM challenge website.

The winning algorithm was basically an ensemble of Cox proportional hazards regression models; we briefly explained what these are in our previous blog entry. Those of you with a technical background will be thinking that this doesn’t sound like an overly complicated modelling approach. In fact it isn’t: what was sophisticated was how the winning entry partitioned the data for exploratory analyses and model building. The strategy appeared to be more important than the development of a new method. This observation resonates with the last blog entry on Big data versus big theory.

So what does all this have to do with the financial sector? Well, competitions like the one described above can quite easily be applied to financial problems, as we blogged about previously, where survival analyses are already being applied, for example, to P2P loan profitability. The healthcare prediction arena is in fact a great place to search for the latest approaches to financial betting strategies.

Time-dependent bias of tumour growth rate and time to tumour re-growth

The title of this blog entry refers to a letter published in the journal CPT: Pharmacometrics & Systems Pharmacology. The letter is open-access, so those of you who are interested can read it online here. In this blog entry we will go through it.

The letter discusses a rather strange modelling practice which is becoming the norm within certain modelling and simulation groups in the pharmaceutical industry. There has been a spate of publications claiming that tumour re-growth rate (GR) and time to tumour re-growth (TTG), derived using models that describe imaging time-series data, correlate with survival [1-6]. In those publications the authors show survival curves (Kaplan-Meiers) highlighting a very strong relationship between GR/TTG and survival. They either split on the median value of GR/TTG or into quartiles, and show very impressive differences in survival times between the resulting groups; see Figure 2 in [4] for an example (open access).

Do these relationships seem too good to be true? They may well be. In order to derive GR/TTG you need time-series data: the values of these covariates are not known at the beginning of the study and only become available after a certain amount of time has passed. A covariate of this type is therefore typically referred to as a time-dependent covariate. None of the authors in [1-6] describe GR/TTG as a time-dependent covariate, nor treat it as one.

When the correlations with survival were performed in those articles, the authors assumed that they knew GR/TTG before any time-series data were collected, which is clearly not true. Survival curves such as Figure 2 in [4] are therefore biased, as they are based on survival times calculated from study start to time of death, rather than from the time when GR/TTG becomes available to time of death. The results in [1-6] should be questioned, and GR/TTG should not be used for decision making, as the question of whether tumour growth rate correlates with survival remains rather open.
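To make the fix concrete, here is one standard remedy sketched in R: a landmark analysis, in which only patients still alive at the time GR/TTG first becomes estimable enter the model, and survival is measured from that landmark rather than from study start. The column names (os_time, death, ttg) and the 12-week landmark are hypothetical; a time-dependent covariate (counting-process) Cox model is the other common option.

```r
library(survival)

landmark <- 12  # hypothetical: weeks of imaging needed before GR/TTG is estimable

# Keep only patients still at risk at the landmark and restart the clock there
lm_dat       <- subset(dat, os_time > landmark)
lm_dat$os_lm <- lm_dat$os_time - landmark

fit_lm <- coxph(Surv(os_lm, death) ~ ttg, data = lm_dat)
summary(fit_lm)  # association between TTG and survival, free of guarantee-time bias
```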

Could it be the case that the GR/TTG correlation to survival is just an illusion of a flawed modelling practice?  This is what we shall answer in a future blog-post.

[1] W.D. Stein et al., Other Paradigms: Growth Rate Constants and Tumor Burden Determined Using Computed Tomography Data Correlate Strongly With the Overall Survival of Patients With Renal Cell Carcinoma, Cancer J. (2009)

[2] W.D. Stein et al., Tumor Regression and Growth Rates Determined in Five Intramural NCI Prostate Cancer Trials: The Growth Rate Constant as an Indicator of Therapeutic Efficacy, Clin. Cancer Res. (2011)

[3] W.D. Stein et al., Tumor Growth Rates Derived from Data for Patients in a Clinical Trial Correlate Strongly with Patient Survival: A Novel Strategy for Evaluation of Clinical Trial Data, The Oncologist (2008)

[4] K. Han et al., Simulations to Predict Clinical Trial Outcome of Bevacizumab Plus Chemotherapy vs. Chemotherapy Alone in Patients With First-Line Gastric Cancer and Elevated Plasma VEGF-A, CPT Pharmacomet. Syst. Pharmacol. (2016)

[5] J. van Hasselt et al., Disease Progression/Clinical Outcome Model for Castration-Resistant Prostate Cancer in Patients Treated With Eribulin, CPT Pharmacomet. Syst. Pharmacol. (2015)

[6] L. Claret et al., Evaluation of Tumor-Size Response Metrics to Predict Overall Survival in Western and Chinese Patients With First-Line Metastatic Colorectal Cancer, J. Clin. Oncol. (2013)