Does a 3-year structured exercise program save lives? Part 1
Can we challenge the CHALLENGE trial results?
I want to start by congratulating the authors and investigators for conducting the CHALLENGE trial, and the patients for participating in it. The trial was presented at the ASCO 2025 meeting, one of the biggest oncology conferences worldwide.
The CHALLENGE results were presented in early June 2025, and published in the New England Journal of Medicine the same day.
In CHALLENGE, patients who had completed adjuvant chemotherapy for colon cancer were randomized to either a 3-year structured exercise program or health education materials alone.
The findings are impressive and are making headlines: the structured exercise program showed an overall survival benefit!
Reasons for censoring in CHALLENGE?
The NEJM report is informative about the reasons for censoring, which are provided in Table 3, and again I thank the authors for that.
However, the censoring patterns cannot be properly explored. The curves have no tick marks or circles to indicate when patients were censored, and the number of censored patients is not given in brackets after the number of patients at risk (see the Overall Survival curves from the NEJM publication, 2025, below).
Why could censoring matter in CHALLENGE?
In the CHALLENGE trial, some patients may have dropped out of the experimental, more demanding arm because they were frailer and unable to sustain the programme. These same patients may also have been at higher risk of the event. If patients were censored in excess in this arm, this would constitute the type of informative censoring we usually observe because of toxicity.
In CHALLENGE, we know the number of events, and the number and reasons for censoring in each arm.
Exercise arm: 41 patients died, 22 patients were censored after withdrawing consent, and 8 were censored because they were lost to follow-up.
Control arm: 66 patients died, 16 patients withdrew consent, and 6 were lost to follow-up.
Overall, 8 more patients were censored because of withdrawal of consent or loss to follow-up in the experimental arm than in the control arm (30 vs 22). Is this relevant?
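The tally behind that comparison can be checked in a few lines. This is a minimal sketch, and the only inputs are the counts quoted above from Table 3. The dictionary and variable names are illustrative and are not taken from the trial report.

```python
# Censoring counts quoted in the text (Table 3 of the NEJM report).
censored = {
    "exercise": {"withdrew_consent": 22, "lost_to_follow_up": 8},
    "control": {"withdrew_consent": 16, "lost_to_follow_up": 6},
}
deaths = {"exercise": 41, "control": 66}

# Censoring other than administrative censoring, summed per arm.
totals = {arm: sum(reasons.values()) for arm, reasons in censored.items()}
excess = totals["exercise"] - totals["control"]

for arm in ("exercise", "control"):
    print(f"{arm}: {totals[arm]} censored (withdrawal/loss) vs {deaths[arm]} deaths")
print(f"excess censoring in the exercise arm: {excess}")
```

The last line prints an excess of 8 patients. This is the number used in the sensitivity analysis that follows.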
One assumption, and two BREAKING-ICE sensitivity analyses.
Let's assume that those 8 additional patients were the frailer ones. Unable to maintain the exercise programme, they left the trial and later died, and their deaths were not captured. We can model this sensitivity analysis by randomly choosing 8 censored patients in the exercise arm and changing their status from censored to event.
After digitizing the published curves, we can reconstruct synthetic individual patient data and explore these assumptions using the BREAKING-ICE App©, available here: https://www.timotheeolivier-research.com/breaking-ice
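The recoding step can be sketched on a list of patient records. This is a hypothetical illustration of the idea and not the BREAKING-ICE App's actual code. It assumes each record holds a follow-up time, an event indicator and an arm label. Each selected patient's death is placed at their censoring time, which is the simplification implied by simply switching the status.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Patient:
    time: float  # follow-up time (hypothetical units, e.g. years)
    event: int   # 1 = death observed, 0 = censored
    arm: str     # "exercise" or "control"


def censored_to_events(data, arm, k, seed=0):
    """Return a copy of `data` in which k randomly chosen censored patients
    from `arm` are recoded as deaths at their censoring time."""
    rng = random.Random(seed)  # seeded so that a given draw is reproducible
    candidates = [i for i, p in enumerate(data) if p.arm == arm and p.event == 0]
    if k > len(candidates):
        raise ValueError("not enough censored patients in this arm")
    chosen = set(rng.sample(candidates, k))
    # Only the event indicator changes; follow-up times are left untouched.
    return [replace(p, event=1) if i in chosen else p for i, p in enumerate(data)]
```

Because the choice is random, different seeds pick different patients. Repeating the draw many times shows how much the conclusion depends on which patients are recoded.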
Select the trial "CHALLENGE-OS-Sens1" for a first sensitivity analysis. In it, 8 censored patients in the experimental arm, drawn from the whole study period, have their status changed from censored to event (i.e. death); they represent 2.3% of the patients censored over that period. The survival gain is then no longer statistically significant: see the green curve below and the statistical output, where the upper bound of the HR confidence interval is now 1.04.
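The App reports a Cox model HR. Fitting a Cox model needs an external library, so the sketch below swaps in a related, standard-library-only method: a two-group log-rank test combined with Peto's one-step HR approximation, log HR ≈ (O − E) / V. This estimate is usually close to the Cox HR when effects are moderate, but the two will not match exactly. The function name and its inputs are hypothetical.

```python
import math


def logrank_peto(times, events, groups, treated="exercise"):
    """Two-group log-rank test with Peto's one-step hazard ratio.

    times: follow-up times; events: 1 = event, 0 = censored;
    groups: arm label for each patient.
    Returns (hr, lower_95, upper_95, two_sided_p).
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected events in the treated arm
    var = 0.0        # hypergeometric variance summed over event times
    for t in event_times:
        # Patients still at risk at t (censored at t count as at risk).
        at_risk = [(e if tt == t else 0, g) for tt, e, g in data if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, g in at_risk if g == treated)
        d = sum(e for e, _ in at_risk)
        d1 = sum(e for e, g in at_risk if g == treated)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("variance is zero; the arms cannot be compared")
    z = o_minus_e / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from N(0, 1)
    log_hr = o_minus_e / var              # Peto one-step estimate
    se = 1 / math.sqrt(var)
    return (math.exp(log_hr), math.exp(log_hr - 1.96 * se),
            math.exp(log_hr + 1.96 * se), p)
```

Run this on the reconstructed data before and after recoding the censored patients. Comparing the upper confidence bound in the two runs mirrors the comparison described above.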
The reverse Kaplan-Meier analysis further supports the possibility of informative censoring.
In this method, each event is recoded as a censoring and each censoring as an event. The resulting KM curves are displayed and a Cox analysis is performed. A statistical difference between arms (HR, CI, and p-value) means that the censoring pattern differed between arms, which suggests that informative censoring could have occurred.
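The swap is easy to express in code. The minimal sketch below computes a standard Kaplan-Meier estimate and then a reverse Kaplan-Meier, which is the same estimate with the event indicator flipped. The function names are hypothetical, and the between-arm Cox comparison that the App performs is not reproduced here. In practice the reverse curve would be computed separately for each arm and the arms then compared.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        # Group all observations tied at time t; censored ones stay at risk at t.
        j = i
        while j < len(pairs) and pairs[j][0] == t:
            j += 1
        d = sum(e for _, e in pairs[i:j])
        if d > 0:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= j - i
        i = j
    return curve


def reverse_kaplan_meier(times, events):
    """Reverse KM: censorings become 'events' and deaths become censorings."""
    return kaplan_meier(times, [1 - e for e in events])
```

The reverse curve estimates how long patients remain under observation rather than how long they survive. If these curves differ markedly between arms, the arms lost patients to follow-up at different rates.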
You can see the analysis for CHALLENGE, based on the same synthetic individual patient data, by selecting "CHALLENGE-OS" here: https://www.timotheeolivier-research.com/reverse-km
The HR is 1.31 (CI 1.13 to 1.52; the p-value is displayed as 0.00, i.e. below 0.005), which further suggests that informative censoring may have occurred.
Conclusion - reasons and time of censoring are needed.
Why could such a phenomenon matter so much in CHALLENGE? In the exercise arm, 30 patients were censored because of withdrawal or loss to follow-up, while 41 patients died. Because these two numbers are of similar magnitude, informative censoring, if it occurred, could plausibly have influenced the results, at least in part.
These sensitivity analyses do not allow us to conclude that the CHALLENGE results are invalid. However, reporting the reasons for and the timing of censoring, and having the authors perform similar sensitivity analyses themselves, would make the reported survival gain more robust.
I hope this first take on CHALLENGE shows how key censoring is when interpreting clinical trial results. Stay tuned for part 2, where I will explain why I would not (yet) implement the CHALLENGE strategy!