Change over time is not "treatment response"
This will be a non-technical post illustrating the problems with identifying treatment responders or non-responders using inappropriate within-group analyses. Specifically, I will show why it is pointless to try to identify a subgroup of non-responders using a naïve analysis of data from one treatment group only, even though we have weekly measures over time.
This type of unwarranted and misleading causal language is surprisingly common. You see this type of language even when the data comes from an RCT, often in secondary analyses of mediators, predictors, or in even trendier studies where the authors have a strong urge to mix omics (e.g., genomics) and artificial intelligence/machine learning for vague reasons.
When we talk about “treatment response” we usually mean the causal estimand, i.e., the counterfactual comparison of the effect of receiving the treatment versus not-receiving the treatment. Deep down we all know that a treatment outcome is a relative contrast, and not simply a patient’s change from pre to posttest. We can (usually) only estimate the effect of the treatment on the group level, by randomizing patients to a treatment or a control group and estimating the average treatment effect by comparing the expected outcome in the two groups. The rate at which we tend to “forget” this basic knowledge seems directly related to our drive to publish in high impact journals.
I will use
powerlmm to simulate some longitudinal treatment data. In this setup, the control group has a ~40% reduction in symptoms at posttest, which in this case would give a Cohen’s d of 0.5. And the actual treatment effect is d = 0.5, i.e., a 7 point difference at posttest between the treatment and control group. These would not be surprising numbers in some psychiatric trials.
We want to add a subgroup of “non-responders”, and we do this by simply assigning subjects to group A or B based on their latent random slopes. This is easily done using the
data_transform argument. The non-responders have subject-specific slopes 1 SD above the mean. Obviously, this is a bit simplistic, but it gets the point across.
If we plot one realization of the latent slopes from this data generating process, we get this figure.
Let’s say we have access to data from a clinic that carefully monitor their patients’ change during treatment (e.g., IAPT). The sample size is large, and we want to identify if there is a subgroup of patients that are “non-responders”. There is no control group, but we have weekly measures during the treatment (there is also perfect adherence and no dropout…), we also have access to some cool baseline marker data that will impress reviewers and enable “personalized treatment selection for non-responding patients”. So what we do is run an analysis to identify if there is an interaction between our proposed subgroup variable and change over time. Let’s run such a simulation.
We see that we tend to identify a powerful subgroup effect, participants in the A subgroup tend to change less and even deteriorate during the treatment. Surely, this is substantial proof that there is a differential treatment response and that these patients did not respond to the treatment? I’m sure you know the answer, but to be sure let us also run the simulation were we have access to some similar data from another source that actually randomized their patients to receive either the treatment or a placebo control. What we want now is to identify an interaction between the subgroup variable and the treatment-by-time interaction. So we update the simulation formula.
Using the appropriate estimand of a subgroup effect we now get the “disappointing” result that there is absolutely no difference in treatment response for participants in subgroup A or B. The coefficient
subgroupB:time:treatment is zero on average. To make it even more clear what is going on let us also plot the trends.
We see that the “non-responders” actually improved just as much as the “responders” when compared to the control group. The effect from the naïve within-group analysis clearly confounds treatment response with just “change”. You might think that there is no reason why Subgroup A would start deteriorating just as they enter the study. It is easy to think of different explanations for this; one explanation might be that the marker we used to identify the subgroup is related to some personality trait that makes participants more likely to seek treatment when they feel “good enough to start treatment”. It is not uncommon to hear such reasoning. During the study period these patients will tend to regress toward their typical symptom level, but the deterioration is somewhat slowed down thanks to the treatment. Patients in subgroup B on the other hand, seek treatment because they have felt worse than usual for a period. So our marker for “treatment response” do not identify responders and non-responders it simply identifies patients on different trajectories, i.e., it is a prognostic marker.
The point of this post is not just semantics (treatment response versus change). The take-home message is important: by using the naïve analysis and the unwarranted causal language, we risk spreading the misinformation that patients belonging to subgroup A do no benefit from the treatment, when in fact they do. This risks creating an unnecessary stigma for a patient group that might already be seen as “difficult”, and inspire suboptimal “personalized” treatments that could be less likely to work.
Published November 19, 2018 (View on GitHub)