A machine learning framework for predicting population-wise rwTTD
Termination of a specific treatment can be treated as survival data,
where an observed termination of treatment is an event and a patient
without an observed termination is censored (Fig. 1a)1.
However, existing survival models predict only an individual patient’s
likelihood of survival. As shown below, aggregating the predictions for
individuals does not reproduce the profile of a population. Therefore,
we designed an approach that predicts the termination curve of a
population.
We started by producing the gold standard (expected future time) for
each individual in the training population. The expected future time is
defined as the expected time from the point at which we make the
prediction until the treatment is terminated. All clinical data observed
prior to this point are available for making predictions.
Two cases can be considered here. In the first case, if we know the
termination time of the treatment (an ‘event’ data point), the patient’s
future time is defined as the time between the end of the observation
window, from which we collect the feature data used to make predictions, and
the drug termination time. In the second case, if the termination time
of the treatment is unknown for a patient (a ‘censored’ data point), we
infer the expected future time from the survival curve derived from the
training population. In this case, we use the widely used Kaplan–Meier
curve to represent the termination ratio of the training set 13.
The expected future time of a censored individual is then composed of
two parts. The first part is the time already elapsed, i.e., from the end
of the observation time window to the last contact time point, because
we know without uncertainty that the patient continued drug treatment
until the last contact time point. The second part is the expected time
after the last contact time point, which is calculated as the integral
of the curve beyond the last contact time point divided by the
terminated ratio (the value of the curve) at the last contact time point
(Fig. 1a). Adding the first and second parts together gives the expected
future time for censored individuals. This approach generates the gold
standard for predicting the expected future time of each individual, on
which any kind of base learner can be trained. Later, we will explain
how a nested training scheme can extrapolate and aggregate the
predictions from individuals to infer the terminated ratio curve for a
population.
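The labelling rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kaplan_meier`, `step_value`, `integral_beyond`, `expected_future_time`) are hypothetical, the curve is taken to be the Kaplan–Meier estimate of the fraction still on treatment, and the integral is truncated at a chosen `horizon` (for example the largest observed time in the training set), which the text does not specify.

```python
import numpy as np

def kaplan_meier(t_last, status):
    """Kaplan-Meier estimate: event times and curve value just after each."""
    t_last = np.asarray(t_last, dtype=float)
    status = np.asarray(status, dtype=int)
    times = np.unique(t_last[status == 1])
    surv = np.empty(len(times))
    s = 1.0
    for i, t in enumerate(times):
        at_risk = np.sum(t_last >= t)                    # still followed at t
        events = np.sum((t_last == t) & (status == 1))   # terminations at t
        s *= 1.0 - events / at_risk
        surv[i] = s
    return times, surv

def step_value(t, times, surv):
    """Value of the right-continuous step curve at time t."""
    idx = np.searchsorted(times, t, side="right") - 1
    return 1.0 if idx < 0 else float(surv[idx])

def integral_beyond(t, times, surv, horizon):
    """Area under the step curve between t and horizon (assumed cutoff)."""
    if t >= horizon:
        return 0.0
    inner = times[(times > t) & (times < horizon)]
    knots = np.concatenate(([t], inner, [horizon]))
    heights = np.array([step_value(k, times, surv) for k in knots[:-1]])
    return float(np.sum(heights * np.diff(knots)))

def expected_future_time(t_obs_end, t_last, status, times, surv, horizon):
    """Gold-standard label for one individual."""
    elapsed = t_last - t_obs_end          # part 1: known time on treatment
    if status == 1:                       # event: termination time is known
        return elapsed
    s_last = step_value(t_last, times, surv)
    if s_last <= 0.0:                     # curve already at zero: nothing to add
        return elapsed
    # part 2: expected remaining time, conditional on reaching t_last
    return elapsed + integral_beyond(t_last, times, surv, horizon) / s_last

# Toy training set: three observed terminations and one censored patient.
t_last = [2.0, 3.0, 4.0, 6.0]
status = [1, 0, 1, 1]
times, surv = kaplan_meier(t_last, status)
label = expected_future_time(1.0, 3.0, 0, times, surv, horizon=6.0)
print(label)
```

In this toy example the censored patient was last seen at day 3; the two patients still on treatment after that day terminate at days 4 and 6, so the conditional expected remaining time is 2 days on top of the 2 days already elapsed since the end of the observation window.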
We simulated drug termination data of a population following a survival
study 14 (Fig. 1b). We generated a population of n individuals in total,
where the termination rate of each individual is drawn from a normal
distribution, \(p \sim N(p_{\text{mean}}, \sigma)\), and we force the
minimal termination rate to be zero. We hypothesize that the probability
that a patient terminates the treatment on a single day (p) is driven by
a series of m predictive features f. In reality, these features can be
demographic information, clinical measurements or any claims data, as
will be shown with the real-world drug treatment experiment below. In
this simulation experiment, we let the individual feature values
correlate with p by:
\(v_{kj} = p_{k} \times f_{j}\,(1 + \theta \times \epsilon_{j})\)
where \(v_{kj}\) is the value of feature j for patient k, \(p_{k}\) is
the termination rate of patient k, and \(f_{j}\) is the scaling factor
of feature j, uniformly drawn from \([0, \alpha]\). Each feature j is
parameterized by a noise factor \(\epsilon_{j}\), uniformly drawn from
\([0, \beta]\). As \(\beta\) increases, the larger sampling range
results in weaker correlation between the feature and the expected
future time. The value of the jth feature of the kth sample,
\(v_{kj}\), is further parameterized by \(\theta\), which is uniformly
sampled from \([-0.5, 0.5]\).
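The feature-generation step described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function name `simulate_features` and its argument names are hypothetical, negative rates are clipped to zero to realize "minimal termination rate of zero", and \(\theta\) is assumed to be drawn independently for every patient-feature pair.

```python
import numpy as np

def simulate_features(n, m, p_mean, sigma, alpha, beta, rng):
    """Draw termination rates p_k and feature values v_kj."""
    # p_k ~ N(p_mean, sigma), with negative rates forced to zero (assumption: clipping)
    p = np.clip(rng.normal(p_mean, sigma, size=n), 0.0, None)
    f = rng.uniform(0.0, alpha, size=m)           # scaling factor f_j per feature
    eps = rng.uniform(0.0, beta, size=m)          # noise factor eps_j per feature
    theta = rng.uniform(-0.5, 0.5, size=(n, m))   # assumed: one draw per (k, j)
    # v_kj = p_k * f_j * (1 + theta * eps_j)
    v = p[:, None] * f[None, :] * (1.0 + theta * eps[None, :])
    return p, f, v

rng = np.random.default_rng(0)
p, f, v = simulate_features(n=1000, m=5, p_mean=0.01, sigma=0.005,
                            alpha=1.0, beta=2.0, rng=rng)
print(v.shape)
```

With `beta=0.0` every noise factor is zero and each feature is an exact multiple of the termination rate; larger values of `beta` widen the multiplicative noise and weaken that relationship.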
We set the maximal allowed observation date of all individuals to
\(\delta_{\max}\). Between \([0, \delta_{\max}]\), we create a
binomially distributed vector of length \(\delta_{\max}\),
\(\delta_{k} \sim B(\delta_{\max}, p_{k})\), for each individual k.
Thus, the higher the \(p_{k}\), the more likely the individual is to
terminate, with the uncertainty defined by the binomial distribution.
In this binomially sampled sequence, the first appearance of 1 decides
the termination date \(t_{\text{term}}\). Next, for each individual, we
uniformly sample a value from \([0, \delta_{\max}]\) and define it as
the censoring date \(t_{\text{censor}}\). If
\(t_{\text{term}} > t_{\text{censor}}\), the last observation time is
\(t_{\text{last}} = t_{\text{censor}}\) and the status is 0 (a censored
point; no termination date is observed); otherwise,
\(t_{\text{last}} = t_{\text{term}}\) with status 1 (termination is
observed and the date is defined).
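A minimal sketch of this termination and censoring procedure is given below. It rests on several assumptions that the text leaves open: days are numbered from 1, an individual whose sampled sequence contains no 1 is treated as never terminating within the window (and is therefore always censored), and rates are capped at 1 so that they are valid daily probabilities. The function name `simulate_termination` is hypothetical.

```python
import numpy as np

def simulate_termination(p, delta_max, rng):
    """Return last observation time and status (1 = terminated, 0 = censored)."""
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0)   # assumed cap at 1
    n = len(p)
    # One 0/1 draw per day with success probability p_k
    draws = rng.random((n, delta_max)) < p[:, None]
    has_event = draws.any(axis=1)
    first_day = draws.argmax(axis=1) + 1                 # days numbered 1..delta_max
    # No 1 in the sequence: no termination inside the window (assumption)
    t_term = np.where(has_event, first_day, np.inf)
    t_censor = rng.uniform(0.0, delta_max, size=n)       # censoring date
    status = (t_term <= t_censor).astype(int)            # censored if t_term > t_censor
    t_last = np.where(status == 1, t_term, t_censor)
    return t_last, status

rng = np.random.default_rng(0)
t_last, status = simulate_termination(np.full(1000, 0.01), delta_max=365, rng=rng)
print(status.mean())
```

The returned pair (`t_last`, `status`) has the usual shape of right-censored survival data and can be fed directly to a Kaplan–Meier estimator.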