Fit a landmarking model using a linear mixed effects (LME) model for the longitudinal data

This function is a helper function for fit_LME_landmark.

fit_LME_longitudinal(
  data_long,
  x_L,
  predictors_LME,
  responses_LME,
  predictors_LME_time,
  responses_LME_time,
  standardise_time = FALSE,
  random_slope_longitudinal = TRUE,
  random_slope_survival = TRUE,
  include_data_after_x_L = TRUE,
  cv_name = NA,
  individual_id,
  lme_control = nlme::lmeControl()
)

Arguments

data_long	Data frame containing repeat measurement data and time-to-event data in long format.
x_L	Numeric specifying the landmark time(s).
predictors_LME	Vector of character strings specifying the column names in `data_long` which correspond to the predictor variables in the LME model.
responses_LME	Vector of character strings specifying the column names in `data_long` which correspond to the response variables in the LME model.
predictors_LME_time	Vector of character strings specifying the column names in `data_long` which contain the times at which the predictor variables were recorded. This should either be length 1 or the same length as `predictors_LME`. In the latter case the order of elements must correspond to the order of elements in `predictors_LME`.
responses_LME_time	Vector of character strings specifying the column names in `data_long` which contain the times at which response variables were recorded. This should either be length 1 or the same length as `responses_LME`. In the latter case the order of elements must correspond to the order of elements in `responses_LME`.
standardise_time	Boolean indicating whether to standardise the time variable in the LME model by subtracting the mean and dividing by the standard deviation. See Details section of `fit_LME_longitudinal` for more information.
random_slope_longitudinal	Boolean indicating whether to include a random slope in the LME model. See Details section of `fit_LME_longitudinal` for more information.
random_slope_survival	Boolean indicating whether to include the random slope estimate from the LME model as a covariate in the survival submodel. See Details section of `fit_LME_longitudinal` for more information.
include_data_after_x_L	Boolean indicating whether to include all longitudinal data, including data after the landmark age `x_L`, in the model development dataset. See Details section of `fit_LME_longitudinal` for more information.
cv_name	Character string specifying the column name in `data_long` that indicates the cross-validation fold.
individual_id	Character string specifying the column name in `data_long` which contains the individual identifiers.
lme_control	Object created using `nlme::lmeControl()`, which will be passed to the `control` argument of the `lme` function.

Value

List containing elements: data_longitudinal, model_longitudinal, model_LME, and model_LME_standardise_time.

data_longitudinal has one row for each individual in the risk set at x_L and contains the values of the covariates at the landmark time x_L: the predictors_LME are given by their last observation carried forward (LOCF) values, and the responses_LME by their predicted values from the LME model.
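As an illustration of the LOCF step, the following is a minimal base-R sketch that takes, for each individual, the most recent non-missing value recorded at or before x_L. The helper `locf_at_landmark` and the column names `id`, `time` and `x` are hypothetical and are not the package's internal implementation.

```r
# Hypothetical helper (not part of the package): LOCF value of `var` at the
# landmark time x_L for each individual. Only measurements recorded at or
# before x_L are used; individuals with no such measurement get NA.
locf_at_landmark <- function(df, x_L, id_col, time_col, var) {
  ids <- unique(df[[id_col]])
  values <- vapply(ids, function(i) {
    rows <- df[df[[id_col]] == i & df[[time_col]] <= x_L & !is.na(df[[var]]), ]
    if (nrow(rows) == 0) return(NA_real_)
    rows[[var]][which.max(rows[[time_col]])]  # value at the latest time <= x_L
  }, numeric(1))
  out <- data.frame(ids, values)
  names(out) <- c(id_col, var)
  out
}

df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(50, 58, 62, 55, 61),
  x    = c(1.2, 1.5, 1.9, 0.7, 0.9)
)
locf_at_landmark(df, x_L = 60, id_col = "id", time_col = "time", var = "x")
```

Here individual 1 gets the value recorded at time 58 (1.5) and individual 2 the value recorded at time 55 (0.7); the measurements at times 62 and 61 are after the landmark and are ignored.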

model_longitudinal indicates that the LME approach is used.

model_LME contains the output from the lme function from package nlme. For a model using cross-validation, model_LME contains a list of outputs, with each element in the list corresponding to a different cross-validation fold.

If the parameter standardise_time=TRUE is used, model_LME_standardise_time contains a list of two objects, mean_response_time and sd_response_time. These are the mean and standard deviation used to standardise times when fitting the LME model.

Details

For an individual $i$, the LME model can be written as

$$Y_i = X_i \beta + Z_i U_i + \epsilon_i$$

where

$Y_i$ is the vector of responses at different time points for the individual
$X_i$ is the matrix of predictors for the fixed effects at these time points
$\beta$ is the vector of coefficients for the fixed effects
$Z_i$ is the matrix of predictors for the random effects
$U_i$ is the vector of coefficients for the random effects
$\epsilon_i$ is the vector of error terms, typically assumed to be independent and distributed as $N(0, \sigma^2)$

By using an LME model to fit repeat measures data, rather than a linear model, we allow measurements from the same individual to be more similar than measurements from different individuals. This is done through the random intercept and/or random slope.
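To make the model above concrete, the sketch below simulates data from a random intercept model (so $Z_i$ is a column of ones) and fits it with `nlme::lme`. The simulated data, the parameter values and the column names `id`, `time` and `y` are arbitrary choices for illustration only.

```r
library(nlme)

set.seed(1)
n_id  <- 50                      # number of individuals
n_obs <- 5                       # measurements per individual
id    <- rep(seq_len(n_id), each = n_obs)
time  <- rep(0:(n_obs - 1), times = n_id)
u     <- rnorm(n_id, sd = 2)     # random intercepts U_i, one per individual
y     <- 10 + 0.5 * time + u[id] + rnorm(n_id * n_obs, sd = 0.5)
sim   <- data.frame(id = factor(id), time = time, y = y)

# Random intercept model: U_i shifts all of individual i's measurements by the
# same amount, which makes measurements within an individual correlated
fit <- lme(fixed = y ~ time, random = ~ 1 | id, data = sim)
fixef(fit)    # estimates of beta (intercept and time coefficient)
VarCorr(fit)  # estimated random intercept and residual variances / SDs
```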

Extending this model to the case where there are multiple responses, indexed by $k$, each with its own fixed and random effects, we have

$$Y_{ik} = X_{ik} \beta_k + Z_{ik} U_{ik} + \epsilon_{ik}$$

Typically the random effects are assumed to follow a multivariate normal (MVN) distribution $MVN(0,\Sigma_u)$, and a covariance structure is chosen for $\Sigma_u$. The function fit_LME_landmark uses this distribution with an unstructured covariance for the random effects when fitting the LME model (i.e. no constraints are imposed on the variances and covariances in $\Sigma_u$).
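One common way to fit a multi-response model of this form with `nlme::lme` is to stack the responses into a single column with an indicator for $k$. The sketch below assumes two simulated responses with hypothetical names `sbp` and `chol`; `pdSymm` gives an unstructured covariance for the random intercepts of the two responses, and `varIdent` gives each response its own residual variance. This illustrates the technique only and is not necessarily how fit_LME_landmark builds its model internally.

```r
library(nlme)

set.seed(2)
n_id <- 40
wide <- data.frame(
  id   = factor(rep(seq_len(n_id), each = 4)),
  time = rep(0:3, times = n_id)
)
u_sbp  <- rnorm(n_id, sd = 5)    # random intercepts for response "sbp"
u_chol <- rnorm(n_id, sd = 0.5)  # random intercepts for response "chol"
idx <- as.integer(wide$id)
wide$sbp  <- 120 + 1.0 * wide$time + u_sbp[idx]  + rnorm(nrow(wide), sd = 2)
wide$chol <- 5   + 0.1 * wide$time + u_chol[idx] + rnorm(nrow(wide), sd = 0.2)

# Stack: one row per (individual, time, response k)
long <- rbind(
  data.frame(wide[, c("id", "time")], response = "sbp",  value = wide$sbp),
  data.frame(wide[, c("id", "time")], response = "chol", value = wide$chol)
)
long$response <- factor(long$response)

fit_mv <- lme(
  fixed   = value ~ 0 + response + response:time,   # separate beta_k per response
  random  = list(id = pdSymm(~ 0 + response)),      # unstructured Sigma_u
  weights = varIdent(form = ~ 1 | response),        # residual variance per response
  data    = long,
  control = lmeControl(msMaxIter = 200)             # the kind of object lme_control holds
)
fixef(fit_mv)
getVarCov(fit_mv)  # estimated 2 x 2 covariance matrix of the random intercepts
```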

To fit the LME model the function lme from the package nlme is used. The random intercept is always included in the LME model. Additionally, the random slope can be included in the LME model using the parameter random_slope_longitudinal=TRUE.
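The role of random_slope_longitudinal can be sketched as a switch between two random-effects formulas, with the random intercept always present. The helper `random_formula` and the simulated data below are hypothetical illustrations, not the package's code.

```r
library(nlme)

# Hypothetical helper: random intercept always included; random slope on
# `time_var` added only when random_slope is TRUE
random_formula <- function(random_slope, time_var = "time", id_var = "id") {
  rhs <- if (random_slope) paste("1 +", time_var) else "1"
  as.formula(paste("~", rhs, "|", id_var))
}

set.seed(3)
n_id <- 60
sim <- data.frame(id   = factor(rep(seq_len(n_id), each = 5)),
                  time = rep(0:4, times = n_id))
b0  <- rnorm(n_id, sd = 2)    # true random intercepts
b1  <- rnorm(n_id, sd = 0.5)  # true random slopes
idx <- as.integer(sim$id)
sim$y <- 10 + 0.5 * sim$time + b0[idx] + b1[idx] * sim$time +
  rnorm(nrow(sim), sd = 0.5)

fit_int   <- lme(y ~ time, random = random_formula(FALSE), data = sim)
fit_slope <- lme(y ~ time, random = random_formula(TRUE),  data = sim)
ncol(ranef(fit_int))    # 1: random intercept only
ncol(ranef(fit_slope))  # 2: random intercept and random slope
```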

It is important to distinguish between the validation set and the development set for fitting the LME model in this function. The development dataset either includes all the repeat measurements (including those after the landmark age x_L), or only the repeat measurements recorded up to and including the landmark age x_L. This is controlled using the parameter include_data_after_x_L. The validation set only includes the repeat measurements recorded up until and including the landmark age x_L, i.e. it does not include future data in its predictions.
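The split between the two datasets can be sketched in base R as follows. The helper `split_landmark_data` and the columns `id`, `age` and `fold` are hypothetical; `fold_col` plays the role of cv_name, and `fold` is the fold held out for validation.

```r
# Hypothetical helper mirroring the role of include_data_after_x_L
split_landmark_data <- function(df, x_L, time_col, fold_col, fold,
                                include_data_after_x_L = TRUE) {
  in_fold  <- df[[fold_col]] == fold
  upto_x_L <- df[[time_col]] <= x_L
  # Development set: the other folds, optionally restricted to data up to x_L
  development <- df[!in_fold & (include_data_after_x_L | upto_x_L), ]
  # Validation set: the held-out fold, never using measurements after x_L
  validation  <- df[in_fold & upto_x_L, ]
  list(development = development, validation = validation)
}

df <- data.frame(id   = c(1, 1, 1, 2, 2, 3, 3),
                 age  = c(50, 58, 62, 55, 61, 52, 63),
                 fold = c(1, 1, 1, 1, 1, 2, 2))
s_all  <- split_landmark_data(df, 60, "age", "fold", 2, include_data_after_x_L = TRUE)
s_upto <- split_landmark_data(df, 60, "age", "fold", 2, include_data_after_x_L = FALSE)
nrow(s_all$development)   # 5: all measurements of individuals 1 and 2
nrow(s_upto$development)  # 3: only their measurements at ages <= 60
nrow(s_all$validation)    # 1: individual 3, measurement at age 52 only
```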

Using the fitted model, the best linear unbiased predictions (BLUPs) at the landmark age x_L are calculated. These BLUPs are the predictions of the values of the responses_LME at the landmark age x_L. The values of the predictors used in this prediction are the LOCF values of the predictors_LME at the landmark age x_L. In the function fit_LME_landmark, these predictions are used as covariates in the survival model along with the LOCF values of predictors_LME. Additionally, the estimated value of the random slope can be included as a predictor in the survival model using the parameter random_slope_survival=TRUE.
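A sketch of this prediction step with `nlme`, using simulated data: `predict(..., level = 1)` adds each individual's predicted random effects to the fixed-effects prediction at x_L, and `ranef()` gives the predicted random slopes. The column names (`id`, `age`, `time`, `x`, `y`) and the simulation are hypothetical, and this is not the package's internal code.

```r
library(nlme)

set.seed(4)
n_id <- 60
sim <- data.frame(id  = factor(rep(seq_len(n_id), each = 5)),
                  age = rep(c(50, 52, 54, 56, 58), times = n_id))
sim$time <- sim$age - 50                 # time since age 50
sim$x <- rnorm(nrow(sim))                # hypothetical time-varying predictor
b0  <- rnorm(n_id, sd = 2)               # true random intercepts
b1  <- rnorm(n_id, sd = 0.3)             # true random slopes
idx <- as.integer(sim$id)
sim$y <- 5 + 0.2 * sim$time + 0.8 * sim$x + b0[idx] + b1[idx] * sim$time +
  rnorm(nrow(sim), sd = 0.5)

fit <- lme(y ~ time + x, random = ~ 1 + time | id, data = sim)

x_L <- 60
# One row per individual at the landmark age: time is set to x_L and x takes its
# LOCF value (here the last recorded value, as all measurements are before x_L)
landmark <- do.call(rbind, lapply(split(sim, sim$id), function(d) {
  d <- d[order(d$age), ]
  data.frame(id = d$id[1], time = x_L - 50, x = d$x[nrow(d)])
}))

# level = 1: fixed effects plus each individual's predicted random effects
landmark$y_pred <- as.numeric(predict(fit, newdata = landmark, level = 1))
# Predicted random slope per individual (the kind of covariate that
# random_slope_survival=TRUE adds to the survival model)
landmark$slope_re <- ranef(fit)[as.character(landmark$id), "time"]
head(landmark)
```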

There is an important consideration when fitting the linear mixed effects model. As the values of responses_LME_time get further from 0, the random effects coefficients get closer to 0. This causes computational issues because the variances of the random effects, on the diagonal of the covariance matrix $\Sigma_u$, are constrained to be greater than 0. Using the parameter standardise_time=TRUE can prevent this issue: the time variable is standardised by subtracting its mean and dividing by its standard deviation, so the responses_LME_time values used in the fit are centred at 0 rather than lying far from it.
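A minimal sketch of the standardisation, assuming a hypothetical helper `standardise_time_var` that stores the mean and standard deviation in the same form as the mean_response_time and sd_response_time elements described in the Value section. The same values must be used to transform the landmark age before predicting at x_L.

```r
# Hypothetical helper: standardise a time variable and keep the parameters
standardise_time_var <- function(time) {
  params <- list(mean_response_time = mean(time), sd_response_time = sd(time))
  list(time   = (time - params$mean_response_time) / params$sd_response_time,
       params = params)
}

ages <- c(50, 55, 60, 65, 70)
st <- standardise_time_var(ages)
mean(st$time)  # 0 (up to floating point)
sd(st$time)    # 1
# Transform the landmark age with the stored mean and sd before prediction
x_L_std <- (62 - st$params$mean_response_time) / st$params$sd_response_time
```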