Train and test a ZeitZeiger predictor, accounting for batch effects

Train and test a predictor on multiple datasets independently, using sva::ComBat() to correct for batch effects prior to running zeitzeiger(). This function requires the metapredict package.

zeitzeigerBatch(
  ematList,
  trainStudyNames,
  sampleMetadata,
  studyColname,
  batchColname,
  timeColname,
  nKnots = 3,
  nTime = 10,
  useSpc = TRUE,
  sumabsv = 2,
  orth = TRUE,
  nSpc = 2,
  timeRange = seq(0, 1 - 0.01, 0.01),
  covariateName = NA,
  featuresExclude = NULL,
  dopar = TRUE
)

Arguments

ematList	Named list of matrices of measurements, one for each dataset, some of which will be for training, others for testing. Each matrix should have rownames corresponding to sample names and colnames corresponding to feature names.
trainStudyNames	Character vector of names in `ematList` corresponding to datasets for training.
sampleMetadata	data.frame containing relevant information for each sample across all datasets. Must have a column named `sample`.
studyColname	Name of column in `sampleMetadata` that contains information about which dataset each sample belongs to.
batchColname	Name of column in `sampleMetadata` that contains information about which dataset each sample belongs to. This should correspond to the names of `ematList`, and will often be the same as `studyColname`, but doesn't have to be.
timeColname	Name of column in `sampleMetadata` that contains the values of the periodic variable.
nKnots	Number of internal knots to use for the periodic smoothing spline.
nTime	Number of time-points by which to discretize the time-dependent behavior of each feature. Corresponds to the number of rows in the matrix for which the SPCs will be calculated.
useSpc	Logical indicating whether to use `PMA::SPC()` (default) or `base::svd()`.
sumabsv	L1-constraint on the SPCs, passed to `PMA::SPC()`.
orth	Logical indicating whether to require left singular vectors be orthogonal to each other, passed to `PMA::SPC()`.
nSpc	Vector of the number of SPCs to use for prediction (default 2). If `NA`, `nSpc` will become `1:K`, where `K` is the number of SPCs in `spcResult`. Each value in `nSpc` will correspond to one prediction for each test observation. A value of 2 means that the prediction will be based on the first 2 SPCs.
timeRange	Vector of values of the periodic variable at which to calculate likelihood. The time with the highest likelihood is used as the initial value for the MLE optimizer.
covariateName	Name of column(s) in `sampleMetadata` containing information about other covariates for `sva::ComBat()`, besides `batchColname`. If `NA` (default), then there are no other covariates.
featuresExclude	Named list of character vectors corresponding to features to exclude from being used for prediction for the respective test datasets.
dopar	Logical indicating whether to process the folds in parallel. Use `doParallel::registerDoParallel()` to register the parallel backend.
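
The shapes described above can be easier to see with a small example. The following is a minimal sketch in base R that builds simulated inputs and checks that they agree with each other. All study, sample, feature, and column names are hypothetical, and the scaling of the periodic variable to the interval [0, 1) is an assumption chosen to match the default `timeRange` grid. The call to `zeitzeigerBatch()` is shown only as a comment, since it needs the zeitzeiger and metapredict packages and has not been run on this toy data.

```r
# Simulated inputs only; every name below (studyA, feature_1, ...) is hypothetical.
set.seed(1)

studyNames <- c("studyA", "studyB", "studyC")
nSamples <- 8
featureNames <- paste0("feature_", 1:5)

# One matrix per dataset: rownames are sample names, colnames are feature names.
ematList <- lapply(studyNames, function(study) {
  matrix(rnorm(nSamples * length(featureNames)),
         nrow = nSamples,
         dimnames = list(paste0(study, "_s", seq_len(nSamples)), featureNames))
})
names(ematList) <- studyNames

# One row per sample across all datasets, with the required `sample` column.
sampleMetadata <- data.frame(
  sample = unlist(lapply(ematList, rownames), use.names = FALSE),
  study = rep(studyNames, each = nSamples),
  clockHours = rep(seq(0, 21, by = 3), times = length(studyNames)),
  stringsAsFactors = FALSE
)

# Assumption: one full period (24 h here) is mapped onto [0, 1).
sampleMetadata$time <- (sampleMetadata$clockHours %% 24) / 24

# Consistency checks between the list of matrices and the metadata.
stopifnot(
  setequal(names(ematList), unique(sampleMetadata$study)),
  all(unlist(lapply(ematList, rownames)) %in% sampleMetadata$sample),
  all(sampleMetadata$time >= 0 & sampleMetadata$time < 1)
)

# The default grid of candidate times at which likelihood is evaluated.
timeRange <- seq(0, 1 - 0.01, 0.01)

# Not run: how the arguments would map onto the function signature.
# result <- zeitzeiger::zeitzeigerBatch(
#   ematList, trainStudyNames = c("studyA", "studyB"), sampleMetadata,
#   studyColname = "study", batchColname = "study", timeColname = "time")
```

In this sketch the same column serves as both `studyColname` and `batchColname`, which the argument descriptions indicate is the common case.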

Value

spcResultList

List of output from zeitzeigerSpc(), one for each test dataset.

timeDepLike

3-D array of likelihood, with dimensions for each test observation (across all datasets), each element of nSpc, and each element of timeRange.

mleFit

List (for each element in nSpc) of lists (for each test observation) of mle2 objects.

timePred

Matrix of predicted times for test observations by values of nSpc.
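
As an illustration of how the returned shapes might be used, the sketch below works on mock objects that only mimic the documented dimensions of `timeDepLike` and `timePred`; none of the values come from a real fit. It picks the grid time with the highest likelihood for each observation and each value of `nSpc`, then summarizes prediction error with a signed circular difference. The helper `circDiff()` and the vector `timeTrue` are hypothetical names introduced here, and the period of 1 is an assumption matching the default `timeRange`.

```r
# Mock objects with the documented shapes; all values are simulated.
set.seed(2)
timeRange <- seq(0, 1 - 0.01, 0.01)
nSpc <- c(1, 2)
nObs <- 4

# timeDepLike: test observation x element of nSpc x element of timeRange.
timeDepLike <- array(runif(nObs * length(nSpc) * length(timeRange)),
                     dim = c(nObs, length(nSpc), length(timeRange)))

# Grid time with the highest likelihood, per observation and per nSpc value.
gridBest <- apply(timeDepLike, c(1, 2), function(lik) timeRange[which.max(lik)])

# Hypothetical helper: signed difference on a circle, wrapped into
# [-period / 2, period / 2), so 0.95 vs 0.05 is an error of -0.1, not 0.9.
circDiff <- function(pred, truth, period = 1) {
  ((pred - truth + period / 2) %% period) - period / 2
}

# timePred: test observations (rows) by values of nSpc (columns).
timePred <- matrix(c(0.95, 0.10, 0.50, 0.30,
                     0.05, 0.12, 0.45, 0.80), nrow = nObs)
timeTrue <- c(0.05, 0.10, 0.40, 0.90)

# The vector is recycled down each column, so every column of timePred
# is compared against the same true times.
predError <- circDiff(timePred, timeTrue)

# Median absolute circular error for each value of nSpc.
medianAbsError <- apply(abs(predError), 2, median)
```

Comparing `medianAbsError` across columns is one simple way to see how the choice of `nSpc` affects accuracy on the test observations.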
