Title: | Reconstruct Individual-Level Data from Published KM Plots |
---|---|
Description: | Functions for reconstructing individual-level data (time, status, arm) from Kaplan-Meier curves published in academic journals (e.g. NEJM, JCO, JAMA). The individual-level data can be used for re-analysis, meta-analysis, methodology development, etc. This package was used to generate the data for commentary such as Sun, Rich, & Wei (2018) <doi:10.1056/NEJMc1808567>. Please see the vignette for a quickstart guide. |
Authors: | Ryan Sun [aut, cre] |
Maintainer: | Ryan Sun <[email protected]> |
License: | GPL-3 |
Version: | 0.3.0 |
Built: | 2024-12-08 04:32:39 UTC |
Source: | https://github.com/cran/reconstructKM |
When there are more clicks in the composite (overall) outcome curve than in a subdistribution curve, we need to add clicks to the subdistribution curve. Find the time points in the composite data that are furthest away from the times in clicksDF, and add these times to clicksDF with zero jump in cuminc.
add_clicks(clicksDF, targetTimes, nAdd)
clicksDF |
A data frame with the two columns, time and cuminc. |
targetTimes |
A vector of times from the composite KM plot. |
nAdd |
Number of times to add to clicksDF. |
An augmented clicksDF with extra rows (no cuminc jumps in those extra times).
clicksDF <- data.frame(time=0:10, cuminc=seq(from=0, to=1, by=0.1))
add_clicks(clicksDF, targetTimes = runif(n=14, min=0, max=10), nAdd=5)
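The selection rule described for add_clicks() can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name add_far_times is hypothetical, "furthest away" is taken to mean the largest distance to the nearest existing click, and each new row simply carries forward the cuminc of the last click at or before it, so the curve does not jump at the added times.

```r
# Hypothetical helper, NOT the package's add_clicks() implementation.
add_far_times <- function(clicksDF, targetTimes, nAdd) {
  clicksDF <- clicksDF[order(clicksDF$time), ]
  # Distance from each candidate time to its nearest existing click time
  nearest <- vapply(targetTimes,
                    function(t) min(abs(clicksDF$time - t)),
                    numeric(1))
  # Keep the nAdd candidates that are furthest from any existing click
  newTimes <- targetTimes[order(nearest, decreasing = TRUE)][seq_len(nAdd)]
  # Carry the curve forward so the added times introduce no jump (assumption)
  idx <- findInterval(newTimes, clicksDF$time)
  newCuminc <- ifelse(idx == 0, 0, clicksDF$cuminc[pmax(idx, 1)])
  out <- rbind(clicksDF, data.frame(time = newTimes, cuminc = newCuminc))
  out <- out[order(out$time, out$cuminc), ]
  rownames(out) <- NULL
  out
}

clicksDF <- data.frame(time = 0:10, cuminc = seq(from = 0, to = 1, by = 0.1))
augmented <- add_far_times(clicksDF,
                           targetTimes = c(0.5, 2.1, 4.5, 7.9, 9.5),
                           nAdd = 3)
```

Here the candidates 0.5, 4.5, and 9.5 are each 0.5 away from the nearest click, while 2.1 and 7.9 are only 0.1 away, so the first three are the ones added.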
In competing risks situations, papers may provide one overall KM plot for the composite outcome of event 1 or event 2, as well as cumulative incidence plots for each event separately. We can use these three plots to reconstruct individual-level data with event-specific labels (censored, event 1, or event 2). The function can also handle the case when the CIC for event 2 is not given. Run this separately for each arm.
CIC_reconstruct(overallIPD, clicks1, arm, clicks2 = NULL)
overallIPD |
The individual patient data from the overall (composite outcome) plot that has already been processed through reconstructKM. Should have three columns: time, status, and arm. |
clicks1 |
A data.frame with "time" and "cuminc" columns that are output from the digitizing software, similar to what you would input for reconstructKM except it's a cumulative incidence function for a specific event, not a survival function (make sure first click is (0,0)). |
arm |
The arm corresponding to clicks1 and possibly clicks2. |
clicks2 |
Same as clicks1 but for the second event, if it is provided. Default is NULL. |
An augmented version of overallIPD that additionally gives the cause of the event (cause 1 or cause 2) as a fourth "event" column.
data(pembro_clicks)
data(pembro_NAR)
augTabs <- format_raw_tabs(raw_NAR=pembro_NAR, raw_surv=pembro_clicks)
reconstruct <- KM_reconstruct(aug_NAR=augTabs$aug_NAR, aug_surv=augTabs$aug_surv)
IPD <- data.frame(arm=1, time=reconstruct$IPD_time, status=reconstruct$IPD_event)
clicks1 <- dplyr::mutate(pembro_clicks, cuminc=1-survival)
CIC_reconstruct(overallIPD = IPD, clicks1 = clicks1, arm=1, clicks2=NULL)
Augment a raw number at risk table with the necessary information to run the reconstruction algorithm.
format_raw_tabs(raw_NAR, raw_surv, tau = NULL)
raw_NAR |
A data frame with the columns 'time' and 'NAR' at least. |
raw_surv |
A data frame with the columns 'time' and 'survival' at least. |
tau |
End of follow-up time, defaults to last time in NAR table. |
A list with aug_NAR and aug_surv, properly cleaned tables that can be used as input in KM_reconstruct().
data(pembro_clicks)
data(pembro_NAR)
augTabs <- format_raw_tabs(raw_NAR=pembro_NAR, raw_surv=pembro_clicks)
Calculate the nonparametric restricted mean survival time (RMST) for a single arm up to tau, given a data.frame with time and status columns.
integrate_survdat(dat, tau, alpha = 0.05)
dat |
Data frame of time-to-event data which MUST have the columns 'time' and 'status' exactly |
tau |
The cutoff time, a scalar |
alpha |
Level for confidence interval |
A data.frame with rows for RMST and RMTL and columns for estimate, std err, pvalue, and CI.
time <- rnorm(100)
status <- rbinom(n=100, size=1, prob=0.5)
dat <- data.frame(time=time, status=status)
integrate_survdat(dat=dat, tau=2)
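As a rough illustration of what a nonparametric RMST is, the sketch below computes the area under a Kaplan-Meier step function from 0 to tau in base R. It is not the code behind integrate_survdat(): the helper name km_rmst is hypothetical, times are assumed to be nonnegative, the last estimated survival value is assumed to be carried forward to tau, and no standard error, p-value, or CI is computed.

```r
# Hypothetical helper, NOT the package's integrate_survdat() implementation.
km_rmst <- function(time, status, tau) {
  # Distinct observed event times up to tau
  eventTimes <- sort(unique(time[status == 1 & time <= tau]))
  # Kaplan-Meier survival just after each event time:
  # product of (1 - events / number at risk)
  surv <- cumprod(vapply(eventTimes, function(t) {
    1 - sum(time == t & status == 1) / sum(time >= t)
  }, numeric(1)))
  # Step function: height 1 before the first event, then surv[j] from the
  # j-th event time until the next one (or until tau)
  knots <- c(0, eventTimes, tau)
  heights <- c(1, surv)
  sum(diff(knots) * heights)
}

time <- c(1, 2, 3, 4)
status <- c(1, 0, 1, 1)
tau <- 3.5
rmst <- km_rmst(time, status, tau)
rmtl <- tau - rmst  # restricted mean time lost
```

In this toy data set the survival curve is 1 on [0, 1), 0.75 on [1, 3), and 0.375 on [3, 3.5), so the area is 1 + 1.5 + 0.1875 = 2.6875.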
Reconstruct individual-level data from augmented survival table and augmented NAR table, with augmentation performed by format_raw_tabs().
KM_reconstruct(aug_NAR, aug_surv)
aug_NAR |
A data frame processed through format_raw_tabs(). |
aug_surv |
A data frame processed through format_raw_tabs(). |
A list including IPD_time, IPD_event, n_hat, KM_hat, n_cen, n_event, and int_censor.
data(pembro_NAR)
data(pembro_clicks)
augTabs <- format_raw_tabs(raw_NAR=pembro_NAR, raw_surv=pembro_clicks)
KM_reconstruct(aug_NAR=augTabs$aug_NAR, aug_surv=augTabs$aug_surv)
Non-parametric RMST function that allows tau (the follow-up time) to be arbitrarily large. The Uno package restricts tau to be min(last observed event in either arm). Provides the estimate, SE, and CI for each arm, and the same quantities plus a p-value for the difference between arms.
nonparam_rmst(dat, tau, alpha = 0.05)
dat |
Data frame of time-to-event data which MUST have the columns 'time', 'arm', and 'status' |
tau |
How long of a follow-up to consider, i.e. we integrate the survival functions from 0 to tau |
alpha |
Confidence interval is given for (alpha/2, 1-alpha/2) percentiles |
A list including data.frame of results in each arm (RMST, RMTL, SE, pvalue, CI) as well as data.frame of results for Arm1 - Arm0 RMST.
time <- rnorm(100)
status <- rbinom(n=100, size=1, prob=0.5)
arm <- c( rep(1, 50), rep(0, 50))
dat <- data.frame(time=time, status=status, arm=arm)
nonparam_rmst(dat=dat, tau=1, alpha=0.05)
A dataset containing the clicks used to reconstruct the placebo OS KM curve.
data(pbo_clicks)
A data frame with 96 rows and 2 variables, time (event time in months) and survival (probability of OS)
Gandhi et al. NEJM 2018;378(22):2078-2092
A dataset containing the number at risk information for the placebo OS KM curve.
data(pbo_NAR)
A data frame with 8 rows and 2 variables, time (time in months) and NAR (number still at risk)
Gandhi et al. NEJM 2018;378(22):2078-2092
A dataset containing the clicks used to reconstruct the pembrolizumab OS KM curve.
data(pembro_clicks)
A data frame with 97 rows and 2 variables, time (event time in months) and survival (probability of OS)
Gandhi et al. NEJM 2018;378(22):2078-2092
A dataset containing the number at risk information for the pembrolizumab OS KM curve.
data(pembro_NAR)
A data frame with 8 rows and 2 variables, time (time in months) and NAR (number still at risk)
Gandhi et al. NEJM 2018;378(22):2078-2092
Just a wrapper to get quantities out of a call to coxph()
print_cox_outputs(cox_fit, print_output = TRUE)
cox_fit |
A model fitted with coxph() |
print_output |
Print summary to screen if TRUE |
A list including beta, HR, SE, and CI
time <- rnorm(100)
status <- rbinom(n=100, prob=0.5, size=1)
arm <- c(rep(1,50), rep(0,50))
temp_cox <- survival::coxph(survival::Surv(time, status) ~ arm)
print_cox_outputs(temp_cox)
When there are fewer clicks in the composite (overall) outcome curve than in a subdistribution curve, we need to remove clicks from the subdistribution curve. Find the time points in the subdistribution data that are furthest away from the composite curve times, and remove those times.
remove_clicks(clicksDF, targetTimes, nRemove)
clicksDF |
A data frame with the two columns time and cuminc. |
targetTimes |
A vector of times from the composite KM plot. |
nRemove |
Number of times to remove from clicksDF. |
A clicksDF with fewer rows.
clicksDF <- data.frame(time=0:10, cuminc=seq(from=0, to=1, by=0.1))
remove_clicks(clicksDF, targetTimes = runif(n=7, min=0, max=10), nRemove=3)
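The removal rule for remove_clicks() can be sketched the same way in base R. Again this is an illustration under assumptions rather than the package's code: the helper name drop_far_times is hypothetical, "furthest away" is taken to mean the largest distance to the nearest composite-curve time, and no click (not even the first one) is protected from removal.

```r
# Hypothetical helper, NOT the package's remove_clicks() implementation.
drop_far_times <- function(clicksDF, targetTimes, nRemove) {
  # Distance from each click time to its nearest composite-curve time
  nearest <- vapply(clicksDF$time,
                    function(t) min(abs(targetTimes - t)),
                    numeric(1))
  # Rows whose times are furthest from every composite-curve time
  dropIdx <- order(nearest, decreasing = TRUE)[seq_len(nRemove)]
  # Index by the rows to keep so that nRemove = 0 leaves the data untouched
  keepIdx <- setdiff(seq_len(nrow(clicksDF)), dropIdx)
  out <- clicksDF[keepIdx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

clicksDF <- data.frame(time = 0:10, cuminc = seq(from = 0, to = 1, by = 0.1))
reduced <- drop_far_times(clicksDF,
                          targetTimes = c(0, 1, 2, 3, 4, 5, 6.2),
                          nRemove = 3)
```

With these target times, the clicks at 8, 9, and 10 are 1.8, 2.8, and 3.8 away from the nearest composite time, further than any other click, so they are the three that get dropped.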
RMST for time-to-event data under parametric Weibull fit for data in each arm separately. Also can provide CI for RMST estimate and difference in RMST.
weibull_rmst(num_boots = 1000, dat, tau, alpha, find_pval = FALSE, seed = NULL)
num_boots |
Number of bootstrap iterations |
dat |
Data frame of time-to-event data which MUST have the columns 'time', 'arm', and 'status' |
tau |
How long of a follow-up to consider, i.e. we integrate the survival functions from 0 to tau |
alpha |
Confidence interval is given for (alpha/2, 1-alpha/2) percentiles |
find_pval |
Boolean, if TRUE then does bootstrap under the null to find p-value of mean difference and RMST difference |
seed |
For reproducibility |
A list including out_tab (estimate and CI in both arms), trt_rmst, pbo_rmst, diff_rmst, trt_CI, pbo_CI, diff_CI. Assumes trt coded as arm 1 and placebo coded as arm 0.
time <- rexp(100)
status <- rbinom(n=100, prob=0.5, size=1)
arm <- c( rep(1, 50), rep(0, 50))
dat <- data.frame(time=time, status=status, arm=arm)
weibull_rmst(dat=dat, tau=1, alpha=0.05, num_boots=200)
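For intuition, the RMST implied by a fitted Weibull model is the integral of the Weibull survival function from 0 to tau. The sketch below evaluates that integral numerically with stats::integrate(). It assumes R's usual shape/scale parameterization, S(t) = exp(-(t/scale)^shape), which may differ from the one used inside weibull_rmst(); the helper name weibull_rmst_sketch is hypothetical, and the bootstrap step for the CI is omitted.

```r
# Hypothetical helper, NOT the package's weibull_rmst() implementation.
# Assumes the shape/scale parameterization used by stats::pweibull().
weibull_rmst_sketch <- function(shape, scale, tau) {
  surv <- function(t) exp(-(t / scale)^shape)  # Weibull survival function
  stats::integrate(surv, lower = 0, upper = tau)$value
}

# With shape = 1 the Weibull reduces to an exponential distribution, whose
# RMST has the closed form scale * (1 - exp(-tau / scale))
rmst_numeric <- weibull_rmst_sketch(shape = 1, scale = 2, tau = 3)
rmst_closed <- 2 * (1 - exp(-3 / 2))
```

Comparing rmst_numeric with rmst_closed is a quick sanity check on the integration before plugging in estimated shape and scale values for each arm.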
Fit the shape and scale parameters for a Weibull distribution to the time-to-event data using MLE.
weimle1(time, status)
time |
A vector of event times |
status |
A vector of 0-1 censoring status, 0 for censored, 1 for observed |
A list including out (the return from mle()), shape, and scale
time <- rexp(100)
status <- rbinom(n=100, size=1, prob=0.5)
weimle1(time=time, status=status)
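A censored-data Weibull fit of this kind can be sketched with stats::optim() in place of mle(). This is a minimal illustration, not the code behind weimle1(): the helper name weibull_mle_sketch is hypothetical, the shape/scale parameterization is assumed to be the one used by stats::dweibull(), and the parameters are optimized on the log scale to keep them positive. Observed events contribute the log density and censored observations contribute the log survival probability.

```r
# Hypothetical helper, NOT the package's weimle1() implementation.
weibull_mle_sketch <- function(time, status) {
  negLogLik <- function(par) {
    shape <- exp(par[1])
    scale <- exp(par[2])
    # Events: log density. Censored: log survival probability.
    -sum(status * stats::dweibull(time, shape, scale, log = TRUE) +
         (1 - status) * stats::pweibull(time, shape, scale,
                                        lower.tail = FALSE, log.p = TRUE))
  }
  fit <- stats::optim(par = c(0, log(mean(time))), fn = negLogLik)
  list(shape = exp(fit$par[1]), scale = exp(fit$par[2]),
       convergence = fit$convergence)
}

# Simulated data with independent uniform censoring (illustrative values only)
set.seed(1)
trueTime <- rweibull(500, shape = 1.5, scale = 2)
censTime <- runif(500, min = 0, max = 6)
time <- pmin(trueTime, censTime)
status <- as.numeric(trueTime <= censTime)
est <- weibull_mle_sketch(time, status)
```

With 500 simulated observations, est$shape and est$scale should land near the generating values of 1.5 and 2.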