We consider two independent censored samples of sizes n1 and n2, one sample with survival function S1(t) and cumulative hazard Λ1(t), the other sample with survival function S2(t) and cumulative hazard Λ2(t). The censoring distribution need not be the same in both groups.
Consider the hypothesis H0:S1(t)=S2(t)∀t>0,H1:S1(t)≠S2(t) for some t>0,⟺H0:Λ1(t)=Λ2(t)∀t>0,H1:Λ1(t)≠Λ2(t) for some t>0.
The class of weighted logrank test statistics for testing H0 against H1 is WK=∫∞0K(s)d(ˆΛ1−ˆΛ2)(s), where K(s)=√n1n2n1+n2¯Y1(s)n1¯Y2(s)n2n1+n2¯Y(s)W(s) and W(s) is a weight function – it governs the power of the test against different alternatives.
W(s)=1 is the logrank test. It is most powerful against proportional hazards alternatives, i.e., λ1(t)=cλ2(t) for some positive constant c≠1.
W(s)=¯Y(s)/(n+1) is the Gehan-Wilcoxon test which in the uncensored case is equivalent to two-sample Wilcoxon test. This test puts more weight on early differences in hazard functions than on differences that occur later.
W(s)=ˆS(s−) is the Prentice-Wilcoxon test (also known as Peto-Prentice or Peto and Peto test). It is especially powerful against alternatives with strong early effects that weaken over time. Preferred compared to Gehan-Wilcoxon test.
W(s)=[ˆS(s−)]ρ[1−ˆS(s−)]γ is the G(ρ,γ) class of tests of Fleming-Harrington. The logrank test is a special case for ρ=γ=0, the Prentice-Wilcoxon test is a special case for ρ=1, γ=0. The G(0,1) test is especially powerful against alternatives with little initial difference in hazards that gets stronger over time.
Let ˆσ2K be the estimated variance of WK under H0, which is of the form ˆσ2K=∞∫0K2(s)¯Y1(s)¯Y2(s)(1−Δ¯N(s)−1¯Y(s)−1)d¯N(s). It can be shown that, under H0, WKˆσKD⟶N(0,1) and W2Kˆσ2KD⟶χ21.
Let t1<t2<⋯<td be the ordered distinct failure times in both samples. The weighted logrank test statistic (without the normalizing constant depending on n1 and n2) can be written as d∑j=1(Δ¯N1(tj)−Δ¯N(tj)¯Y1(tj)¯Y(tj))W(tj). The variance estimator ˆσ2K (again without the normalizing constant depending on n1 and n2) can be calculated using the following formula d∑j=1ΔN(tj)⋅¯Y1(tj)¯Y2(tj)(¯Y(tj))2⋅¯Y(tj)−ΔN(tj)¯Y(tj)−1⋅(W(tj))2.
Download the dataset km_all.RData.
The dataframe inside is called all
. It includes 101 observations and three variables. The observations are acute lymphatic leukemia [ALL] patients who had undergone bone marrow transplant. The variable time
contains time (in months) since transplantation to either death/relapse or end of follow up, whichever occured first. The outcome of interest is time to death or relapse of ALL (relapse-free survival). The variable delta
includes the event indicator (1 = death or relapse, 0 = censoring). The variable type
distinguishes two different types of transplant (1 = allogeneic, 2 = autologous).
Using ordinary R functions (not the survival
library), calculate and print the following table:
j | tj | d1j | dj | y1j | yj | ej=djy1jyj | d1j−ej | vj=djy1j(yj−y1j)(yj−dj)y2j(yj−1) |
---|---|---|---|---|---|---|---|---|
1 | … | … | … | … | … | … | … | … |
… | … | … | … | … | … | … | … | … |
d | … | … | … | … | … | … | … | … |
where j is the order of the failures, tj is the ordered failure time, d1j=Δ¯N1(tj), dj=Δ¯N(tj), y1j=¯Y1(tj), yj=¯Y(tj).
Use these values to perform logrank test on your own.
BONUS: Try to implement G(ρ,γ) test, remember, you need left-continuous version of Kaplan-Meier estimator of survival function.
The function survdiff
in the library survival
can calculate G(ρ,0) statistics. The default is ρ=0, that is, the logrank test. Output is a list with:
$chisq
- test statistic in the squared form W2K/ˆσ2K,$var[1,1]
- variance estimator ˆσ2K,pchisq(...$chisq, df=1, lower.tail = F)
.library(survival)
survdiff(Surv(time,delta)~type,data=all)
survdiff(Surv(time,delta)~type,data=all,rho=2)
## Call:
## survdiff(formula = Surv(time, delta) ~ type, data = all)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## type=1 50 22 24.2 0.195 0.382
## type=2 51 28 25.8 0.182 0.382
##
## Chisq= 0.4 on 1 degrees of freedom, p= 0.5
The function FHtestrcc
in the library FHtest
can calculate G(ρ,γ) statistics for non-zero γ. The default is ρ=0, γ=0, that is, the logrank test. The parameter γ is entered as the argument lambda
. For the proper choice of ρ and γ read Details of help(FHtestrcc)
. Output is a list with:
$statistic
- test statistic in the form WK/ˆσK,$var
- variance estimator ˆσ2K,$pvalue
- pvalue of the test associated with the selected alternative hypothesis (parameter alternative
),library(FHtest)
FHtestrcc(Surv(time,delta)~type, data=all)
FHtestrcc(Surv(time,delta)~type, data=all, rho=0.5, lambda=2)
##
## Two-sample test for right-censored data
##
## Parameters: rho=0.5, lambda=2
## Distribution: counting process approach
##
## Data: Surv(time, delta) by type
##
## N Observed Expected O-E (O-E)^2/E (O-E)^2/V
## type=1 50 0.961 1.69 -0.732 0.316 5.63
## type=2 51 2.369 1.64 0.732 0.327 5.63
##
## Statistic Z= 2.4, p-value= 0.0176
## Alternative hypothesis: survival functions not equal
Caution
The library FHtest
requires library Icens
which has been removed from R and cannot be installed by standard methods. It is available from the Bioconductor repository. It can be downloaded manually and installed by a command such as:
install.packages("http://www.karlin.mff.cuni.cz/~vavraj/cda/data/Icens_1.38.0.zip", repos =NULL)
survdiff
and FHtestrcc
output)
Compare your calculation of logrank test with the output of the functions survdiff
and FHtestrcc
in the case of dataset all
.
In the all
data, investigate the difference in survival and hazard functions according to the type of transplant. Remember that the outcome is relapse-free survival.
Calculate and plot estimated survival functions by the type of transplant (1 = allogeneic, 2 = autologous). Distinguish the groups by color. Add a legend.
Calculate and plot Nelson-Aalen estimators of cumulative hazard functions by the type of transplant (1 = allogeneic, 2 = autologous). Distinguish the groups by color. Add a legend.
Calculate and plot smoothed hazard functions by the type of transplant (1 = allogeneic, 2 = autologous). Distinguish the groups by color. Add a legend.
Perform logrank, Prentice-Wilcoxon and G(0,1) tests using functions survdiff
or FHtestrcc
. Interpret the results. Which test is more suitable in this case?
Hint for obtaining smoothed hazard functions:
library(muhaz)
haz = muhaz(all$time,all$delta)
plot(haz, col = "black", lwd = 2)