I'm having trouble running a CoxPH analysis with the following example dataset:
Test_Dataset <- structure(list(
  Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc. = c(
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
    "Targetted Tx", "Targetted Tx", "Targetted/chemo combo", "Targetted Tx",
    "Targetted Tx", "Targetted Tx"),
  Time.on.systemic.Tx = c(
    "2.069815195", "2.332648871", "2.069815195", "1.215605749",
    "2.661190965", "0.689938398", "1.839835729", "2.858316222",
    "0.657084189", "2.529774127", "1.80698152", "3.482546201",
    "2.891170431", "3.515400411", "2.431211499", "3.515400411",
    "1.347022587", "5.519507187", "17.47843943", "26.90759754",
    "6.176591376", "5.979466119", "8.246406571", "15.40862423",
    "5.749486653", "6.242299795", "5.683778234", "6.636550308",
    "10.15195072", "10.0862423", "18.52977413", "5.749486653",
    "10.7761807", "6.965092402"),
  PFS2 = c(
    2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
    0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
    2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
    3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
    7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
    5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
    18.20123203, 6.110882957, 10.61190965, 6.866529774),
  PFS2_event = c(
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
    0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1),
  Binarised_Time.on.Tx.2 = c(
    "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
    "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
    "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
    "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
    "≤ 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
    "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
    "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
    "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
    "> 3.52 months", "> 3.52 months")),
  row.names = c(NA, -34L), class = "data.frame")
Here is the code I used for the analysis:
library(survival)
fit1 <- coxph(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
summary(fit1)
Running this code gives the following warning:
Warning message:
In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
  Loglik converged before variable 1 ; coefficient may be infinite.
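More importantly, the results look wrong. The confidence interval runs from 0 to Inf, and both the coefficient and the p-value are very high. I ran an overall survival analysis on the same dataset and it worked without any problems. Do you have any idea what is causing this with my PFS2 values?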
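Best answer
This is a variant of the complete separation problem, which you can start reading about (for example) here.
These are not really incorrect estimates. They are attempts to represent infinite estimates. In this situation the Wald estimate of the standard error breaks down (this is called the Hauck-Donner effect).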
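To see what "attempting to represent an infinite estimate" means, one option is to evaluate the partial log-likelihood at increasing values of the coefficient. The sketch below is a hand-rolled illustration under stated assumptions, not how coxph computes things internally. It re-enters the PFS2 columns from the question as plain vectors, and it codes a hypothetical 0/1 indicator short_tx as 1 for the "≤ 3.52 months" rows. It also handles ties with the Breslow approximation, whereas coxph defaults to Efron. Because of the separation, the log-likelihood should keep rising as beta grows, so the optimizer never reaches a finite maximum.

```r
# PFS2 times and event indicators re-entered from the question's dput()
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
# Hypothetical indicator: 1 = "<= 3.52 months" (first 17 rows), 0 = "> 3.52 months"
short_tx <- rep(c(1, 0), each = 17)

# Hypothetical helper: Breslow partial log-likelihood for a single covariate
partial_loglik <- function(beta, time, status, x) {
  ll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]  # risk set at this event time
    ll <- ll + beta * x[i] - log(sum(exp(beta * x[at_risk])))
  }
  ll
}

# Evaluate the log-likelihood on a grid of increasing coefficients
betas <- c(0, 1, 2, 5, 10, 20)
ll <- sapply(betas, partial_loglik, time = pfs2, status = event, x = short_tx)
print(data.frame(beta = betas, loglik = ll))
```

The increments shrink as beta grows, but they stay positive. That is the pattern behind "Loglik converged before variable 1": the fit stops improving by any detectable amount long before the coefficient stops growing.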
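Some possible solutions:
- You can still use anova.coxph to compare the fit with that of the null model, and get a valid p-value that way
- Consider not dichotomizing your predictor...
- Fit a regularized model, e.g. using the glmnet package with a ridge penalty (alpha = 0) and a small penalty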
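Below is a rough sketch of the first and third suggestions, again on re-entered vectors. The names pfs2, event, short_tx and dat are illustrative. Calling summary() on a coxph fit reports the Wald and likelihood-ratio tests side by side, which makes the Hauck-Donner effect visible. Calling anova() on the single fit gives the likelihood-ratio test against the null model. For the penalized fit, glmnet's documentation requires a predictor matrix with at least two columns. With a single binary predictor, this sketch therefore swaps in the ridge() term from the survival package instead of glmnet. The value theta = 1 is an arbitrary choice, not a recommendation.

```r
library(survival)

# PFS2 data re-entered from the question (illustrative names)
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
short_tx <- rep(c(1, 0), each = 17)  # 1 = "<= 3.52 months"
dat <- data.frame(pfs2, event, short_tx)

# Unpenalized fit: this should reproduce the "coefficient may be infinite" warning
fit <- coxph(Surv(pfs2, event) ~ short_tx, data = dat)

# Wald p-value (unreliable under separation) vs likelihood-ratio p-value
s <- summary(fit)
wald_p <- unname(s$waldtest["pvalue"])
lrt_p <- unname(s$logtest["pvalue"])
print(c(wald = wald_p, lrt = lrt_p))

# anova() on a single coxph fit tests it against the null model by LRT
print(anova(fit))

# Same LRT by hand: loglik[1] is at beta = 0, loglik[2] at the final estimate
lrt_stat <- 2 * (fit$loglik[2] - fit$loglik[1])
lrt_p_manual <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)

# Ridge-penalized alternative (swapped in for glmnet), arbitrary theta = 1
fit_ridge <- coxph(Surv(pfs2, event) ~ ridge(short_tx, theta = 1), data = dat)
print(coef(fit_ridge))
```

The ridge coefficient is shrunk toward zero by construction, so it is best read as a stabilized estimate rather than an unbiased one. A larger theta means stronger shrinkage.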
This is easiest to see by plotting the data (using Kaplan-Meier estimates):
library(survival)
library(ggfortify)
fit2 <- survfit(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
autoplot(fit2)
All of the individuals in the "≤ 3.52" stratum have died (failed) or been censored before the first individual in the other stratum dies...
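The ordering described above can also be checked numerically rather than by eye. The following minimal sketch uses re-entered vectors with illustrative names. It compares the last follow-up time in the "≤ 3.52 months" group with the first event time in the "> 3.52 months" group. With the data in the question, these come to about 3.78 and 5.42 months.

```r
# PFS2 data re-entered from the question (illustrative names)
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
short_tx <- rep(c(1, 0), each = 17)  # 1 = "<= 3.52 months"

# Longest follow-up (event or censoring) in the "<= 3.52 months" group
last_short <- max(pfs2[short_tx == 1])
# Earliest observed event in the "> 3.52 months" group
first_long_event <- min(pfs2[short_tx == 0 & event == 1])
cat("last time in <= 3.52 group:", last_short, "\n")
cat("first event in > 3.52 group:", first_long_event, "\n")

# TRUE means every "<= 3.52" patient has left the risk set before
# any "> 3.52" patient fails, which is what separates the groups
print(last_short < first_long_event)
```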
We can also plot the fitted Cox model (using autoplot(survfit(fit1))), although it is less obvious what is going on there...
Regarding "r - How to fix excessively wide confidence intervals when running a CoxPH analysis in R?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/76352151/