r - How can I fix overly wide confidence intervals when running a CoxPH analysis in R?

Tags: r cox

I am having trouble running a CoxPH analysis on the following example dataset:

Test_Dataset <- structure(list(Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc.
 = c("Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx",  "Targetted Tx", "Targetted/chemo combo", "Targetted Tx", "Targetted Tx",  "Targetted Tx"), Time.on.systemic.Tx =
 c("2.069815195", "2.332648871",  "2.069815195", "1.215605749",
 "2.661190965", "0.689938398", "1.839835729",  "2.858316222",
 "0.657084189", "2.529774127", "1.80698152", "3.482546201", 
 "2.891170431", "3.515400411", "2.431211499", "3.515400411",
 "1.347022587",  "5.519507187", "17.47843943", "26.90759754",
 "6.176591376", "5.979466119",  "8.246406571", "15.40862423",
 "5.749486653", "6.242299795", "5.683778234",  "6.636550308",
 "10.15195072", "10.0862423", "18.52977413", "5.749486653", 
 "10.7761807", "6.965092402"), PFS2 = c(2.595482546, 2.37, 2.069815195, 
1.412731006, 1.938398357, 0.657084189, 2.529774127, 3.219712526, 
 0.657084189, 2.529774127, 2.2, 3.482546201, 2.529774127, 3.712525667, 
 2.234086242, 3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532, 
 7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187, 
 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388, 
 18.20123203, 6.110882957, 10.61190965, 6.866529774), PFS2_event = c(1,  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1,  1, 1,
1, 1, 1, 0, 1, 1, 0, 1, 1, 1), Binarised_Time.on.Tx.2 = c("≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months")), row.names = c(NA, -34L), class = "data.frame")

This is the code I used for the analysis:

library(survival)
fit1 <- coxph(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
summary(fit1)

After running this code I get the following warning:

Warning message: In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, : Loglik converged before variable 1 ; coefficient may be infinite.

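More importantly, the results look wrong: the confidence interval runs from 0 to Inf, and both the coefficient and the p-value are very large. I ran an overall survival analysis on the same dataset and it worked without any problems. Do you have any suggestions as to what is causing this with my PFS2 values?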

Best answer

This is a variant of the complete separation problem, which you can start reading about (for example) here.

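These are not really incorrect estimates; they are an attempt to display an infinite estimate. In this situation the Wald estimate of the standard error breaks down (this is known as the Hauck-Donner effect).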
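The breakdown of the Wald statistic can be seen directly by comparing it with the likelihood-ratio statistic that summary() also reports. This is a minimal sketch, assuming the survival package is installed. The data frame d and its columns time, event, and short are my own compact re-entry of the PFS2, PFS2_event, and binarised columns from the question, not objects from the original post.

```r
library(survival)

# Compact re-entry of the question's data (names d, time, event, short are illustrative)
d <- data.frame(
  time = c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357, 0.657084189,
           2.529774127, 3.219712526, 0.657084189, 2.529774127, 2.2, 3.482546201,
           2.529774127, 3.712525667, 2.234086242, 3.778234086, 1.347022587, 5.55,
           17.3798768, 30.32443532, 7.12936345, 7.09650924, 8.246406571, 15.24435318,
           5.519507187, 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
           18.20123203, 6.110882957, 10.61190965, 6.866529774),
  event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
            0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1),
  short = rep(c(1, 0), each = 17)  # 1 = "≤ 3.52 months", 0 = "> 3.52 months"
)

# coxph() warns that the coefficient may be infinite; that is expected here
fit <- suppressWarnings(coxph(Surv(time, event) ~ short, data = d))
s <- summary(fit)

# Large coefficient with an enormous standard error
s$coefficients[, c("coef", "se(coef)")]

# Wald and likelihood-ratio p-values for the same model
wald_p <- unname(s$waldtest["pvalue"])
lrt_p <- unname(s$logtest["pvalue"])
c(wald = wald_p, lrt = lrt_p)
```

The Wald p-value is close to 1 because the standard error explodes, while the likelihood-ratio p-value still reflects the strong difference between the two groups.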

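Some possible solutions:

  • You can still use anova.coxph to compare the fit with that of the null model, and obtain a valid p-value that way
  • Consider not dichotomizing your predictor...
  • Fit a regularized model, for example using glmnet with a ridge penalty (alpha = 0) and a small penalty strength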
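The three options can be sketched as follows, assuming the survival package is installed. The data frame d and its column names are my own compact re-entry of the question's data. For the third option I have swapped in the ridge() term from survival rather than glmnet, because as far as I know glmnet expects a predictor matrix with at least two columns and this model has only one. The value theta = 1 is an arbitrary illustration, not a recommendation.

```r
library(survival)

# Compact re-entry of the question's data (names are illustrative)
d <- data.frame(
  time = c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357, 0.657084189,
           2.529774127, 3.219712526, 0.657084189, 2.529774127, 2.2, 3.482546201,
           2.529774127, 3.712525667, 2.234086242, 3.778234086, 1.347022587, 5.55,
           17.3798768, 30.32443532, 7.12936345, 7.09650924, 8.246406571, 15.24435318,
           5.519507187, 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
           18.20123203, 6.110882957, 10.61190965, 6.866529774),
  event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
            0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1),
  # Time.on.systemic.Tx from the question, converted from character to numeric
  tx_time = c(2.069815195, 2.332648871, 2.069815195, 1.215605749, 2.661190965,
              0.689938398, 1.839835729, 2.858316222, 0.657084189, 2.529774127,
              1.80698152, 3.482546201, 2.891170431, 3.515400411, 2.431211499,
              3.515400411, 1.347022587, 5.519507187, 17.47843943, 26.90759754,
              6.176591376, 5.979466119, 8.246406571, 15.40862423, 5.749486653,
              6.242299795, 5.683778234, 6.636550308, 10.15195072, 10.0862423,
              18.52977413, 5.749486653, 10.7761807, 6.965092402),
  short = rep(c(1, 0), each = 17)  # 1 = "≤ 3.52 months", 0 = "> 3.52 months"
)

# Option 1: likelihood-ratio comparison with the null model
fit_bin <- suppressWarnings(coxph(Surv(time, event) ~ short, data = d))
suppressWarnings(anova(fit_bin))

# Option 2: keep the predictor continuous instead of dichotomizing it
fit_cont <- coxph(Surv(time, event) ~ tx_time, data = d)
summary(fit_cont)$coefficients

# Option 3: ridge-penalized fit (survival::ridge used in place of glmnet)
fit_ridge <- coxph(Surv(time, event) ~ ridge(short, theta = 1), data = d)
coef(fit_ridge)
```

Which option is appropriate depends on the question being asked; the penalized coefficient in particular depends on the chosen penalty strength.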

The problem is easiest to see by plotting the data (using Kaplan-Meier estimates):

library(survival)
library(ggfortify)
fit2 <- survfit(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
autoplot(fit2)

[Figure: Kaplan-Meier curves for the two Binarised_Time.on.Tx.2 strata]

Every individual in the "≤ 3.52 months" stratum has died (failed) or been censored before the first individual in the other stratum dies...
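The same separation can be checked numerically without any plotting. This is a small base R sketch; the vectors time, event, and short are my own re-entry of the PFS2, PFS2_event, and binarised columns from the question.

```r
# PFS2 and PFS2_event from the question; the first 17 rows are the "≤ 3.52 months" group
time <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357, 0.657084189,
          2.529774127, 3.219712526, 0.657084189, 2.529774127, 2.2, 3.482546201,
          2.529774127, 3.712525667, 2.234086242, 3.778234086, 1.347022587, 5.55,
          17.3798768, 30.32443532, 7.12936345, 7.09650924, 8.246406571, 15.24435318,
          5.519507187, 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
short <- rep(c(TRUE, FALSE), each = 17)

# Last follow-up time (event or censoring) in the "≤ 3.52 months" group
last_short <- max(time[short])                      # 3.778234086

# First observed event in the "> 3.52 months" group
first_long_event <- min(time[!short & event == 1])  # 5.420944559

# TRUE means the two groups never overlap in the risk sets that matter
last_short < first_long_event
```

Because no one in the "> 3.52 months" group fails while anyone from the other group is still at risk, the partial likelihood keeps increasing as the coefficient grows, which is why the estimate runs off to infinity.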

We can also plot the fitted Cox model (using autoplot(survfit(fit1))), although it is less obvious what is going on...

[Figure: survival curve from the fitted Cox model]

Regarding "r - How can I fix overly wide confidence intervals when running a CoxPH analysis in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76352151/
