r - How do I extend a logistic regression plot?

Tags: r ggplot2 dplyr logistic-regression r-caret

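I fitted a logistic regression model in R and plotted it (image: "Here is the plot I created"). The problem is that my largest x value is 0.85, so the plotted curve stops at that value.

Is there a way to extend the curve to x = 100, with the y values computed from my logistic model?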

library(caret)
library(mlbench)
library(ggplot2)
library(tidyr)
library(caTools)

my_data2 <- read.csv('C:/Users/Magician/Desktop/R files/Fnaticfirstround.csv', header=TRUE, stringsAsFactors = FALSE)

my_data2
#converting Map names to the calculated win probability
my_data2[my_data2$Map == "Dust2", "Map"] <- 0.307692
my_data2[my_data2$Map == "Inferno", "Map"] <- 0.47619
my_data2[my_data2$Map == "Mirage", "Map"] <- 0.708333
my_data2[my_data2$Map == "Nuke", "Map"] <- 0.444444
my_data2[my_data2$Map == "Overpass", "Map"] <- 0.333333
my_data2[my_data2$Map == "Train", "Map"] <- 0.692308
my_data2[my_data2$Map == "Vertigo", "Map"] <- 0
my_data2[my_data2$Map == "Cache", "Map"] <- 0.857143
#converting W and L to 1 and 0
my_data2$WinorLoss <- ifelse(my_data2$WinorLoss == "W", 1,0)
my_data2$WinorLoss <- factor(my_data2$WinorLoss, levels = c(0,1))

#converting Map from character to numeric
my_data2$Map <- as.numeric(my_data2$Map)

#Logistic regression model
glm.fit <- glm(WinorLoss ~ Map, family=binomial, data=my_data2)

summary(glm.fit)
#make predictions on the training data
glm.probs <- predict(glm.fit, type="response")

glm.pred <- ifelse(glm.probs>0.5, 1, 0)

#compare predictions with the observed outcomes (no attach() needed)
table(glm.pred, my_data2$WinorLoss)

mean(glm.pred == my_data2$WinorLoss)

#splitting the data for training and testing
#sample.split() expects the outcome vector, not the whole data frame
Split <- sample.split(my_data2$WinorLoss, SplitRatio = 0.7)
traindata <- subset(my_data2, Split == TRUE)
testdata <- subset(my_data2, Split == FALSE)


glm.fit <- glm(WinorLoss ~ Map, 
               data=traindata, 
               family="binomial")
glm.probs <- predict(glm.fit,
                     newdata=testdata,
                     type="response")
glm.pred <- ifelse(glm.probs > 0.5, "1", "0")

table(glm.pred, testdata$WinorLoss)

mean(glm.pred == testdata$WinorLoss)

summary(glm.fit)

#changing the x axis to 0-100%, min map win prob - max map win prob
newdat <- data.frame(Map = seq(min(traindata$Map), max(traindata$Map), len=100))
newdat$WinorLoss = predict(glm.fit, newdata=newdat, type="response") 


p <- ggplot(newdat, aes(x=Map,y=WinorLoss))+
  geom_point() +
  geom_smooth(method = "glm",
              method.args = list(family="binomial"),
              se = FALSE) +
              xlim(0,1) +
              ylim(0,1)

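I tried extending the x values to 100, but that only extended the axis. The corresponding y values were not computed, so nothing was plotted for them.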

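Best answer

I cannot reproduce your data, so I will use the "Challenger disaster" example (see LINK) to show how to do it, including a confidence-interval ribbon.

You should create artificial x values covering the range you want to plot and predict on them with the fitted model before plotting.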
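Applied to the question's setup, this could look like the sketch below. The CSV from the question is not available, so `my_data2` is simulated here (the seed, sample size, and simulation coefficients are assumptions); only the column names `Map` and `WinorLoss` and the map win rates come from the question. It also reads the question's "x = 100" as 100%, i.e. `Map = 1`. The point is that the prediction grid runs over the full 0 to 1 range rather than from `min(Map)` to `max(Map)`.

```r
# Simulated stand-in for the question's data (assumption: the real CSV is unavailable).
set.seed(42)
map_probs <- c(0, 0.307692, 0.333333, 0.444444, 0.47619, 0.692308, 0.708333, 0.857143)
my_data2 <- data.frame(Map = sample(map_probs, 60, replace = TRUE))
my_data2$WinorLoss <- rbinom(nrow(my_data2), size = 1,
                             prob = plogis(-1.5 + 3 * my_data2$Map))

glm.fit <- glm(WinorLoss ~ Map, family = binomial, data = my_data2)

# Grid over the whole range to be plotted, not just the observed range of Map.
newdat <- data.frame(Map = seq(0, 1, length.out = 101))
newdat$WinorLoss <- predict(glm.fit, newdata = newdat, type = "response")

# Base-graphics check: observed outcomes plus the curve extended to x = 1.
plot(my_data2$Map, my_data2$WinorLoss, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "Map win probability", ylab = "Predicted P(win)")
lines(newdat$Map, newdat$WinorLoss)
```

Because `newdat` already holds the model's predictions, it can be drawn with `geom_line()` directly; there is no need to refit with `geom_smooth()`.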

Next time, try using reprex or provide a minimal reproducible example.

Data preparation and model fitting:

library(dplyr)

fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)

temp <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)

challenger <- tibble::tibble(fails, temp)

orings = 6
challenger <- challenger %>%
  dplyr::mutate(resp = fails/orings)

model_fit <- glm(resp ~ temp, 
                 data = challenger, 
                 weights = rep(6, nrow(challenger)),
                 family=binomial(link="logit"))

##### ------- this is what you need: -------------------------------------------

# setting limits for x axis
x_limits <- challenger %>%
  dplyr::summarise(min = 0, max = max(temp)+10)

# creating artificial obs for curve smoothing -- several points between the limits
x <- seq(x_limits[[1]], x_limits[[2]], by=0.5)

# artificial points prediction
# see: https://stackoverflow.com/questions/26694931/how-to-plot-logit-and-probit-in-ggplot2
temp.data = data.frame(temp = x) #column name must be equal to the variable name

# Predict the fitted values given the model and hypothetical data
predicted.data <- as.data.frame(
  predict(model_fit, 
          newdata = temp.data, 
          type="link", se=TRUE)
  )

# Combine the hypothetical data and predicted values
new.data <- cbind(temp.data, predicted.data)
##### --------------------------------------------------------------------------

# Compute 95% confidence intervals on the link scale, then map them back to 0-1
std <- qnorm(0.95 / 2 + 0.5)
# use the full column name se.fit instead of relying on partial matching of $se
new.data$ymin <- model_fit$family$linkinv(new.data$fit - std * new.data$se.fit)
new.data$ymax <- model_fit$family$linkinv(new.data$fit + std * new.data$se.fit)
new.data$fit <- model_fit$family$linkinv(new.data$fit)  # Rescale to 0-1
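The interval above is built on the link (log-odds) scale and only then mapped through the inverse link, which keeps both bounds inside [0, 1]. A small sketch with made-up data (the data, seed, and variable names are assumptions, not part of the answer) compares this with the naive `fit ± z * se` computed on the response scale, which carries no such guarantee when extrapolating.

```r
# Made-up data for illustration only (assumption).
set.seed(1)
d <- data.frame(x = runif(40, 50, 80))
d$y <- rbinom(40, 1, plogis(5 - 0.1 * d$x))
fit <- glm(y ~ x, family = binomial, data = d)

grid <- data.frame(x = seq(0, 90, by = 5))  # extends well beyond the observed x
z <- qnorm(0.975)

# Interval on the link scale, then back-transformed with the inverse logit.
link <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)
lower_link <- plogis(link$fit - z * link$se.fit)
upper_link <- plogis(link$fit + z * link$se.fit)

# Naive interval computed directly on the response scale.
resp <- predict(fit, newdata = grid, type = "response", se.fit = TRUE)
lower_resp <- resp$fit - z * resp$se.fit
upper_resp <- resp$fit + z * resp$se.fit

range(lower_link, upper_link)  # bounded by 0 and 1 by construction
range(lower_resp, upper_resp)  # may fall outside 0 and 1
```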

Plotting:


library(ggplot2)

plotly_palette <- c('#1F77B4', '#FF7F0E', '#2CA02C', '#D62728')

p <- ggplot(challenger, aes(x=temp, y=resp))+ 
  geom_point(colour = plotly_palette[1])+ 
  geom_ribbon(data=new.data, 
              aes(y=fit, ymin=ymin, ymax=ymax), 
              alpha = 0.5, 
              fill = '#FFF0F5')+
  geom_line(data=new.data, aes(y=fit), colour = plotly_palette[2]) + 
  labs(x="Temperature", y="Estimated Fail Probability")+
  ggtitle("Predicted Probabilities for fail/orings with 95% Confidence Interval")+
  theme_bw()+
  theme(panel.border = element_blank(), plot.title = element_text(hjust=0.5))

p

# if you want something fancier:
# library(plotly)
# ggplotly(p)

Result:

[plot image not included]
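For a quick plot without a separate prediction step, `geom_smooth()` also accepts `fullrange = TRUE`, which extends the fitted curve to the limits of the x scale instead of stopping at the data. A minimal sketch follows, with simulated 0/1 data standing in for the question's file (the data, seed, and names `dat` and `p_full` are assumptions). Note that this refits the model inside ggplot2 on the plotted data rather than reusing an already fitted model.

```r
library(ggplot2)

# Simulated binary outcome with x observed only between 0.2 and 0.85 (assumption).
set.seed(7)
dat <- data.frame(x = runif(80, 0.2, 0.85))
dat$y <- rbinom(80, 1, plogis(-2 + 4 * dat$x))

p_full <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial"),
              se = FALSE, fullrange = TRUE) +  # extend the curve to the scale limits
  xlim(0, 1) +
  ylim(0, 1)

# x positions at which the smooth layer (layer 2) was evaluated
smooth_x <- layer_data(p_full, 2)$x
range(smooth_x)
```

Predicting on a hand-built grid, as in the answer, gives more control (for example over the confidence band); `fullrange` is the shorter route when only the curve is needed.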

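An interesting fact about the Challenger data:

NASA engineers used linear regression to estimate the probability of O-ring failure. Had they applied a more appropriate technique to their data, such as logistic regression, they would have noticed that the probability of failure at low temperatures (such as the roughly 36F at launch) is very high. The plot shows that at ~36F, a temperature we are extrapolating to from the observed ones, the estimated probability is ~0.75. If we take the confidence interval into account... well, the accident was almost a certainty.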
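The extrapolated value can also be read off the model directly rather than from the plot. The sketch below refits the answer's model on the same `fails` and `temp` vectors and predicts at 36F; the names `launch_temp`, `p_launch`, and `ci_launch` are introduced here for illustration only.

```r
fails <- c(2, 0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
temp <- c(53, 66, 68, 70, 75, 78, 57, 67, 69, 70, 75, 79, 58, 67, 70, 72, 76, 80, 63, 67, 70, 73, 76)

# Same model as in the answer: proportion of the 6 O-rings that failed per launch.
challenger <- data.frame(fails, temp, resp = fails / 6)
model_fit <- glm(resp ~ temp, data = challenger,
                 weights = rep(6, nrow(challenger)),
                 family = binomial(link = "logit"))

# Predict on the link scale at 36F, then back-transform the estimate and its 95% interval.
launch_temp <- data.frame(temp = 36)
pred <- predict(model_fit, newdata = launch_temp, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
p_launch <- plogis(pred$fit)
ci_launch <- plogis(pred$fit + c(-1, 1) * z * pred$se.fit)

p_launch   # estimated per-O-ring failure probability at 36F
ci_launch  # lower and upper 95% bounds
```

Since 36F lies far below the coldest observed launch (53F), the interval is wide, and the estimate depends heavily on the logistic form holding outside the data.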

Regarding "r - How do I extend a logistic regression plot?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58026288/
