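r - Exclude outliers from the regression lines fitted through a scatter plot, without removing them from the plot

I have the following data, on which I run the ggplot code below: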

data <- structure(list(country_mean_rep = structure(c(73.6995708154506, 
93.5501285347044, 85.1529051987768, 91.1017369727047, 79.5562130177515, 
84.6751054852321, 89.8, 86.8826405867971, 94.2247191011236, 70.2321428571429, 
88.4107142857143), label = "label", format.stata = "%9.2f"), 
    country_mean_crime = c(0.0944206008583691, 0.0565552699228792, 
    0.0336391437308868, 0.205955334987593, 0.130177514792899, 
    0.282700421940928, 0.220512820512821, 0.415647921760391, 
    0.387640449438202, 0.200892857142857, 0.292207792207792), 
    country_name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 11L, 12L, 
    14L, 16L, 20L), .Label = c("Albania", "Armenia", "Azerbaijan", 
    "Belarus", "Bosnia and Herzegovina", "Brazil", "Bulgaria", 
    "Cambodia", "Chile", "CostaRica", "Croatia", "Czech", "Ecuador", 
    "Estonia", "FYROM", "Georgia", "Germany", "Greece", "Guyana", 
    "Hungary", "Ireland", "Kazakhstan", "Kenya", "Kyrgyzstan", 
    "Latvia", "Lithuania", "Malawi", "Mali", "Moldova", "Philippines", 
    "Poland", "Portugal", "Romania", "Russia", "Senegal", "Serbia&Montenegro", 
    "Slovakia", "Slovenia", "South Africa", "South Korea", "Spain", 
    "SriLanka", "Tajikistan", "Turkey", "Ukraine", "Uzbekistan", 
    "Vietnam"), class = "factor")), row.names = c(NA, -11L), class = c("data.table", 
"data.frame"))

# I would like to run the following code on this data:

library(ggplot2)

ggplot(data, aes(x=country_mean_rep, y=country_mean_crime)) + 
  geom_point() + 
  geom_smooth(aes(colour="linear", fill="linear"), 
              method="lm", 
              formula=y ~ x) + 
  geom_smooth(aes(colour="quadratic", fill="quadratic"), 
              method="lm", 
              formula=y ~ x + I(x^2)) + 
  geom_smooth(aes(colour="cubic", fill="cubic"), 
              method="lm", 
              formula=y ~ x + I(x^2) + I(x^3)) + 
  labs(colour="Functional Form", fill="Functional Form") +
  geom_text(aes(label=country_name), nudge_y=0.02) +
  theme_bw()

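Now suppose the Czech Republic is an outlier that I would like to exclude from the fits (especially the linear one). Note that I know there is nothing wrong with the Czech data point in this example; I need the technique so that I can deal with the actual outliers in my real data.

Is there a way to exclude the point from the fit only, while keeping it in the plot?

Best answer

One approach is to give different layers different data: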

ggplot(subset(data, country_name != 'Czech'), aes(x=country_mean_rep, y=country_mean_crime)) + 
  geom_smooth(aes(colour="linear", fill="linear"), 
              method="lm", 
              formula=y ~ x) + 
  geom_smooth(aes(colour="quadratic", fill="quadratic"), 
              method="lm", 
              formula=y ~ x + I(x^2)) + 
  geom_smooth(aes(colour="cubic", fill="cubic"), 
              method="lm", 
              formula=y ~ x + I(x^2) + I(x^3)) + 
  labs(colour="Functional Form", fill="Functional Form") +
  geom_point(data = data, inherit.aes = FALSE, aes(x = country_mean_rep, y = country_mean_crime)) +
  geom_text(data = data, aes(label=country_name, x = country_mean_rep, y = country_mean_crime), inherit.aes = FALSE, nudge_y=0.02) +
  theme_bw()

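Here the three lm fits use the subsetted data, which is the plot's default data set. The geom_point and geom_text calls are given the full data explicitly and set inherit.aes = FALSE, so they do not inherit the plot-level aesthetics and restate x and y themselves.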
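
The same idea can also be turned around: keep the full data as the plot default and pass the subset only to the smoothing layer through its data argument. The following is a minimal sketch of that variant, not part of the original answer. The data frame toy, its columns x and y, and the row named "Outlier" are made-up stand-ins for the question's data.

```r
library(ggplot2)

# Hypothetical toy data standing in for the question's `data`
toy <- data.frame(
  country_name = c("A", "B", "C", "D", "E", "Outlier"),
  x = c(70, 75, 80, 85, 90, 95),
  y = c(0.10, 0.14, 0.19, 0.25, 0.30, 0.80)
)

# Rows used for fitting only; the outlier is dropped here, not from the plot
fit_rows <- subset(toy, country_name != "Outlier")

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +                                   # all six points
  geom_smooth(data = fit_rows,                     # fit on five points only
              method = "lm",
              formula = y ~ x) +
  geom_text(aes(label = country_name), nudge_y = 0.02) +
  theme_bw()

print(p)
```

With this arrangement the point and text layers need no inherit.aes = FALSE, because they use the plot's default data and aesthetics. With the default fullrange = FALSE, the fitted line is drawn only over the x-range of the rows passed to geom_smooth, so it may stop short of the excluded point.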

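For "r - Exclude outliers from the regression lines fitted through a scatter plot, without removing them from the plot", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68396603/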