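I am trying to create a function that, given a data frame and a column, uses Rosner's test (EnvStats::rosnerTest) to identify outliers and returns a new data frame so that I can inspect each outlier.
I can do this without a function, but because my data frame has many variables, I want a function that automates the process. (My previous post shows the workflow for doing this one variable at a time.)
Here is my data: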
> dput(head(data))
structure(list(cap_date = structure(c(4856, 4860, 4860, 4861,
4866, 4867), class = "Date"), cap_year = c(1983L, 1983L, 1983L,
1983L, 1983L, 1983L), age_class = c("A", "S", "S", "A", "A", "A"), sex =
c("F", "F", "F", "F", "F", "F"), alt = c(11, 12, 15.67000008, 7, 14.5,
17.5), alb = c(2.599999905, 5.369999886, 4.670000076, 4.429999828, 3.75,
3.700000048), alp = c(9, 86.33000183, 28, 170.6699982, 12, 82.5),
tbil = c(0.200000003, 1.070000052, 0.430000007, 1.169999957,
0.300000012, 0.400000006), bun = c(20, 17, 11.32999992, 56.33000183,
7.5, 45), calcium = c(NA, 8.930000305, 8.800000191, 8.970000267, NA,
7.550000191), crea = c(0.5, 0.569999993, 0.529999971, 0.600000024,
1.049999952, 0.75), phos = c(2.75, 4.099999905, 4.96999979,
5.329999924, 4.099999905, 7.400000095), pot = c(5.550000191,
6.730000019, 3.869999886, 4.269999981, 3.049999952, 6.849999905), tp
= c(4.449999809, 6.769999981, 5.800000191, 6.769999981, 5.75,
6.400000095), sodium = c(NA, 142, 127, 138.3300018, 164, 139), glob =
c(1.849999905, 1.400000095, 1.130000114, 2.340000153, 2,
2.700000048), cortisol = c(4.24, 7.2231, 4.5431, NA, 6.0874, 4.8727),
row = c(1L, 2L, 3L, 4L, 6L, 7L)), row.names = c(1L, 2L, 3L, 4L, 6L,
7L), class = "data.frame")
Here is my code:
library("EnvStats")
library("dplyr")
detect.outlier <- function(df, i, k) { # i is a column/variable, and k is an input in the Rosner test
plot(df$year, df[[i]], xlab = "Year", ylab = "Value") # I also want to print the plot
ros.test <- rosnerTest(df[[i]], k)
ros.results <- ros.test$all.stats
ros.outliers <- ros.results %>% filter(Outlier) %>% select(Obs.Num) # filter by outlier = TRUE ; Obs.Num corresponds with row number in my data frame
ros.outliers <- ros.outliers[,1] # change from a data frame to a vector
outlier_df <- df[df$row %in% ros.outliers,]
return(outlier_df %>% select(age_class, sex, i))
}
I tried running the function:
detect.outlier(data, alt, 20)
But I get an error:
Error during wrapup: recursive indexing failed at level 2
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
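I'm not sure what this means or how to fix it. Any help would be greatly appreciated. Thank you very much!
Edit: Sometimes when I run the function, I also get this error: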
Error in rosnerTest(data$variable, k) : 'x' must be a numeric vector
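This seems strange, because class(data$alt) reports that the column is numeric.
Edit: Yama's solution is correct. While checking that the code returns the correct outliers, I noticed that the Rosner test seems to return an "Obs.Num" that differs from the row number. Here is an example that runs the function's code step by step: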
> ros.test <- rosnerTest(df$crea, k = 10)
Warning message:
In rosnerTest(df$crea, k = 10) :
3 observations with NA/NaN/Inf in 'x' removed.
> ros.results <- ros.test$all.stats
> print(ros.results)
   i   Mean.i      SD.i Value Obs.Num    R.i+1 lambda.i+1 Outlier
1  0 1.078450 0.3102488  3.35     222 7.321705   4.053146    TRUE
2  1 1.076295 0.3023919  2.55      12 4.873495   4.052913    TRUE
3  2 1.074895 0.2991009  2.35    1047 4.263125   4.052680    TRUE
4  3 1.073683 0.2966446  2.30     877 4.133960   4.052447    TRUE
5  4 1.072516 0.2943607  2.10     801 3.490560   4.052214   FALSE
6  5 1.071538 0.2927857  2.00     293 3.171133   4.051980   FALSE
7  6 1.070653 0.2915166  2.00     373 3.187974   4.051746   FALSE
8  7 1.069766 0.2902367  1.95     633 3.032814   4.051512   FALSE
9  8 1.068925 0.2890959  1.90     103 2.874737   4.051278   FALSE
10 9 1.068131 0.2880883  1.85     548 2.713992   4.051043   FALSE
> # 4 outliers flagged - obs. num 222, 12, 1047, and 877
> crea <- df[c(222, 12, 1047, 877),]
> crea %>% select(age_class, sex, crea, row)
     age_class sex crea  row
236          A   F 3.35  236
13           A   M 2.55   13
1154         A   M 2.35 1154
969          A   M 2.30  969
Here we see that the row numbers change from 222 to 236, from 12 to 13, from 1047 to 1154, and from 877 to 969.
This ends up affecting the following line of my function:
outlier_df <- df[df$row %in% ros.outliers,]
because it then selects the wrong rows.
Any help is much appreciated!!
Best answer
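Your function looks up whatever you pass in as i. When you call detect.outlier(data, alt, 20), the value of i is alt, the unquoted name of an object rather than a character string. So the code executed inside detect.outlier() is plot(df$year, df[[alt]], xlab = "Year", ylab = "Value") when it should be plot(df$year, df[["alt"]], xlab = "Year", ylab = "Value").
You can correct this by writing detect.outlier(df, "alt", 20).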
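The difference between a quoted and an unquoted column name can be seen in a small base R sketch. The helper names col_values and col_values_bare are hypothetical and only for illustration; the data are two columns taken from the dput() output in the question. With an unquoted name, R evaluates alt as an object in the calling environment: if no such object exists the call fails with "object not found", and if one happens to exist, its value is used as the index, which would be consistent with the "recursive indexing" message.

```r
# Two columns from the first six rows of the question's dput() output
df <- data.frame(
  sex = c("F", "F", "F", "F", "F", "F"),
  alt = c(11, 12, 15.67000008, 7, 14.5, 17.5)
)

# Hypothetical helper: expects the column name as a character string
col_values <- function(df, i) {
  stopifnot(is.character(i), length(i) == 1, i %in% names(df))
  df[[i]]
}

col_values(df, "alt")  # "alt" is looked up as a column name

# Alternative sketch: accept a bare (unquoted) name by capturing the
# expression and converting it to a string before indexing.
# This variant works with bare names only, not with quoted strings.
col_values_bare <- function(df, i) {
  i <- deparse(substitute(i))
  df[[i]]
}

col_values_bare(df, alt)
```

Passing the name as a string is the simpler design when the function will be applied to many columns, for example by looping over names(df).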
There is apparently another problem in your code:
Error in xy.coords(x, y, xlabel, ylabel, log) :
'x' and 'y' lengths differ
But this should already help you.
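A likely cause of the xy.coords error, judging from the dput() output in the question, is that the data frame has a column named cap_year but none named year, so df$year is NULL and has length zero. The sketch below uses two columns from the posted data; that cap_year is the intended x-axis variable is an assumption.

```r
# Two columns from the first six rows of the question's dput() output
df <- data.frame(
  cap_year = c(1983L, 1983L, 1983L, 1983L, 1983L, 1983L),
  alt      = c(11, 12, 15.67000008, 7, 14.5, 17.5)
)

is.null(df$year)     # TRUE: there is no column called "year"
length(df$year)      # 0
length(df[["alt"]])  # 6, so the x and y lengths differ

# Assumption: cap_year is the column meant for the x axis
plot(df$cap_year, df[["alt"]], xlab = "Year", ylab = "alt")
```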
Edit: You should state which package the rosnerTest function comes from.
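On the follow-up about Obs.Num: in the output posted in the question, df[c(222, 12, 1047, 877), ] returns exactly the values listed in the Value column, which suggests that Obs.Num is a position in the vector passed to rosnerTest rather than a value of the row column. The row column in the dput() output skips 5, so the two numbering schemes drift apart. The sketch below illustrates this with base R and then outlines a revised function under that assumption. The name detect_outlier, the year_col parameter and the obs_num value are hypothetical, and calling the function requires the EnvStats package.

```r
# A few columns from the first six rows of the question's dput() output
df <- data.frame(
  age_class = c("A", "S", "S", "A", "A", "A"),
  sex       = c("F", "F", "F", "F", "F", "F"),
  crea      = c(0.5, 0.569999993, 0.529999971, 0.600000024, 1.049999952, 0.75),
  row       = c(1L, 2L, 3L, 4L, 6L, 7L)
)

obs_num <- 5L  # hypothetical Obs.Num value, for illustration only

df[df$row %in% obs_num, ]  # zero rows: no row is labelled 5
df[obs_num, ]              # fifth row by position (its row label is 6)

# Sketch of a revised function. Assumptions: the column name is passed
# as a string, the x-axis column is cap_year, and Obs.Num is a position
# in the vector given to rosnerTest.
detect_outlier <- function(df, col, k, year_col = "cap_year") {
  stopifnot(is.character(col), length(col) == 1,
            col %in% names(df), year_col %in% names(df))
  x <- df[[col]]
  plot(df[[year_col]], x, xlab = "Year", ylab = col)
  ros_stats <- EnvStats::rosnerTest(x, k = k)$all.stats
  # Keep only the observations flagged as outliers
  obs_num <- ros_stats$Obs.Num[ros_stats$Outlier]
  # Index by position instead of matching against df$row
  df[obs_num, c("age_class", "sex", col), drop = FALSE]
}
```

With the full data set, the call would then look like detect_outlier(data, "crea", 10). It is worth comparing the returned values against the Value column of all.stats to confirm that the positional assumption holds for your data.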
Regarding "r - Error when creating a function: 'recursive indexing failed'", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/76048208/