r - Combining filter, across, and starts_with to string search across columns in R

This is very similar to the answer given here, but I don't understand why starts_with doesn't work:

library(dplyr)
library(ggplot2)  # provides the diamonds dataset

diamonds %>% 
    filter(across(clarity, ~ grepl('^S', .))) %>% 
    head

# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33

diamonds %>%
  filter(across(starts_with("c"),~grepl("^S" ,.))) %>% 
  head

# A tibble: 0 x 10
# ... with 10 variables: carat <dbl>, cut <ord>, color <ord>, clarity <ord>, depth <dbl>, table <dbl>,
#   price <int>, x <dbl>, y <dbl>, z <dbl>

Best answer

dplyr before 1.0.4

diamonds %>%
  filter(rowSums(across(starts_with("c"),~grepl("^S" ,.))) > 0) 
# A tibble: 22,259 x 10
#    carat cut       color clarity depth table price     x     y     z
#    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#  3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#  4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#  5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
#  6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33
#  7  0.31 Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
#  8  0.2  Premium   E     SI2      60.2    62   345  3.79  3.75  2.27
#  9  0.3  Ideal     I     SI2      62      54   348  4.31  4.34  2.68
# 10  0.3  Good      J     SI1      63.4    54   351  4.23  4.29  2.7 
# # ... with 22,249 more rows

How to figure this out or confirm it:

diamonds %>%
  filter({browser(); across(starts_with("c"),~grepl("^S" ,.)); })
# Called from: mask$eval_all_filter(dots, env_filter)
# debug at #1: across(starts_with("c"), ~grepl("^S", .))

across(starts_with("c"), ~ grepl("^S" , .))
# # A tibble: 53,940 x 4
#    carat cut   color clarity
#    <lgl> <lgl> <lgl> <lgl>  
#  1 FALSE FALSE FALSE TRUE   
#  2 FALSE FALSE FALSE TRUE   
#  3 FALSE FALSE FALSE FALSE  
#  4 FALSE FALSE FALSE FALSE  
#  5 FALSE FALSE FALSE TRUE   
#  6 FALSE FALSE FALSE FALSE  
#  7 FALSE FALSE FALSE FALSE  
#  8 FALSE FALSE FALSE TRUE   
#  9 FALSE FALSE FALSE FALSE  
# 10 FALSE FALSE FALSE FALSE  
# # ... with 53,930 more rows
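
A non-interactive way to see the same thing is to count the matches per column instead of stepping in with browser(). This is only a sketch: it assumes ggplot2 is installed for the diamonds data, and the name counts is just for illustration.

```r
library(dplyr)
library(ggplot2)  # assumed to be installed; provides the diamonds dataset

# For each column whose name starts with "c", count the values that start with "S".
counts <- diamonds %>%
  summarise(across(starts_with("c"), ~ sum(grepl("^S", .))))

counts
```

Only clarity should show a non-zero count, so a filter that requires a match in every selected column cannot keep any rows.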

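across() returns a data frame of logicals here, and filter() keeps a row only when every column of that frame is TRUE. carat, cut, and color never match, so the original call returns no rows. To me it seems clear that the intent is to keep any row with at least one TRUE (or possibly all, but I'll assume "any" for now). Since this is a frame of logicals, we can use rowSums, which counts FALSE as 0 and TRUE as 1, so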

head(rowSums(across(starts_with("c"), ~ grepl("^S" , .))) > 0)
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE

That is a single logical vector, one element per row, which is what dplyr::filter ultimately wants/needs.
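
To make the "any" versus "all" distinction concrete, here is a small sketch on toy data. The data frame df and its columns code, city, and value are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical toy data: two columns have names starting with "c".
df <- tibble(
  code  = c("S1", "A2", "B3"),
  city  = c("Oslo", "Seoul", "Rome"),
  value = c(1, 2, 3)
)

# One logical column per selected column.
hits <- df %>%
  transmute(across(starts_with("c"), ~ grepl("^S", .)))

# "Any": at least one selected column matched in the row.
keep_any <- rowSums(hits) > 0

# "All": every selected column matched in the row.
keep_all <- rowSums(hits) == ncol(hits)

result_any <- df[keep_any, ]
result_all <- df[keep_all, ]
```

Here result_any keeps the first two rows, while result_all is empty because no row matches in both code and city. The empty result mirrors the 0-row tibble in the question.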

dplyr since 1.0.4

See https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

diamonds %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))
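
For comparison, a minimal sketch of if_any next to its counterpart if_all (both need dplyr 1.0.4 or later). The toy data frame df is hypothetical and only there to show the difference.

```r
library(dplyr)  # if_any() and if_all() need dplyr >= 1.0.4

# Hypothetical toy data: two columns have names starting with "c".
df <- tibble(
  code  = c("S1", "A2", "S3", "B4"),
  city  = c("Oslo", "Seoul", "Sofia", "Rome"),
  value = c(1, 2, 3, 4)
)

# Keep rows where at least one "c" column starts with "S".
any_match <- df %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))

# Keep rows where every "c" column starts with "S".
all_match <- df %>%
  filter(if_all(starts_with("c"), ~ grepl("^S", .)))
```

any_match keeps the first three rows, and all_match keeps only the row with S3 and Sofia. Note that if_any() takes the column selection and the function directly, so there is no across() call inside it.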

Regarding "r - Combining filter, across, and starts_with to string search across columns in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66052130/
