I have a .csv file like this (except that the real .csv file has many more columns):
library(tidyverse)
tibble(id1 = c("a", "b"),
id2 = c("c", "d"),
data1 = c(1, 2),
data2 = c(3, 4),
data1s = c(5, 6),
data2s = c(7, 8)) %>%
write_csv("df.csv")
I only want id1, id2, data1, and data2.
I can do that like this:
df <- read_csv("df.csv",
col_names = TRUE,
cols_only(id1 = col_character(),
id2 = col_character(),
data1 = col_integer(),
data2 = col_integer()))
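However, as mentioned above, my real dataset has many more columns, so I would like to use tidyselect
helpers to read only the specified columns and to make sure they are parsed with the specified types.
I tried this: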
df2 <- read_csv("df.csv",
col_names = TRUE,
cols_only(starts_with("id") = col_character(),
starts_with("data") & !ends_with("s") = col_integer()))
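But the error message indicates that there is a problem with the syntax. Is it possible to use tidyselect
helpers in this way?
Best Answer
My suggestion goes somewhat around the houses, but it does let you customise the read specification on a "rules" basis rather than an explicit one: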
library(tidyverse)
tibble(id1 = c("a", "b"),
id2 = c("c", "d"),
data1 = c(1, 2),
data2 = c(3, 4),
data1s = c(5, 6),
data2s = c(7, 8)) %>%
write_csv("df.csv")
# read only 1 row to make a spec from with minimal read; really just to get the colnames
df_spec <- spec(read_csv("df.csv",
col_names = TRUE,
n_max = 1))
# alter the spec with the base R functions startsWith() / endsWith()
df_spec$cols <- imap(df_spec$cols, ~ {
  if (startsWith(.y, "id")) {
    col_character()
  } else if (startsWith(.y, "data") && !endsWith(.y, "s")) {
    col_integer()
  } else {
    # skip every column that matches neither rule
    col_skip()
  }
})
df <- read_csv("df.csv",
col_types = df_spec$cols)
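To use actual tidyselect helpers rather than startsWith() / endsWith(), one option is to resolve them against the header with tidyselect::eval_select() and then build the cols_only() specification from the matched names. The following is a minimal sketch under that assumption, not a built-in readr feature; spec_for() is a hypothetical helper name introduced here for illustration.

```r
library(readr)
library(tidyselect)
library(rlang)

# Recreate the example file from the question so the sketch runs on its own.
write_csv(
  data.frame(id1 = c("a", "b"), id2 = c("c", "d"),
             data1 = c(1, 2), data2 = c(3, 4),
             data1s = c(5, 6), data2s = c(7, 8)),
  "df.csv"
)

# Read a single row, as in the answer: only the column names are needed.
header <- read_csv("df.csv", n_max = 1, show_col_types = FALSE)

# Hypothetical helper (not part of readr): resolve a tidyselect expression
# against the header and pair every matched column with the same collector.
spec_for <- function(selection, collector) {
  matched <- names(eval_select(enquo(selection), header))
  setNames(rep(list(collector), length(matched)), matched)
}

# Combine the rules into one named list and hand it to cols_only().
col_spec <- do.call(cols_only, c(
  spec_for(starts_with("id"), col_character()),
  spec_for(starts_with("data") & !ends_with("s"), col_integer())
))

df <- read_csv("df.csv", col_types = col_spec)
```

The result should contain only id1, id2, data1 and data2, with the id columns read as character and the data columns as integer. readr 2.0 and later also has a col_select argument that accepts tidyselect helpers, but it only chooses columns and does not set their types.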
Regarding "r - Is it possible to use tidyselect helpers with the cols_only() function?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73642144/