I am working on a project with two data frames: product and order.
The product data frame looks like this:
product <- data.frame(SKU = c("CVL-","CVP-", "CVS-", "MugsW-11-", "MugsW-15-"),
TM_Product = c("Canvas", "Canvas", "Canvas",
"Mugs", "Mugs"))
The order data frame, which includes the order SKU (Lineitem_Sku):
order <- data.frame(Order_ID = c(1,2,3,4,5,6),
Lineitem_Sku = c("F-M-White", "MugsW-11-2005",
"TS-BS-F-XL-Black", " MugsW-15",
"TS-BS-F-XL-White", "TS-BS-F-3XL"))
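My task is to look up the product for each order SKU.
The data frame I expect is:
Order_ID, Lineitem_Sku, product (match Lineitem_Sku against SKU in the product data and take the corresponding TM_Product)
I wrote this function: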
product_get <- function(x) {
if (is.na(x)) {
z = NA_character_
} else if (sum(str_detect(x, pattern = paste0("^", product$SKU))) == 0) {
z = NA_character_
} else {
z = product[str_detect(x, pattern = paste0("^", product$SKU)),2] %>%
pull()
}
return(z)
}
But when I use it in mutate(), I get these warnings:
order %>%
mutate(product = product_get(Lineitem_Sku))
1: In if (is.na(x)) { :
the condition has length > 1 and only the first element will be used
2: In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
longer object length is not a multiple of shorter object length
3: In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
longer object length is not a multiple of shorter object length
Can anyone help me? Thanks, everyone.
Best answer
OK, this is going to be a long answer, so apologies in advance. There is a lot that can be improved.
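First, as I wrote in the comments, your function is not vectorized, because it is built on a plain if() clause. Not vectorized means it cannot operate on a whole vector or column of values: it only looks at the first element and issues a warning (since R 4.2.0 this is an error rather than a warning). That is exactly what you are asking it to do here, because mutate() passes the whole existing column to the function to create the new column. The vectorized version of if() is ifelse(), but if you want to stick with if() (which is sometimes the right call), you can use map_chr() from purrr inside mutate() to apply the function element by element, like this: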
library(tidyverse)
product <- data.frame(SKU = c("CVL-","CVP-", "CVS-", "MugsW-11-", "MugsW-15-"),
TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs"))
order <- data.frame(Order_ID = c(1,2,3,4,5,6),
Lineitem_Sku = c("F-M-White", "MugsW-11-2005", "TS-BS-F-XL-Black", "MugsW-15", "TS-BS-F-XL-White", "TS-BS-F-3XL"))
product_get <- function(x){
if(is.na(x)){
z = NA_character_
} else if (sum(str_detect(x, pattern = paste0("^", product$SKU))) == 0){
z = NA_character_
} else {
z = product[str_detect(x, pattern = paste0("^", product$SKU)),2] %>%
pull()
}
return(z)
}
order %>%
mutate(product = map_chr(Lineitem_Sku, product_get))
We get an error!
Error in UseMethod("pull") :
no applicable method for 'pull' applied to an object of class "factor"
That is because when you use data.frame() (in R < 4.0.0, I believe), character vectors are automatically converted to factors unless you specify stringsAsFactors = FALSE, like this:
product <- data.frame(SKU = c("CVL-","CVP-", "CVS-", "MugsW-11-", "MugsW-15-"),
TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs"), stringsAsFactors = FALSE)
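If you are already in the tidyverse, another option is simply to use tibble. A tibble never converts strings to factors, and product[rows, 2] stays a one-column tibble, which is what pull() expects; with a plain data.frame the same subset drops to a bare vector, and pull() has no method for that:
product <- tibble(SKU = c("CVL-","CVP-", "CVS-", "MugsW-11-", "MugsW-15-"),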
TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs"))
order <- tibble(Order_ID = c(1,2,3,4,5,6),
Lineitem_Sku = c("F-M-White", "MugsW-11-2005", "TS-BS-F-XL-Black", "MugsW-15", "TS-BS-F-XL-White", "TS-BS-F-3XL"))
order %>%
mutate(product = map_chr(Lineitem_Sku, product_get))
We get:
# A tibble: 6 x 3
Order_ID Lineitem_Sku product
<dbl> <chr> <chr>
1 1 F-M-White NA
2 2 MugsW-11-2005 Mugs
3 3 TS-BS-F-XL-Black NA
4 4 MugsW-15 NA
5 5 TS-BS-F-XL-White NA
6 6 TS-BS-F-3XL NA
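Note that row 4 comes back as NA even though it looks like a mug: the SKU prefix in product is "MugsW-15-" with a trailing hyphen, while the order value "MugsW-15" has nothing after the 15, so the anchored pattern cannot match. A minimal base R check, using grepl() in place of str_detect() purely to keep the snippet dependency-free (the names patterns, hit_2005 and hit_15 are just for illustration):

```r
# Anchored prefix patterns, built the same way as in the function above
patterns <- paste0("^", c("MugsW-11-", "MugsW-15-"))

# "MugsW-11-2005" starts with "MugsW-11-", so only the first pattern matches
hit_2005 <- vapply(patterns, grepl, logical(1), x = "MugsW-11-2005")

# "MugsW-15" lacks the trailing hyphen, so neither pattern matches
hit_15 <- vapply(patterns, grepl, logical(1), x = "MugsW-15")

print(unname(hit_2005))
print(unname(hit_15))
```

Whether such SKUs should count as mugs depends on your SKU scheme; if they should, the prefixes in product would need to leave out the trailing hyphen.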
Hopefully this is what you wanted. But I am not sure we are done.
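First, note that every time the function is called, you rebuild the pattern vector with paste0()! That is wasteful. On top of that, you run str_detect() over it twice. Both can be done just once:
product_patterns <- paste0("^", product$SKU)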
product_get <- function(x) {
if (is.na(x)) {
return(NA_character_)
}
check <- str_detect(x, pattern = product_patterns)
if (sum(check) == 0) {
z <- NA_character_
} else {
z <- product %>%
filter(check) %>%
pull(TM_Product)
}
return(z)
}
order %>%
mutate(product = map_chr(Lineitem_Sku, product_get))
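For comparison, the function can also be made vectorized itself, so that no map_chr() wrapper is needed. The following is only a sketch in base R and not part of the original answer: product_get_vec is a hypothetical name, startsWith() stands in for the anchored regex, "CVL-30x40" is a made-up SKU, and the sketch assumes that the first matching SKU should win when several match.

```r
product <- data.frame(
  SKU = c("CVL-", "CVP-", "CVS-", "MugsW-11-", "MugsW-15-"),
  TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: vectorized over x; first matching SKU wins (assumption)
product_get_vec <- function(x, skus, products) {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(skus)) {
    # Elements that are still unassigned, not NA, and start with this prefix
    hit <- is.na(out) & !is.na(x) & startsWith(x, skus[i])
    out[hit] <- products[i]
  }
  out
}

# Made-up input covering no match, a match, NA, and a Canvas prefix
skus_in <- c("F-M-White", "MugsW-11-2005", NA, "CVL-30x40")
result <- product_get_vec(skus_in, product$SKU, product$TM_Product)
print(result)
```

The loop runs over the handful of SKU prefixes rather than over every order row, so a function like this could be applied to a whole column in one call.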
Second, this function may still fail if the input changes only slightly. Suppose I add just one more "Mugs" row to product:
product <- tibble(SKU = c("CVL-","CVP-", "CVS-", "MugsW-11-", "MugsW-15-", "MugsW-11"),
TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs", "Mugs"))
order %>%
mutate(product = map_chr(Lineitem_Sku, product_get))
Another error!
Error: Result 2 must be a single string, not a character vector of length 2
Run `rlang::last_error()` to see where the error occurred.
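That is because you implicitly rely on finding exactly one product, and here the function finds two. map_chr() requires every element it gets back to be a single string and raises an error otherwise. So you may want to look at map(), which returns a list, or improve your function so that it does not fail (for example by returning only the first product).
If your data is large, I would also consider using inner_join() to do all of this.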
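One way the join idea could look, sketched in base R to stay dependency-free. Because the SKUs in product are prefixes rather than exact keys, a plain equality join does not apply directly, so this sketch swaps inner_join() for a cross join with merge(by = NULL), keeps the pairs where the prefix matches, and joins the result back onto the orders. The names pairs, matched and result are illustrative, and keeping only the first match per order is an assumption.

```r
product <- data.frame(
  SKU = c("CVL-", "CVP-", "CVS-", "MugsW-11-", "MugsW-15-", "MugsW-11"),
  TM_Product = c("Canvas", "Canvas", "Canvas", "Mugs", "Mugs", "Mugs"),
  stringsAsFactors = FALSE
)
order <- data.frame(
  Order_ID = c(1, 2, 3, 4, 5, 6),
  Lineitem_Sku = c("F-M-White", "MugsW-11-2005", "TS-BS-F-XL-Black",
                   "MugsW-15", "TS-BS-F-XL-White", "TS-BS-F-3XL"),
  stringsAsFactors = FALSE
)

# Cross join: every order row paired with every product row
pairs <- merge(order, product, by = NULL)

# Keep only pairs where the order SKU starts with the product SKU prefix
matched <- pairs[startsWith(pairs$Lineitem_Sku, pairs$SKU), ]

# Assumption: if several prefixes match, keep the first one per order
matched <- matched[!duplicated(matched$Order_ID), c("Order_ID", "TM_Product")]

# Left join back so that orders without a match are kept with NA
result <- merge(order, matched, by = "Order_ID", all.x = TRUE)
names(result)[names(result) == "TM_Product"] <- "product"
print(result)
```

The cross join has nrow(order) * nrow(product) rows, so for very large tables it may be worth extracting a proper key column first and joining on that instead.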
Regarding "r - Error when using a user-defined function in mutate", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62125490/