Suppose I have a dataset "A", for example:
Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness
Another dataset, B, looks like this:
parts key
Chest pneumonia
Head headache
Abdominal spinal
Abdominal intervertebral
Abdominal colon
Abdominal ureter
Abdominal colon
Head concussion
Neck thyroid
Chest breast
Abdominal liver
Abdominal hepatitis
Head giddiness
I want to find which words in B$key occur in A$Disease_name, and then merge A with B on those matched keywords, so that each A$Disease_name is assigned its B$parts.
How can I do this in R?
Best answer
Welcome to SO! The question is clear enough to me. Here is a tidyverse solution.
First, read in some data:
library(dplyr)
# Dataset A: a single column of code-prefixed disease names
tmp <- data.table::fread(
"Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness",
sep = ""  # no separator: each line is read whole, since the names contain commas
)
# Dataset B: body part and lookup keyword, separated by whitespace
tmp2 <- data.table::fread(
"parts key
Chest pneumonia
Head headache
Abdominal spinal
Abdominal intervertebral
Abdominal colon
Abdominal ureter
Abdominal colon
Head concussion
Neck thyroid
Chest breast
Abdominal liver
Abdominal hepatitis
Head giddiness"
)
Then we do the join:
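# Pull the (last) matching keyword out of each lower-cased disease name,
# then join on it to bring in the body part from tmp2
result <-
  tmp %>%
  mutate(key = gsub(paste0(".*(", paste(tmp2$key, collapse = "|"), ").*"),
                    "\\1",
                    tolower(Disease_name))) %>%
  left_join(tmp2, by = "key")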
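To see what the `gsub()` call does, here is a small base-R sketch on an illustrative subset of the data (the `keys` and `disease` vectors are hypothetical subsets, not the full tables). The pasted pattern has the form `.*(key1|key2|...).*`, and `\\1` keeps only the captured keyword. Two details seem worth knowing: the leading `.*` is greedy, so when a name contains several keywords the last one is kept; and `gsub()` returns a non-matching string unchanged, which then finds no partner in the join and ends up with `NA` parts. The sketch marks such rows as `NA` explicitly via `grepl()`. Keys are also matched as plain substrings, so a key such as `colon` would match inside a longer word as well.

```r
# Illustrative subset of the keywords in B$key
keys <- c("pneumonia", "spinal", "intervertebral", "colon", "hepatitis", "liver")
# Illustrative subset of A$Disease_name; "(R51)Headache" has no key in this subset
disease <- c("(J189)Pneumonia, unspecified",
             "(M4806)Spinal stenosis, lumbar region",
             "(K746)Other and unspecified cirrhosis of liver",
             "(R51)Headache")

# Builds ".*(pneumonia|spinal|intervertebral|colon|hepatitis|liver).*"
pattern <- paste0(".*(", paste(keys, collapse = "|"), ").*")
lowered <- tolower(disease)

# Keep the captured keyword where the pattern matches, NA otherwise
extracted <- ifelse(grepl(pattern, lowered), gsub(pattern, "\\1", lowered), NA)
print(extracted)  # c("pneumonia", "spinal", "liver", NA)

# Greedy leading ".*": with two keywords present, the last one is captured
print(gsub(".*(colon|liver).*", "\\1", "colon and liver"))  # "liver"
```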
Result:
result
#> Disease_name key
#> 1 (J189)Pneumonia, unspecified pneumonia
#> 2 (R51)Headache headache
#> 3 (M4806)Spinal stenosis, lumbar region spinal
#> 4 (M512)Other specified intervertebral disc displacement intervertebral
#> 5 (C187)Sigmoid colon colon
#> 6 (C187)Sigmoid colon colon
#> 7 (N201)Calculus of ureter ureter
#> 8 (C189)Colon, unspecified colon
#> 9 (C189)Colon, unspecified colon
#> 10 (S0600)Concussion, without open intracranial wound concussion
#> 11 (C73)Malignant neoplasm of thyroid gland thyroid
#> 12 (C509)Breast, unspecified breast
#> 13 (K746)Other and unspecified cirrhosis of liver liver
#> 14 (B181)Chronic viral hepatitis B without delta- agent hepatitis
#> 15 (R42)Dizziness and giddiness giddiness
#> parts
#> 1 Chest
#> 2 Head
#> 3 Abdominal
#> 4 Abdominal
#> 5 Abdominal
#> 6 Abdominal
#> 7 Abdominal
#> 8 Abdominal
#> 9 Abdominal
#> 10 Head
#> 11 Neck
#> 12 Chest
#> 13 Abdominal
#> 14 Abdominal
#> 15 Head
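Note that the result has 15 rows rather than 13: the key `colon` appears twice in B (both times as `Abdominal`), so each colon diagnosis is matched twice (rows 5/6 and 8/9). In the dplyr pipeline, joining with `left_join(distinct(tmp2), by = "key")` should avoid that. Below is a minimal base-R sketch of the same lookup (no dplyr or data.table), with A and B rebuilt by hand from the question. It swaps the join for `match()`, which returns the first row of B with a given key and therefore keeps exactly one row per disease; this is an alternative, not the answer's method.

```r
# Data from the question, rebuilt as base-R data frames
A <- data.frame(
  Disease_name = c(
    "(J189)Pneumonia, unspecified",
    "(R51)Headache",
    "(M4806)Spinal stenosis, lumbar region",
    "(M512)Other specified intervertebral disc displacement",
    "(C187)Sigmoid colon",
    "(N201)Calculus of ureter",
    "(C189)Colon, unspecified",
    "(S0600)Concussion, without open intracranial wound",
    "(C73)Malignant neoplasm of thyroid gland",
    "(C509)Breast, unspecified",
    "(K746)Other and unspecified cirrhosis of liver",
    "(B181)Chronic viral hepatitis B without delta- agent",
    "(R42)Dizziness and giddiness"
  ),
  stringsAsFactors = FALSE
)
B <- data.frame(
  parts = c("Chest", "Head", "Abdominal", "Abdominal", "Abdominal", "Abdominal",
            "Abdominal", "Head", "Neck", "Chest", "Abdominal", "Abdominal", "Head"),
  key = c("pneumonia", "headache", "spinal", "intervertebral", "colon", "ureter",
          "colon", "concussion", "thyroid", "breast", "liver", "hepatitis",
          "giddiness"),
  stringsAsFactors = FALSE
)

# Extract the (last) matching keyword from each lower-cased disease name
pattern <- paste0(".*(", paste(unique(B$key), collapse = "|"), ").*")
lowered <- tolower(A$Disease_name)
A$key <- ifelse(grepl(pattern, lowered), gsub(pattern, "\\1", lowered), NA)

# match() takes the first row of B for each key, so A keeps one row per disease
A$parts <- B$parts[match(A$key, B$key)]
print(A)
```

Because `match()` silently takes the first hit, it may be worth checking `anyDuplicated(B$key)` first if B could map the same key to different parts.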
Created on 2018-09-28 by the reprex package (v0.2.1)
Regarding "r - How to find specific words in a string and merge variables by those words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52549954/