r - How to find specific words in a string and merge variables by those words

Tags: r

Suppose I have a dataset "A" such as:

Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness

And another dataset, B, as follows:

parts         key
Chest       pneumonia
Head        headache
Abdominal   spinal
Abdominal   intervertebral
Abdominal   colon
Abdominal   ureter
Abdominal   colon
Head        concussion
Neck        thyroid
Chest       breast
Abdominal   liver
Abdominal   hepatitis
Head        giddiness

I want to look for the words in B$key within A$Disease_name, then merge A with B on the matched keywords so that each A$Disease_name is assigned its B$parts.

How can I do this in R?

Best answer

Welcome to SO! The question is pretty clear to me. Here is a tidyverse solution.

First, read in some data:

library(dplyr)

tmp <- data.table::fread(
"Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness",
sep = ""
)


tmp2 <- data.table::fread(
  "parts  key
Chest   pneumonia
Head    headache
Abdominal   spinal
Abdominal   intervertebral
Abdominal   colon
Abdominal   ureter
Abdominal   colon
Head    concussion
Neck    thyroid
Chest   breast
Abdominal   liver
Abdominal   hepatitis
Head    giddiness"
)

Then we do the join:

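# Build one regex from all keywords, reduce each lower-cased disease name
# to the keyword it contains, then join on that keyword to pick up `parts`.
result <-
  tmp %>%
  mutate(key = gsub(paste0(".*(", paste(tmp2$key, collapse = "|"), ").*"),
                    "\\1",
                    tolower(Disease_name))) %>%
  left_join(tmp2)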
#> Joining, by = "key"
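
To make the pattern in the join easier to follow, here is a minimal base R sketch of the same idea on two of the disease names. The names `keys`, `pattern` and `disease` are introduced only for this illustration and are not part of the answer's code.

```r
# A few of the keywords from tmp2$key (illustrative subset).
keys <- c("pneumonia", "headache", "spinal", "intervertebral", "colon")

# Builds ".*(pneumonia|headache|...).*": the whole string is matched,
# but only the keyword itself is captured in group 1.
pattern <- paste0(".*(", paste(keys, collapse = "|"), ").*")

disease <- c("(J189)Pneumonia, unspecified", "(C187)Sigmoid colon")

# Replacing the full match with "\\1" leaves just the captured keyword.
gsub(pattern, "\\1", tolower(disease))
#> [1] "pneumonia" "colon"
```

Lower-casing first matters because the keys are lower case while the disease names are capitalised.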

The result:

result
#>                                              Disease_name            key
#> 1                            (J189)Pneumonia, unspecified      pneumonia
#> 2                                           (R51)Headache       headache
#> 3                   (M4806)Spinal stenosis, lumbar region         spinal
#> 4  (M512)Other specified intervertebral disc displacement intervertebral
#> 5                                     (C187)Sigmoid colon          colon
#> 6                                     (C187)Sigmoid colon          colon
#> 7                                (N201)Calculus of ureter         ureter
#> 8                                (C189)Colon, unspecified          colon
#> 9                                (C189)Colon, unspecified          colon
#> 10     (S0600)Concussion, without open intracranial wound     concussion
#> 11               (C73)Malignant neoplasm of thyroid gland        thyroid
#> 12                              (C509)Breast, unspecified         breast
#> 13         (K746)Other and unspecified cirrhosis of liver          liver
#> 14   (B181)Chronic viral hepatitis B without delta- agent      hepatitis
#> 15                           (R42)Dizziness and giddiness      giddiness
#>        parts
#> 1      Chest
#> 2       Head
#> 3  Abdominal
#> 4  Abdominal
#> 5  Abdominal
#> 6  Abdominal
#> 7  Abdominal
#> 8  Abdominal
#> 9  Abdominal
#> 10      Head
#> 11      Neck
#> 12     Chest
#> 13 Abdominal
#> 14 Abdominal
#> 15      Head

Created on 2018-09-28 by the reprex package (v0.2.1)
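
Two details in the output above are worth noting: the colon rows appear twice because "colon" is listed twice in the key table, and gsub() returns the whole (lower-cased) string as the key when no keyword is found. The following is one possible base R variant that handles both cases, offered as a sketch rather than as part of the original answer. The data frames `A` and `B` are small stand-ins, and "(Z000)General examination" is a hypothetical entry added only to show an unmatched name.

```r
# Small stand-ins for the two tables; the last disease name is hypothetical
# and contains no keyword.
A <- data.frame(
  Disease_name = c("(C187)Sigmoid colon", "(R51)Headache",
                   "(Z000)General examination"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  parts = c("Abdominal", "Abdominal", "Head"),
  key   = c("colon", "colon", "headache"),
  stringsAsFactors = FALSE
)

# Drop the repeated colon row so the merge does not duplicate matches.
B_unique <- unique(B)
pattern <- paste(B_unique$key, collapse = "|")

# regexpr() locates the first keyword in each name (-1 when none is found);
# regmatches() returns the matched text for the names that did match.
lower <- tolower(A$Disease_name)
hit <- regexpr(pattern, lower)
found <- as.integer(hit) > 0
A$key <- NA_character_
A$key[found] <- regmatches(lower, hit)

# all.x = TRUE keeps unmatched rows, similar to left_join().
merged <- merge(A, B_unique, by = "key", all.x = TRUE)
merged
```

Here the unmatched name keeps `NA` in both `key` and `parts`, which makes missing keywords easy to spot.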

Regarding "r - How to find specific words in a string and merge variables by those words", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52549954/
