r - Extract numbers between characters in R

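I need to split the "value" variable in the following dataset into three variables: estimate, low, and high. Note that sometimes there is no confidence interval, in which case I only have the value.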

country gho year    publishstate    value
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1980    Published   4.9 [2.5-8.6]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1981    Published   5.1 [2.7-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1982    Published   5.2 [2.9-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1983    Published   5.4 [3.1-8.6]

I have already tried this:

Data$estimate <- sub("\\[.*","",Data$value)

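But it only works for creating the estimate variable. I was considering strsplit, but that did not work either...

Could you help me with this?

Many thanks,

Best answer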

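Using the data shown in reproducible form in the Note at the end, we can use separate from tidyr as shown. The sep pattern splits value on every run of characters that are not digits or a decimal point, and the trailing NA in the vector of names drops the empty piece left after the closing bracket. If only one subfield is listed in value, the fill = "right" argument causes lower and upper to be filled with NA.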

library(dplyr)
library(tidyr)
DF %>%
  separate(value, c("value", "lower", "upper", NA), sep = "[^0-9.]+", fill = "right")
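To illustrate the fill = "right" behaviour, here is a minimal sketch on a made-up two-row data frame (DF2 and its contents are hypothetical, not part of the original data) in which the second row has no confidence interval. It also assumes that numeric columns are wanted, so it adds convert = TRUE, which the call above does not use.

```r
library(tidyr)

# Hypothetical data: the second row has a value but no confidence interval.
DF2 <- data.frame(
  year  = c(1980, 1981),
  value = c("4.9 [2.5-8.6]", "5.1"),
  stringsAsFactors = FALSE
)

# Same split as above; the new columns are named estimate, lower and upper here.
# fill = "right" pads the missing pieces with NA, and convert = TRUE
# (an addition for this sketch) turns the new character columns into numbers.
out <- separate(DF2, value, c("estimate", "lower", "upper", NA),
                sep = "[^0-9.]+", fill = "right", convert = TRUE)

print(out)
```

Without convert = TRUE the three new columns stay character, which is worth keeping in mind before doing arithmetic on them.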

Note

Lines <- "country,glucose,year,publishstate,value
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1980,Published,4.9 [2.5-8.6]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1981,Published,5.1 [2.7-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1982,Published,5.2 [2.9-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1983,Published,5.4 [3.1-8.6]"
DF <- read.csv(text = Lines, header = TRUE, as.is = TRUE)
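If tidyr is not available, the same split can be sketched in base R with gregexpr and regmatches. This is an alternative to the separate call, not the approach the answer uses; the values vector is a hypothetical stand-in for the value column, and the pattern assumes non-negative decimal numbers, since the hyphen is treated as a separator.

```r
# Hypothetical stand-in for the value column; the second entry has no interval.
values <- c("4.9 [2.5-8.6]", "5.1")

# Pull out every run of digits with an optional decimal part.
nums <- regmatches(values, gregexpr("[0-9]+\\.?[0-9]*", values))

# Take the first three numbers of each entry; indexing past the end gives NA,
# so entries without an interval get NA for lower and upper.
parts <- t(vapply(nums, function(x) as.numeric(x)[1:3], numeric(3)))
colnames(parts) <- c("estimate", "lower", "upper")

print(parts)
```

The resulting matrix can be attached to the data frame with cbind(DF, parts) when values is taken from DF$value.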

Regarding r - Extract numbers between characters in R, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/59931822/
