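I need to split the "value" variable in the following dataset into three variables: estimate, low, and high. Note that sometimes there is no confidence interval, so I only have the value.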
country gho year publishstate value
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1980 Published 4.9 [2.5-8.6]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1981 Published 5.1 [2.7-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1982 Published 5.2 [2.9-8.5]
Afghanistan Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate) 1983 Published 5.4 [3.1-8.6]
I have already tried this:
Data$estimate <- sub("\\[.*","",Data$value)
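But it only works for creating the estimate variable. I was thinking about using strsplit, but that did not work either...
Could you help me with this?
Thanks a lot,
N.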
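Best answer
Using the data in the reproducible form shown in the Note at the end, we can use separate from tidyr as shown below. If only one subfield is present in value, the fill = "right" argument causes lower and upper to be filled with NA.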
library(dplyr)
library(tidyr)
DF %>%
separate(value, c("value", "lower", "upper", NA), sep = "[^0-9.]+", fill = "right")
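To check the behaviour when the confidence interval is missing, here is a minimal sketch on a small made-up data frame. The data frame demo and the column names estimate, low and high are assumptions for illustration (the names follow the question). convert = TRUE is an optional addition that turns the resulting character columns into numbers.

```r
library(tidyr)

# Hypothetical data: the second row has an estimate but no interval.
demo <- data.frame(value = c("4.9 [2.5-8.6]", "6.0"),
                   stringsAsFactors = FALSE)

# Split on every run of characters that is not a digit or a dot.
# The trailing NA in the names drops the empty piece left after "]".
res <- separate(demo, value, c("estimate", "low", "high", NA),
                sep = "[^0-9.]+", fill = "right", convert = TRUE)
res
```

In this sketch the row without an interval keeps its estimate, while low and high come out as NA.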
Note
Lines <- "country,glucose,year,publishstate,value
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1980,Published,4.9 [2.5-8.6]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1981,Published,5.1 [2.7-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1982,Published,5.2 [2.9-8.5]
Afghanistan,Raised fasting blood glucose (>=7.0 mmol/L or on medication)(age-standardized estimate),1983,Published,5.4 [3.1-8.6]"
DF <- read.csv(text = Lines, header = TRUE, as.is = TRUE)
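If adding packages is not an option, a base R variant of the same idea might look like the sketch below. It is not part of the original answer. The vector value and the column names estimate, low and high are assumptions for illustration, and the pattern assumes the strings contain nothing but plain decimal numbers and separators.

```r
# Hypothetical example vector: the last entry has no confidence interval.
value <- c("4.9 [2.5-8.6]", "5.1 [2.7-8.5]", "6.0")

# Extract every run of digits and dots from each string.
nums <- regmatches(value, gregexpr("[0-9.]+", value))

# Indexing with 1:3 pads short results with NA; vapply returns a
# 3 x n matrix, so transpose to get one row per input string.
parts <- t(vapply(nums, function(x) as.numeric(x)[1:3], numeric(3)))
colnames(parts) <- c("estimate", "low", "high")
parts
```

The resulting matrix can be attached to the original data with cbind.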
Regarding r - extracting numbers between characters in R, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59931822/