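Is there a way to extract all the numbers in a string as a vector? I have a large dataset that does not follow any particular pattern, so using extract + regex with a fixed pattern will not necessarily capture every number. For example, for each row of a data frame that looks like this: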
x <- c("3.2% 1ST $100000 AND 1.1% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY",
"$4000", "3.3% 1ST $100000 AND 1.2% BALANCE", "3.3% 1ST $100000 AND 1.2% BALANCE",
"3.2 - $100000")
[1] "3.2% 1ST $100000 AND 1.1% BALANCE"
[2] "3.3% 1ST $100000 AND 1.2% BALANCE AND $3000 BONUS FULL PRICE ONLY"
[3] "$4000"
[4] "3.3% 1ST $100000 AND 1.2% BALANCE"
[5] "3.3% 1ST $100000 AND 1.2% BALANCE"
[6] "3.2 - $100000"
I would like output like this:
[1] "3.2 100000 1.1"
[2] "3.3 100000 1.2 3000"
[3] "4000"
[4] "3.3 100000 1.2 "
[5] "3.3 100000 1.2 "
[6] "3.2 100000 "
I looked through some resources and found this link: https://statisticsglobe.com/extract-numbers-from-character-string-vector-in-r
regmatches(x, gregexpr("[[:digit:]]+", x))
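The call above seems to work, but it does not handle every kind of number at once. I know that "[[:digit:]]+" only matches runs of digits, so a decimal such as 3.2 is split into 3 and 2. How can the pattern be changed so that it covers decimals as well as integers?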
Best answer
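We need to add . to the character class in the pattern so that the decimal point is matched along with the digits. The \\b word boundaries on either side keep digits that are glued to letters, such as the 1 in "1ST", from being matched.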
sapply(regmatches(x, gregexpr("\\b[[:digit:].]+\\b", x)), paste, collapse= ' ')
#[1] "3.2 100000 1.1"
#[2] "3.3 100000 1.2 3000"
#[3] "4000"
#[4] "3.3 100000 1.2"
#[5] "3.3 100000 1.2"
#[6] "3.2 100000"
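The pattern above accepts any run of digits and dots, so a token such as "1.2.3" would also be returned as a single match. If that matters for your data, a slightly stricter variant is sketched below. It is only an illustration under the assumption that every number is either an integer or a decimal with a single point; the names pattern, matches, nums and collapsed are made up for this example, and the sample vector is a subset of the data in the question. Because the question asks for vectors, the sketch also converts each set of matches to numeric.

```r
# Subset of the sample data from the question
x <- c("3.2% 1ST $100000 AND 1.1% BALANCE",
       "$4000",
       "3.2 - $100000")

# Assumed stricter pattern: one or more digits, optionally followed by
# a single "." and more digits, with word boundaries on both sides
pattern <- "\\b[[:digit:]]+(\\.[[:digit:]]+)?\\b"

# regmatches() + gregexpr() returns a list with one character vector
# of matches per input string
matches <- regmatches(x, gregexpr(pattern, x))

# One numeric vector per input string
nums <- lapply(matches, as.numeric)

# Or a single space-separated string per input, as in the answer above
collapsed <- sapply(matches, paste, collapse = " ")
```

Here nums is a list, so nums[[1]] holds the three numbers from the first string, while collapsed has the same shape as the output shown in the answer.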
This question on r - how to extract all the numbers in a string as a vector comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/64812421/