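I'm wondering whether I can use a regular expression in BigQuery to extract all the numbers from a string.
I think the query below works, but it only returns the first match. Is there a way to extract all matches?
My use case is that I basically want to get the largest number out of a URL, since that is most likely the post_id I need to join on.
Here is an example of what I mean: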
SELECT
  mystr,
  REGEXP_EXTRACT(mystr, r'(\d+)') AS nums
FROM
  (SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
  (SELECT 'just one number 123 in this one ' AS mystr),
  (SELECT '99' AS mystr),
  (SELECT 'another -2 example 99' AS mystr),
  (SELECT 'another-8766 example 99' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)
The result I get from this is:
[
  {
    "mystr": "this is a string with some 666 numbers 999 in it 333",
    "nums": "666"
  },
  {
    "mystr": "just one number 123 in this one ",
    "nums": "123"
  },
  {
    "mystr": "99",
    "nums": "99"
  },
  {
    "mystr": "another -2 example 99",
    "nums": "2"
  },
  {
    "mystr": "another-8766 example 99",
    "nums": "8766"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999",
    "nums": "2015"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001",
    "nums": "2015"
  },
  {
    "mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview",
    "nums": "2015"
  }
]
Best answer
After a bit of digging, I ended up with this solution:
SELECT
  mystr,
  GROUP_CONCAT(SPLIT(REGEXP_REPLACE(mystr, r'[^\d]+', ','))) AS nums
FROM
  (SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
  (SELECT 'just one number 123 in this one ' AS mystr),
  (SELECT '99' AS mystr),
  (SELECT 'another -2 example 99' AS mystr),
  (SELECT 'another-8766 example 99' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
  (SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)
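How it works:
- First, I use a regular expression to match any run of non-digit characters and replace it with a comma.
- Then I use SPLIT to get the individual results; empty results are dropped.
- GROUP_CONCAT is only there to display the results.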
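For the stated goal of picking the largest number out of each URL, the same idea can be sketched in BigQuery standard SQL. This is only a sketch under assumptions: it uses the standard SQL dialect rather than the legacy dialect of the queries above, it relies on REGEXP_EXTRACT_ALL to return every match as an array, and the names `samples` and `largest_num` are made up for illustration.

```sql
#standardSQL
-- Hypothetical sample rows, taken from the strings in the question.
WITH samples AS (
  SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr UNION ALL
  SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001'
)
SELECT
  mystr,
  -- Every run of digits in the string, as an ARRAY<STRING>.
  REGEXP_EXTRACT_ALL(mystr, r'\d+') AS nums,
  -- Cast each match to an integer and keep the largest one.
  (SELECT MAX(CAST(n AS INT64))
   FROM UNNEST(REGEXP_EXTRACT_ALL(mystr, r'\d+')) AS n) AS largest_num
FROM samples
```

As in the queries above, the pattern ignores a leading minus sign, so '-2' would be read as 2. A string with no digits at all would yield an empty array and a NULL largest_num.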
Regarding regex - extracting numbers from a string in Google BigQuery using regular expressions, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34290723/