python - Regex: select all pairs of adjacent (hashtag) word groups

I have a sample string:

#water #atlantic ocean #sea

I want to use a regular expression to select every pair of hashtag word groups that sit next to each other. This should return:

[[['#water']['#atlantic ocean']], [['#atlantic ocean']['#sea']]]

I don't know how to write this regex. The closest I've got is: ([#][A-Za-z\s]+\s?)

It only produces the following (in Python):

>>> regex.findall(string)
[u'#water ', u'#atlantic ocean ', u'#sea']

I tried appending {2} to the end, but that doesn't seem to match. Any ideas on how to achieve this?

Best answer

To me, splitting on # (or a space followed by a hash) is more intuitive than using a complicated regex:

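import re

expr = "#water #atlantic ocean #sea"
# Split on "#" (optionally preceded by a space) and drop the empty leading piece.
# A list is needed here: in Python 3, filter() returns an iterator with no len().
groups = [g for g in re.split(r' ?#', expr) if g]
# Another option is to use a split that doesn't require regex at all:
# groups = [g.strip() for g in expr.split("#") if g.strip()]
res = []
for i, itm in enumerate(groups):
    # Pair each group with the one that follows it, skipping the last group
    if i < len(groups) - 1:
        res.append(["#" + itm, "#" + groups[i + 1]])

print(res)  # [['#water', '#atlantic ocean'], ['#atlantic ocean', '#sea']]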
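
If a regex-only solution is still preferred, one possible approach (not part of the original answer) is to wrap both capture groups in a lookahead. A lookahead consumes no characters, so re.findall can report overlapping pairs, which is something a plain {2} quantifier cannot do. The function name adjacent_hashtag_pairs is made up for this sketch, and it assumes a tag runs from its # up to the last non-space character before the next # or the end of the string:

```python
import re

# Hypothetical helper, for illustration only.
# (?=...) is a zero-width lookahead, so the scan advances one character at a
# time and the same tag can be the second item of one pair and the first
# item of the next pair.
PAIR_RE = re.compile(r'(?=(#[^#]*[^#\s])\s*(#[^#]*[^#\s]))')


def adjacent_hashtag_pairs(text):
    """Return every pair of neighbouring hashtag groups as two-item lists."""
    return [list(pair) for pair in PAIR_RE.findall(text)]


print(adjacent_hashtag_pairs("#water #atlantic ocean #sea"))
# [['#water', '#atlantic ocean'], ['#atlantic ocean', '#sea']]
```

The split-based version above is probably easier to read and maintain. The lookahead version is mainly useful when the pairing has to happen inside a single pattern.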

For "python - Regex: select all pairs of adjacent (hashtag) word groups", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26501238/
