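I need to split a string into two groups: the first word, and the second word or group of words. The words are separated by underscores, and with my current code, if there is more than one underscore, it splits only at the last one. This is the code I have so far: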
import re

# `reader` is defined elsewhere (not shown in the question)
for record in reader:
    s = record['trial']
    patternsubgen = re.compile(r'(\w+)\(\w+\)\_(\w+)')
    source = "Footit"
    if patternsubgen.search(s):
        resultsubgen = patternsubgen.search(s)
        genussubgen = resultsubgen.group(1)
        speciessubgen = resultsubgen.group(2)
        subgen = '%s %s' % (genussubgen, speciessubgen)
        #print(subgen)
    else:
        pattern = re.compile(r'(\w+)\_(\w+)')
        if pattern.search(s):
            result = pattern.search(s)
            genus = result.group(1)
            species = result.group(2)
            new = '%s %s' % (genus, species)
            print(new)
Here are some example strings:
Aphis(Aphis)_asclepiadis, Cinara_011, Clydesmithia_canadensis_1a,
What I need:
Aphis asclepiadis,
Cinara 011,
Clydesmithia canadensis_1a,
What I get:
Aphis asclepiadis,
Cinara 011,
Clydesmithia_canadensis 1a
Best answer
For the given strings, you can use
\b([^_\W]+)(?:\([^()]+\))?_(\w+)\b
In Python:
import re
strings = 'Aphis(Aphis)_asclepiadis, Cinara_011, Clydesmithia_canadensis_1a,'
rx = re.compile(r'\b([^_\W]+)(?:\([^()]+\))?_(\w+)\b')
strings = rx.sub(r"\g<1> \g<2>", strings)
print(strings)
# Aphis asclepiadis, Cinara 011, Clydesmithia canadensis_1a,
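The question's pattern splits at the last underscore because `\w` also matches `_` and `(\w+)` is greedy, so the first group consumes everything up to the final underscore. The pattern above avoids this by excluding `_` from the first group. If the names arrive one record at a time, as in the question's loop, the same pattern could be applied per string. The following is only a sketch: `NAME_RX` and `split_name` are hypothetical names, and it assumes each record holds exactly one name with no surrounding text, which is why it uses `fullmatch` and drops the `\b` anchors.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can be commented.
# NAME_RX and split_name are illustrative names, not part of the original answer.
NAME_RX = re.compile(r"""
    ([^_\W]+)          # group 1: word characters except "_" (the first word)
    (?:\([^()]+\))?    # optional parenthesised part, matched but not captured
    _                  # the first underscore is the separator
    (\w+)              # group 2: the rest, which may itself contain "_"
""", re.VERBOSE)


def split_name(s):
    """Return 'first rest' for one name, or None if s does not match."""
    m = NAME_RX.fullmatch(s)
    if m is None:
        return None
    return "%s %s" % (m.group(1), m.group(2))


for s in ["Aphis(Aphis)_asclepiadis", "Cinara_011", "Clydesmithia_canadensis_1a"]:
    print(split_name(s))
```

Returning `None` for strings that do not match leaves it to the caller to decide whether to skip or report such records.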
Regarding python - separating the first word in a string with Python re, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/47318128/