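I'm working on a Python program that scans .txt files for genus and species names. The lines are formatted as follows (yes, the common name is always surrounded by equals signs):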
1. =Common Name= Genus Species some other words that I don't want.
2. =Common Name= Genus Species some other words that I don't want.
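I can't seem to come up with a regular expression that matches only the genus and species and not the common name. I know the equals signs (=) could probably help somehow, but I can't figure out how to use them.
Edit: some real data: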
1. =Western grebe.= ÆCHMOPHORUS OCCIDENTALIS. Rare migrant; western species, chiefly interior regions of North America.
2. =Holboell's grebe.= COLYMBUS HOLBOELLII. Rare migrant; breeds far north; range, all of North America.
3. =Horned grebe.= COLYMBUS AURITUS. Rare migrant; range, almost the same as the last.
4. =American eared grebe.= COLYMBUS NIGRICOLLIS CALIFORNICUS. Summer resident; rare in eastern, common in western Colorado; breeds from plains to 8,000 feet; partial to alkali lakes; western species.
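Best answer
You may not need a regular expression for this. If the word order and the number of words are always the same, you can split each line into a list of substrings and take the third (genus) and fourth (species) elements of that list. The code might look something like this: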
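    with open('myfilename.txt', 'r') as myfile:
        for line in myfile:
            words = line.split()
            if len(words) < 4:
                continue  # skip blank or short lines
            # words[0] is the item number, words[1] a one-word common name
            genus, species = words[2], words[3]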
To me, this looks a bit more "Pythonic".
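A quick check of what `split()` returns makes the limits of fixed indices visible. This is only a sketch: the first sample line is a made-up one-word case, the second is taken from the question's data.

```python
# Hypothetical line with a one-word common name.
one_word = "1. =Grebe= Genus Species some other words"
# Line from the question's real data, with a two-word common name.
multi_word = "1. =Western grebe.= ÆCHMOPHORUS OCCIDENTALIS. Rare migrant"

words_one = one_word.split()
words_multi = multi_word.split()

print(words_one[2], words_one[3])      # Genus Species
print(words_multi[2], words_multi[3])  # grebe.= ÆCHMOPHORUS
```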
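If the common name can contain more than one word, the code above returns incorrect results, because the positions of the genus and species in the list shift. To get correct results in that case as well, you can use the following code: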
    with open('myfilename.txt', 'r') as myfile:
        for line in myfile:
            # split('=') gives [text before the first "=", common name, rest of line],
            # so index 2 is right as long as the line contains exactly two "=".
            words = line.split('=')[2].split()
            # rstrip('.') drops the period that follows the Latin name in the real data.
            genus, species = words[0], words[1].rstrip('.')
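Since the question asked for a regular expression, here is a rough sketch of one, in Python 3. It assumes every line looks like the real data: an item number, the common name between two equals signs, then the Latin name ending in a period. The names `LINE_RE` and `parse_latin_name` are made up for illustration. Everything after the genus is returned as the species part, so the three-word name in item 4 is kept whole.

```python
import re

# Assumed layout: "<number>. =<common name>= <LATIN NAME>. <description>"
# Group 1 captures everything between the closing "=" and the next period.
LINE_RE = re.compile(r"^\s*\d+\.\s*=[^=]*=\s*([^.]+)\.")


def parse_latin_name(line):
    """Return (genus, species) for a matching line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    words = match.group(1).split()
    if len(words) < 2:
        return None
    # Any words after the genus (species, optional subspecies) are joined.
    return words[0], " ".join(words[1:])


sample = "2. =Holboell's grebe.= COLYMBUS HOLBOELLII. Rare migrant; breeds far north."
print(parse_latin_name(sample))  # ('COLYMBUS', 'HOLBOELLII')
```

If the Latin name is not followed by a period, as in the simplified format at the top of the question, the capture group would run on to the next period in the line, so the pattern would need adjusting.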
Source: the Stack Overflow question "python - Regex Python [python-2.7]": https://stackoverflow.com/questions/32705353/