python - Regex match between two strings in Python

Tags: python regex

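I am trying to match every block that lies between two [Term] headers, or between a [Term] header and a [Typedef] header. The file contains content like this: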

remark: Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/go/never_in_taxon.owl>))) [Axioms: 18 Logical Axioms: 0]
ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: positively_regulates
name: positively regulates
namespace: external
xref: RO:0002213
holds_over_chain: negatively_regulates negatively_regulates
is_a: regulates ! regulates
transitive_over: part_of ! part of

[Typedef]
id: regulates
name: regulates
namespace: external
xref: RO:0002211
is_transitive: true
transitive_over: part_of ! part of

With (?=\[Term\]\s)[\s\S]*(?=\s\s\[Term\]\s) I only get a single match, which runs from the first [Term] through the second-to-last one.
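
For reference, here is a minimal sketch that reproduces this behavior. The `sample` string is a hypothetical, shortened stand-in for the real file, and the variable names are illustrative only. Because `[\s\S]*` is greedy, it runs to the end of the input and backtracks only as far as the last position where the trailing lookahead succeeds, so every earlier [Term] header ends up inside one match.

```python
import re

# Hypothetical, shortened stand-in for the file shown above.
sample = "\n".join([
    "ontology: go",
    "",
    "[Term]",
    "id: GO:0000001",
    "",
    "[Term]",
    "id: GO:0000002",
    "",
    "[Term]",
    "id: GO:0000011",
    "",
    "[Typedef]",
    "id: regulates",
])

# The pattern from the question, unchanged.
attempt = r'(?=\[Term\]\s)[\s\S]*(?=\s\s\[Term\]\s)'
greedy_matches = re.findall(attempt, sample)

print(len(greedy_matches))                # 1
print(greedy_matches[0].count("[Term]"))  # 2
```

The single match contains the first two [Term] blocks. The third block is never returned, because no further [Term] header follows it to satisfy the lookahead.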

Best answer

You can use

r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'

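See the regex demo.

Details

  • (?m) - the multiline modifier (re.MULTILINE), so ^ matches at the start of every line
  • ^ - start of a line
  • \[Term] - the literal substring [Term]
  • .* - the rest of the current line
  • (?:\r?\n(?!\[(?:Typedef|Term)]).*)* - zero or more occurrences of:
    • \r?\n(?!\[(?:Typedef|Term)]) - a line break (CRLF or LF) that is not followed by a [Typedef] or [Term] substring
    • .* - the rest of that line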
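
The breakdown above can also be written into the pattern itself. The following is a sketch only: it restates the same regex with `re.VERBOSE` so that each part carries its own comment. The name `TERM_BLOCK` and the shortened `sample` text are assumptions made for illustration.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE; re.MULTILINE replaces (?m).
TERM_BLOCK = re.compile(r"""
    ^\[Term].*                       # a line that starts with [Term]
    (?:                              # then zero or more further lines:
        \r?\n                        #   a CRLF or LF line break ...
        (?!\[(?:Typedef|Term)])      #   ... not followed by [Typedef] or [Term]
        .*                           #   the rest of that line
    )*
""", re.MULTILINE | re.VERBOSE)

# Hypothetical, shortened stand-in for the real file.
sample = (
    "ontology: go\n"
    "\n"
    "[Term]\n"
    "id: GO:0000001\n"
    "name: mitochondrion inheritance\n"
    "\n"
    "[Term]\n"
    "id: GO:0000002\n"
    "name: mitochondrial genome maintenance\n"
    "\n"
    "[Typedef]\n"
    "id: regulates\n"
)

# strip() drops the blank separator line that each match ends with.
blocks = [m.group(0).strip() for m in TERM_BLOCK.finditer(sample)]
for block in blocks:
    print(block)
    print("---")
```

Each match stops right before the next [Term] or [Typedef] header, so the [Typedef] block is never included.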

Python code:

import re
s = """remark: Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/go/never_in_taxon.owl>))) [Axioms: 18 Logical Axioms: 0]
ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: positively_regulates
name: positively regulates
namespace: external
xref: RO:0002213
holds_over_chain: negatively_regulates negatively_regulates
is_a: regulates ! regulates
transitive_over: part_of ! part of

[Typedef]
id: regulates
name: regulates
namespace: external
xref: RO:0002211
is_transitive: true
transitive_over: part_of ! part of"""
rx = r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'
cnt = 0
# The pattern has no capturing groups, so findall returns the full text of each match
for m in re.findall(rx, s):
    print(m)
    print('-------------- Next match ---------------')
    cnt += 1

print("Number of matches: {}".format(cnt))

Output:

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

-------------- Next match ---------------
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

-------------- Next match ---------------
[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

-------------- Next match ---------------
Number of matches: 3

Regarding "python - Regex match between two strings in Python", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45825525/
