python - I want to capture the text that appears between two regex matches

For example, here is my string (it is text extracted from HTML):

html_text = """
TABLE OF CONTENTS

PART I  
| ITEM 1. BUSINESS  
| ITEM 1A. RISK FACTORS  
| ITEM 1B. UNRESOLVED CONFLICTS  
| ITEM 2. PROPERTIES  
| ITEM 3. LEGAL PROCEEDINGS  

    We believe that relations with our employees are good; however, the competition
    for such personnel is intense, and the loss of key personnel could have a
    material adverse impact on our results of operations and financial condition.

    ITEM  1A. |  RISK FACTORS  

    Set forth below and elsewhere in this report and in other documents we file
    with the SEC are descriptions of the risks and uncertainties that could cause
    our actual results to differ materially from the results contemplated by the
    forward-looking statements contained in this report.

    ITEM 1B. UNRESOLVED CONFLICTS

    Our future revenue, gross margins, operating results and net income are
    difficult to predict and may materially"""

I wrote a regex to capture "ITEM 1A. RISK FACTORS" (the section heading, not the table-of-contents entry):

re.search(r"(ITEM.*1A)*.+(RISK FACTORS).*\n+(?!\w)(?!.*ITEM.*1B)", html_text)

and I need another regex to capture "ITEM 1B. UNRESOLVED CONFLICTS" (again, not the table-of-contents entry):

re.search(still trying to figure this out)

I want to capture all of the text that appears between these two matches. The final string should look like this:

final_text = """    ITEM  1A. |  RISK FACTORS  

    Set forth below and elsewhere in this report and in other documents we file
    with the SEC are descriptions of the risks and uncertainties that could cause
    our actual results to differ materially from the results contemplated by the
    forward-looking statements contained in this report."""

Best answer

This might work for you:

re.compile(r"^(    ITEM  1A\. \|  RISK FACTORS.+\n(?:\n.+)+)", re.MULTILINE)
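As a rough illustration of how the compiled pattern might be applied, the sketch below runs it against `sample_text`, a shortened stand-in for the question's `html_text`. The names `sample_text`, `pattern`, and `section` are assumptions made for this example, and the literal dot after `1A` is escaped here.

```python
import re

# Shortened stand-in for the question's html_text (same layout, less body text).
sample_text = (
    "| ITEM 1A. RISK FACTORS  \n"
    "| ITEM 1B. UNRESOLVED CONFLICTS  \n"
    "\n"
    "    ITEM  1A. |  RISK FACTORS  \n"
    "\n"
    "    Set forth below and elsewhere in this report\n"
    "    are descriptions of the risks.\n"
    "\n"
    "    ITEM 1B. UNRESOLVED CONFLICTS\n"
)

# re.MULTILINE lets ^ match at the start of every line, so the indented
# heading is found while the "| ITEM 1A." table-of-contents row is skipped.
pattern = re.compile(
    r"^(    ITEM  1A\. \|  RISK FACTORS.+\n(?:\n.+)+)",
    re.MULTILINE,
)

match = pattern.search(sample_text)
section = match.group(1) if match else None
print(section)
```

Because `(?:\n.+)+` only continues across non-empty lines, the match ends at the first blank line after the paragraph that follows the heading.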

You can try this pattern on Regex101, but note that it behaves slightly differently there, because the flags are not supplied through re.compile(REGEXP, REGEXPOPTION).
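If the section may span several paragraphs, one alternative (not part of the answer above) is to match lazily from the first heading up to a lookahead for the second one. This is only a sketch: `sample_text`, `start_marker`, `end_marker`, and `between` are hypothetical names, and it assumes that real headings begin their line after optional indentation while table-of-contents rows begin with `|`.

```python
import re

# Stand-in text with a two-paragraph section between the two headings.
sample_text = (
    "| ITEM 1A. RISK FACTORS  \n"
    "| ITEM 1B. UNRESOLVED CONFLICTS  \n"
    "\n"
    "    ITEM  1A. |  RISK FACTORS  \n"
    "\n"
    "    First paragraph of the section.\n"
    "\n"
    "    Second paragraph of the section.\n"
    "\n"
    "    ITEM 1B. UNRESOLVED CONFLICTS\n"
)

# Assumed heading shapes: optional indentation, then "ITEM", whitespace,
# and the item number. TOC rows start with "|" and therefore do not match.
start_marker = r"^[ \t]*ITEM\s+1A\."
end_marker = r"^[ \t]*ITEM\s+1B\."

# re.DOTALL lets "." cross newlines; the lazy ".*?" stops at the first
# position where the lookahead sees the end marker at the start of a line.
between = re.compile(
    rf"{start_marker}.*?(?={end_marker})",
    re.MULTILINE | re.DOTALL,
)

match = between.search(sample_text)
# rstrip() drops the blank lines left in front of the second heading.
final_text = match.group(0).rstrip() if match else None
print(final_text)
```

If the second heading never appears, `search` returns `None`, so the result should be checked before use.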

This question, "python - I want to capture the text that appears between two regex matches", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56672873/
