Python 3: extract the string between two strings in a txt file

Tags: python html regex python-3.x parsing

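I'm new to Python. I'm trying to find a string ("concluded that our disclosure controls and procedures were effective") in a txt file ("infile.txt"). The file is fairly large, and I need to look for that string only within a specific section (between "ITEM 9A" and "ITEM 9B"). Here is an example of such a section: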

</A>ITEM&nbsp;9A. CONTROLS AND PROCEDURES. </B></FONT></P> <P STYLE="margin-top:6px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Evaluation of Disclosure Controls and Procedures </B></FONT> STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">Under the supervision and with the participation of our management, including our Chief Executive Officer and Chief Financial Officer, we conducted an evaluation of the effectiveness of our disclosure controls and procedures (as defined in Rules 13a-15(e) and 15d-15(e) under the Securities Exchange Act of 1934, as amended (Exchange Act)), as of the end of the period covered by this Annual Report on Form 10-K. Management recognizes that any controls and procedures, no matter how well designed and operated, can provide only reasonable assurance of achieving their objectives and management necessarily applies its judgment in evaluating the cost-benefit relationship of possible controls and procedures. Based on such evaluation, our Chief Executive Officer and Chief Financial Officer concluded that our disclosure controls and procedures were effective as of September&nbsp;28, 2012. </FONT></P> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Management&#146;s Annual Report on Internal Control over Financial Reporting </B></FONT> <P STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">This Annual Report does not include a report of management&#146;s assessment regarding internal control over financial reporting or an attestation report of the company&#146;s registered public accounting firm due to a transition period established by rules of the Securities and Exchange Commission for newly public companies. 
</FONT> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B>Changes in Internal Control over Financial Reporting </B></FONT></P> <P STYLE="margin-top:6px;margin-bottom:0px; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">There were no changes in our internal control over financial reporting (as defined in Rule&nbsp;13a-15(f) under the Exchange Act) during the quarter ended September&nbsp;28, 2012, that have materially affected, or are reasonably likely to materially affect, our internal control over financial reporting. </FONT> <P STYLE="margin-top:18px;margin-bottom:0px"><FONT STYLE="font-family:Times New Roman" SIZE="2"><B><A NAME="tx431171_16"></A>ITEM&nbsp;9B. OTHER INFORMATION.
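If the whole file is read into one string, a section that begins mid-line is not a problem: the task reduces to finding the text between two marker strings. A minimal sketch using plain `str.find`, assuming the markers appear literally as `ITEM&nbsp;9A` and `ITEM&nbsp;9B` as in the sample above; `text_between` is a hypothetical helper name. It returns only the first such span, so a document that repeats the markers (for example, in a table of contents) would need a loop or a regex.

```python
def text_between(text, start, end):
    """Return the text between the first `start` and the next `end` after it, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)  # skip past the start marker itself
    j = text.find(end, i)  # look for the end marker only after the start marker
    if j == -1:
        return None
    return text[i:j]


# Shortened, hypothetical version of the sample; the section starts mid-line
sample = 'x</A>ITEM&nbsp;9A. CONTROLS AND PROCEDURES.\n...were effective as of...\n<A NAME="y"></A>ITEM&nbsp;9B. OTHER'
section = text_between(sample, "ITEM&nbsp;9A", "ITEM&nbsp;9B")
print(repr(section))
```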

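If that section contains the required string "concluded that our disclosure controls and procedures were effective as of" (it appears roughly in the middle of the section above), I want to write "1" to a separate "output.csv" file; if not, write "not found". The start of a section does not always coincide with the start of a line. Sorry, I don't know how to begin... I'm using Python 3.6.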
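A minimal sketch of the whole task, under several assumptions: the file is UTF-8 or plain ASCII, the markers read "ITEM 9A" and "ITEM 9B" once tags are removed and `&nbsp;` and line breaks are collapsed to single spaces, and the target wording is exactly the sentence from the sample. Tags are stripped with a crude regex rather than a real HTML parser, which is adequate for a substring check but not for general HTML. The names `normalize`, `section_has_phrase`, `write_result` and `main` are hypothetical.

```python
import csv
import html
import re

# Wording assumed from the sample filing in the question
PHRASE = "concluded that our disclosure controls and procedures were effective"


def normalize(raw):
    """Turn the HTML-like filing text into plain text with single spaces."""
    # Crude tag removal (an approximation, not a real HTML parser)
    text = re.sub(r"<[^>]+>", " ", raw)
    # Decode entities such as &nbsp; (which becomes U+00A0)
    text = html.unescape(text)
    # str.split() also splits on U+00A0, so newlines and &nbsp; collapse to one space
    return " ".join(text.split())


def section_has_phrase(raw, phrase=PHRASE):
    """True if any ITEM 9A ... ITEM 9B span contains `phrase` (case-insensitive)."""
    text = normalize(raw)
    # findall returns every 9A..9B span, e.g. a table-of-contents entry and the real section
    sections = re.findall(r"ITEM 9A(.*?)ITEM 9B", text, flags=re.IGNORECASE)
    return any(phrase.lower() in s.lower() for s in sections)


def write_result(found, out_path="output.csv"):
    """Write a single CSV row: "1" if found, otherwise "not found"."""
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerow(["1" if found else "not found"])


def main(in_path="infile.txt", out_path="output.csv"):
    """Read the filing, check the ITEM 9A section(s), and write the result CSV."""
    # Assumes UTF-8 or ASCII; undecodable bytes are replaced instead of raising
    with open(in_path, encoding="utf-8", errors="replace") as f:
        found = section_has_phrase(f.read())
    write_result(found, out_path)
    return found
```

Calling `main()` reads `infile.txt` and writes `output.csv` with a single row, either `1` or `not found`. `re.findall` is used instead of `re.search` because a 10-K may also list ITEM 9A and ITEM 9B in its table of contents, and the first span found could be that listing rather than the actual section.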

Thank you very much!

Best answer

You can use re.findall:

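import re

# Read the whole file as one string, since a section need not start at the beginning of a line
with open("infile.txt", encoding="utf-8", errors="replace") as f:
    string_data_from_file = f.read()

# Note: this pattern only matches the ITEM 9A heading text up to the next </B>;
# it does not check for the "concluded that ... were effective" sentence itself
the_data = re.findall(r"</A>ITEM&nbsp;9A\. (.*?)</B>", string_data_from_file)

if len(the_data) > 0:
    print("1")
else:
    print("Not found")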
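If the pattern above is widened to span the whole section from ITEM 9A to ITEM 9B, one detail matters: by default `.` in a Python regex does not match a newline, and the sample section runs over several lines. A small illustration on a shortened, hypothetical text, comparing `re.findall` with and without the `re.DOTALL` flag:

```python
import re

# Shortened, hypothetical section spanning several lines, like the sample in the question
text = ("</A>ITEM&nbsp;9A. CONTROLS AND PROCEDURES.\n"
        "concluded that our disclosure controls and procedures were effective\n"
        "</A>ITEM&nbsp;9B. OTHER INFORMATION.")

pattern = r"ITEM&nbsp;9A\.(.*?)ITEM&nbsp;9B\."

# Without re.DOTALL, "." does not match "\n", so the multi-line section is not found
without_dotall = re.findall(pattern, text)
# With re.DOTALL, "." also matches "\n"; the non-greedy ".*?" stops at the first ITEM 9B
with_dotall = re.findall(pattern, text, re.DOTALL)

print(without_dotall)                      # []
print(len(with_dotall))                    # 1
print("were effective" in with_dotall[0])  # True
```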

For the original question on Stack Overflow about extracting the string between two strings in a txt file with Python 3, see: https://stackoverflow.com/questions/45258916/
