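python - Regular expression in Python not working

Tags: python regex

I'm working through an exercise in the book Python for Informatics that asks me to write a program simulating the UNIX grep command. However, my code doesn't work. I've simplified it here so that it only counts how many lines start with the word "From". I'm confused and hope you can shed some light on this.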

from urllib.request import urlopen
import re

fhand = urlopen('http://www.py4inf.com/code/mbox-short.txt')
sumFind = 0

for line in fhand:
    line = str(line) #convert from byte to string for re operation
    if re.search('^From',line) is not None:
        sumFind+=1

print(f'There are {sumFind} lines that match.')

The script's output is

There are 0 lines that match.

The input text is the file fetched by the script above (mbox-short.txt).

Thank you very much for your time.

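Best answer

The mistake is using str to convert bytes to a string. Calling str on a bytes object returns its repr, including the b prefix and the quotes, so the line no longer starts with "From":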

>>> str(b'foo')
"b'foo'"

You would need

line = line.decode()
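To see why the original search never matched, here is a minimal sketch comparing str() with decode(). The sample line is made up for illustration and is not taken from the actual file:

```python
import re

# Hypothetical sample line, shaped like what iterating over urlopen() yields.
raw = b"From someone@example.com Sat Jan  5 09:14:16 2008\n"

as_repr = str(raw)      # repr of the bytes object: begins with b'
as_text = raw.decode()  # real text, decoded as UTF-8 by default

repr_match = re.search('^From', as_repr)  # None, the string starts with b'
text_match = re.search('^From', as_text)  # a match object

print(as_repr[:6])
print(repr_match is not None, text_match is not None)
```

decode() assumes UTF-8 unless an encoding is passed; if the file used a different encoding, it would have to be named explicitly.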

But the best approach is to skip the conversion and pass a bytes pattern to re, which supports it:

for line in fhand:
    if re.search(b'^From',line) is not None:
        sumFind+=1

Now I get 54 matches.
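The pattern and the input have to be the same type: re raises TypeError when a str pattern is applied to bytes input. A small sketch, again with a made-up sample line:

```python
import re

line = b"From someone@example.com\n"  # hypothetical sample line

try:
    re.search('^From', line)  # str pattern applied to bytes input
    mixed_ok = True
except TypeError:
    mixed_ok = False          # re refuses to mix str and bytes

# Bytes pattern applied to bytes input works as expected.
bytes_ok = re.search(b'^From', line) is not None

print(mixed_ok, bytes_ok)
```

This is probably what prompted the str() call in the question; matching the pattern's type to the data avoids the conversion altogether.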

Note that you can reduce the whole loop to:

sum_find = sum(bool(re.match(b'From',line)) for line in fhand)

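re.match only matches at the start of the string, so the ^ that re.search needed is no longer required.
No explicit loop is needed: sum counts how many times re.match returns a truthy value (explicitly converted to bool, so it sums 0 or 1).

Or, even simpler, without a regular expression: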
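The difference between the anchored and unanchored forms can be checked on a few in-memory lines. The lines below are invented for illustration and stand in for the downloaded file:

```python
import re

# Made-up lines standing in for the file contents (an assumption).
lines = [
    b"From alice@example.com Sat Jan  5 09:14:16 2008\n",
    b"Subject: not From here\n",
    b"From: bob@example.com\n",
    b"X-From-Header: nope\n",
]

# re.search needs ^ to anchor at the start of the line.
with_search = sum(bool(re.search(b'^From', line)) for line in lines)
# re.match is anchored at the start already.
with_match = sum(bool(re.match(b'From', line)) for line in lines)
# Without the anchor, re.search finds "From" anywhere in the line.
unanchored = sum(bool(re.search(b'From', line)) for line in lines)

print(with_search, with_match, unanchored)
```

Note that both anchored forms also count lines beginning with "From:", since the pattern only checks the first four bytes.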

sum_find = sum(line.startswith(b"From") for line in fhand)

Source: a similar question about regular expressions not working in Python, found on Stack Overflow: https://stackoverflow.com/questions/48909748/
