python - Extract identical lines from two files while ignoring case

The goal is to extract the lines that are the same in two files, ignoring case and punctuation.

I have two files:

source.txt

Foo bar
blah blah black sheep
Hello World
Kick the, bucket

processed.txt

foo bar
blah sheep black
Hello world
kick the bucket ,

Desired output (from source.txt):

Foo bar

Hello World
Kick the, bucket

This is what I have been doing:

from string import punctuation

def normalize(line):
  # Lowercase, drop punctuation, and collapse runs of whitespace.
  return " ".join("".join(ch.lower() for ch in line if ch not in punctuation).split())

with open('source.txt', 'r') as f1, open('processed.txt', 'r') as f2:
  # Compare the files line by line, pairing lines at the same position.
  for i, j in zip(f1, f2):
    if normalize(i) == normalize(j):
      print(i.rstrip("\n"))
    else:
      print()

Is there a way to do this with bash tools? perl, shell, awk, sed?

Best answer

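This is easier to do with awk. The first file on the command line (file2, i.e. processed.txt) is read into an array keyed by each line after uppercasing it and stripping blanks and punctuation. Each line of the second file (file1, i.e. source.txt) is then printed if its normalized form is a key in that array; otherwise an empty line is printed: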

awk 'FNR==NR {s=toupper($0); gsub(/[[:blank:][:punct:]]+/, "", s); a[s]++;next}
   {s=toupper($0); gsub(/[[:blank:][:punct:]]+/, "", s); print (s in a)?$0:""}' file2 file1
Foo bar

Hello World
Kick the, bucket
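
For comparison, the same set-lookup idea can be sketched in Python. This is only an illustration, not part of the original answer: the helper names normalize and matching_lines are hypothetical, and the sample lines are inlined as lists instead of being read from source.txt and processed.txt.

```python
import string

def normalize(line):
    """Uppercase the line and drop whitespace and ASCII punctuation,
    roughly mirroring the toupper/gsub normalization in the awk command."""
    drop = set(string.punctuation) | set(string.whitespace)
    return "".join(ch for ch in line.upper() if ch not in drop)

def matching_lines(source_lines, processed_lines):
    """Return each source line if its normalized form occurs anywhere in
    processed_lines, and an empty string otherwise."""
    seen = {normalize(line) for line in processed_lines}
    return [line if normalize(line) in seen else "" for line in source_lines]

# Sample data copied from the question (hypothetical inlined stand-ins for the files).
source = ["Foo bar", "blah blah black sheep", "Hello World", "Kick the, bucket"]
processed = ["foo bar", "blah sheep black", "Hello world", "kick the bucket ,"]

result = matching_lines(source, processed)
for line in result:
    print(line)
```

Like the awk command, this matches a source line against any line of the processed file, not only the line at the same position, so it behaves differently from a zip-based comparison when the two files are not aligned line for line.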

This question, "python - Extract identical lines from two files while ignoring case", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/25619110/
