I need to write a Python script that removes every word containing non-alphabetic characters from a text file, in order to test Zipf's law. For example:
asdf@gmail.com said: I've taken 2 reports to the boss
to
taken reports to the boss
How should I go about this?
Best answer
Using a regular expression that matches only letters (and underscores), you can do this:
import re
s = "asdf@gmail.com said: I've taken 2 reports to the boss"
# s = open('text.txt').read()
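# Split on whitespace, then keep only tokens made entirely of letters/underscores.
# [^\W\d] matches any word character that is not a digit.
tokens = s.strip().split()
clean_tokens = [t for t in tokens if re.match(r'[^\W\d]*$', t)]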
# ['taken', 'reports', 'to', 'the', 'boss']
clean_s = ' '.join(clean_tokens)
# 'taken reports to the boss'
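If underscores should be rejected too, or only ASCII letters should count, a stricter pattern may be a better fit. The following is a minimal sketch, not part of the original answer: the helper name `alphabetic_words` is hypothetical, and the choice of `[A-Za-z]+` assumes English-only text. It also tallies the surviving words with `collections.Counter`, since a rank/frequency table is the usual starting point for checking Zipf's law.

```python
import re
from collections import Counter

# Hypothetical helper (not from the original answer).
def alphabetic_words(text):
    """Return the whitespace-separated tokens made only of ASCII letters."""
    return [t for t in text.split() if re.fullmatch(r"[A-Za-z]+", t)]

s = "asdf@gmail.com said: I've taken 2 reports to the boss"
words = alphabetic_words(s)
# ['taken', 'reports', 'to', 'the', 'boss']

# Count case-insensitively and list words from most to least frequent.
# Zipf's law predicts frequency roughly proportional to 1 / rank.
counts = Counter(w.lower() for w in words)
for rank, (word, freq) in enumerate(counts.most_common(), start=1):
    print(rank, word, freq)
```

Unlike `[^\W\d]`, this pattern drops tokens such as `foo_bar` and any word with accented or non-Latin letters, so it is only appropriate when that is the intended behavior.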
Regarding "python - how to remove every word with non-alphabetic characters", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/46486157/