python - How to remove special characters from txt files using Python

Tags: python

from glob import glob
import os

pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)

def countwords(fp):
    # Count the whitespace-separated tokens in one file
    with open(fp) as fh:
        return len(fh.read().split())

print("There are", sum(map(countwords, filelist)), "words in the files. " "From directory", pattern)

# Collect every distinct whitespace-separated token under the directory
uniquewords = set()
for root, dirs, files in os.walk("D:\\report\\shakeall"):
    for name in files:
        with open(os.path.join(root, name)) as fh:
            uniquewords.update(fh.read().split())
print("There are", len(uniquewords), "unique words in the files." "From directory", pattern)

This is my code so far. It counts the number of unique words and the total number of words in D:\report\shakeall\*.txt

The problem is that this code treats, for example, code, code. and code! as different words. So it cannot give the exact number of unique words.
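
The behavior described here can be reproduced on a short string. This is a minimal sketch; the sample text is a made-up assumption, not content from the actual files.

```python
# Hypothetical sample text. str.split() splits on whitespace only,
# so trailing punctuation stays attached to each token.
sample = "code code. code!"
tokens = sample.split()
unique_tokens = set(tokens)

# All three tokens differ, so the unique count equals the total count.
print(len(tokens), len(unique_tokens))
```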

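I want to remove the special characters from the 42 text files using a Windows text editor

or make an exception rule that solves this problem.

If I go with the latter, how should I write the code?

Should it modify the text files directly? Or should it be an exception that does not count special characters?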

Best answer

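import re
with open('a.txt') as fh:
    string = fh.read()
# Replace everything except letters, digits, newlines and periods with a space
new_str = re.sub(r'[^a-zA-Z0-9\n\.]', ' ', string)
with open('b.txt', 'w') as fh:
    fh.write(new_str)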

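It replaces every character that is not a letter, digit, newline, or period with a space. Because periods are kept, code. still differs from code; remove \. from the character class to strip periods as well.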
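
If the goal is only to get correct counts, the files may not need to be rewritten at all. The following is a rough sketch of the "do not count special characters" option, assuming a word is any run of ASCII letters and digits. The helper names words_in and count_words are hypothetical, and this definition of a word splits contractions such as don't into two tokens.

```python
import re

def words_in(text):
    # Assumption: a "word" is a run of ASCII letters or digits.
    # Punctuation and other characters act as separators.
    return re.findall(r"[a-zA-Z0-9]+", text)

def count_words(paths):
    # Return (total words, unique words) across all given files.
    total = 0
    unique = set()
    for path in paths:
        with open(path) as fh:
            words = words_in(fh.read())
        total += len(words)
        unique.update(words)
    return total, len(unique)

# Small demonstration on a made-up string
demo = words_in("code code. code!")
print(len(demo), len(set(demo)))
```

To apply this to the files from the question, something like count_words(glob("D:\\report\\shakeall\\*.txt")) should work after from glob import glob. The comparison stays case-sensitive; lowercasing each word first would also merge Code and code.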

Regarding python - How to remove special characters from txt files using Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/11902022/
