python - How to remove only punctuation such as "." and ","?

I want to search for specific words, such as "earnings" or "income". To do this, I created a word list and search the text for those words.

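However, my code returns no results for words with an attached punctuation mark, such as "earnings." or "income,". I now want to remove these punctuation marks without removing the dot in a number such as "2.4" or any other sign such as "%".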

I have already tried

import string

table = str.maketrans({key: None for key in string.punctuation})
text_wo_dots = text.translate(table)

and

import re

text_wo_dots = re.sub(r'[^\w\s]',' ',text)

but both of these remove all punctuation, including the dot in "2.4" and the "%".
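To make the problem concrete, here is a small sketch that runs both attempts on a sample sentence. The sentence and the variable names `removed_with_translate` and `removed_with_regex` are made up for illustration; only the two calls come from the question.

```python
import re
import string

# Hypothetical sample text containing a decimal number and a percent sign.
text = "Earnings rose 2.4% this year, income fell."

# Attempt 1: delete every character listed in string.punctuation.
table = str.maketrans({key: None for key in string.punctuation})
removed_with_translate = text.translate(table)

# Attempt 2: replace everything that is neither a word character nor whitespace.
removed_with_regex = re.sub(r'[^\w\s]', ' ', text)

print(removed_with_translate)  # Earnings rose 24 this year income fell
print(removed_with_regex)      # "2.4%" is broken apart here as well
```

In both cases the number "2.4" and the "%" sign are damaged, which is exactly what should be avoided.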

Best answer

I suggest that you first split the text into individual words, with the punctuation still attached:

# text must be a string (not a list) so that .split() is available
text = "This is an example, it contains 1.0 number and some words."
raw_list = text.split()

Now you can remove the punctuation at the end of each element:

cleaned_words = []
for word in raw_list:
    if word[-1] in ['.', ',', '!', '?']:
        cleaned_words.append(word[:-1])
    else:
        cleaned_words.append(word)
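As a rough illustration of how this ties back to the word-list search in the question, the sketch below applies the same loop and then looks up the cleaned words. The sample sentence, the `search_terms` set and the `matches` list are assumptions added for the example, not part of the original answer.

```python
# Hypothetical sample text and word list, for illustration only.
text = "The company reported higher earnings. Growth was 2.4% and income, too, rose."
search_terms = {"earnings", "income"}

cleaned_words = []
for word in text.split():
    # Drop a single trailing punctuation mark, if there is one.
    if word[-1] in ['.', ',', '!', '?']:
        cleaned_words.append(word[:-1])
    else:
        cleaned_words.append(word)

# Case-insensitive lookup of each cleaned word in the word list.
matches = [w for w in cleaned_words if w.lower() in search_terms]

print(cleaned_words)
print(matches)  # ['earnings', 'income']
```

Because only the last character of each word is inspected, "2.4%" passes through unchanged.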

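Note 1: If your text contains numbers written like "1." for 1.0, you also need to look at the second-to-last character and keep the dot if isdigit() evaluates to True for it.
Note 2: If sentences end with more than one punctuation mark, run a while loop to strip them, and append the word only once no more punctuation is found.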

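# Runs inside the loop over raw_list, in place of the if/else above.
# "while word" stops the loop if the word becomes empty (e.g. "..."),
# which would otherwise raise an IndexError on word[-1].
while word and word[-1] in ['.', ',', '!', '?']:
    word = word[:-1]

cleaned_words.append(word)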
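Notes 1 and 2 can be combined into a single helper. The following is only a sketch under stated assumptions: the names `TRAILING`, `strip_trailing_punctuation` and `clean_text` are hypothetical, and the rule "keep a dot that directly follows a digit" is one possible reading of Note 1.

```python
# Punctuation marks that should be stripped from the end of a word.
TRAILING = ('.', ',', '!', '?')


def strip_trailing_punctuation(word):
    """Remove trailing punctuation, but keep a dot that directly follows a digit."""
    while word and word[-1] in TRAILING:
        # Note 1: a dot after a digit (e.g. "1.") is treated as part of the number.
        if word[-1] == '.' and len(word) > 1 and word[-2].isdigit():
            break
        # Note 2: keep stripping until no trailing punctuation is left.
        word = word[:-1]
    return word


def clean_text(text):
    """Split on whitespace, clean each word, and drop words that become empty."""
    cleaned = (strip_trailing_punctuation(w) for w in text.split())
    return [w for w in cleaned if w]


print(clean_text("Wait... earnings rose 1. percent, really?!"))
# ['Wait', 'earnings', 'rose', '1.', 'percent', 'really']
```

The trade-off of the digit rule is that a sentence ending in a number, such as "It costs 5.", keeps its final dot. If that matters for your data, the rule needs further refinement.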

Regarding python - How to remove only punctuation such as "." and ","?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55631414/
