python - Finding similar text in a string in Python

I have a txt file that contains the following text:

Table of Contents

Preface 1

Chapter 1: Tokenizing Text and WordNet Basics 7

Tokenizing text into sentences 8

Tokenizing sentences into words 10

Tokenizing sentences using regular expressions 12

Suppose my string is:

input = "Tokenzing sentence using expressions"

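I thought about extracting the line by matching its first and last words, but many lines share the same ones, so that approach is ambiguous.

What is the best way to get this output?

Tokenizing sentences using regular expressions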

Best answer

If you are prepared to preprocess the chapter titles first, stripping the page numbers and other extras, then:

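import difflib

# Chapter titles with page numbers and "Chapter N:" prefixes already removed
contents = ["Tokenizing Text and WordNet Basics",
            "Tokenizing text into sentences",
            "Tokenizing sentences into words",
            "Tokenizing sentences using regular expressions"]

# Named `query` rather than `input` to avoid shadowing the built-in input()
query = "Tokenzing sentence using expressions"

# n=1 returns only the single closest match, as a list
print(difflib.get_close_matches(query, contents, n=1))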

will give you this output:

['Tokenizing sentences using regular expressions']
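
The answer assumes the titles have already been cleaned. As a rough sketch of that preprocessing step, the snippet below strips trailing page numbers and "Chapter N:" prefixes before matching. The helper names `clean_titles` and `find_title`, the regular expressions, and the inline `raw_text` (standing in for the contents of the txt file) are illustrative assumptions, not part of the original answer; the 0.6 cutoff is simply the default used by `difflib.get_close_matches`.

```python
import difflib
import re

# Stand-in for the contents of the txt file (an assumption for this sketch).
raw_text = """Table of Contents

Preface 1

Chapter 1: Tokenizing Text and WordNet Basics 7

Tokenizing text into sentences 8

Tokenizing sentences into words 10

Tokenizing sentences using regular expressions 12
"""


def clean_titles(text):
    """Return non-empty lines without trailing page numbers or 'Chapter N:' prefixes."""
    titles = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        line = re.sub(r"\s+\d+$", "", line)             # drop trailing page number
        line = re.sub(r"^Chapter\s+\d+:\s*", "", line)  # drop chapter prefix
        titles.append(line)
    return titles


def find_title(query, text, cutoff=0.6):
    """Return the closest cleaned title, or None if nothing reaches the cutoff."""
    matches = difflib.get_close_matches(query, clean_titles(text), n=1, cutoff=cutoff)
    return matches[0] if matches else None


best = find_title("Tokenzing sentence using expressions", raw_text)
print(best)  # Tokenizing sentences using regular expressions
```

Returning None when nothing reaches the cutoff makes the "no similar title" case explicit; raising `cutoff` toward 1.0 makes the match stricter, lowering it makes it more permissive.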

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/44227820/