algorithm - Finding successive words from A in article B

Tags: algorithm

There are two articles, A and B, both very large. I want to take every run of three or more successive words in A, check whether it appears in B, and count how many times it appears. For example, if the words 'book', 'his' and 'her' appear in sequence in A, how many times does that sequence appear in B?

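I thought about splitting the whole content of B and then using StringToken to check every group of 3 words from A, but I'm not sure how efficient that algorithm would be.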

Best answer

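Look up what a Hashtable is. Then scan file B word by word (you can split it if you don't care about memory usage with large files), and for each word either add it to the hash table (when it is not found yet) or increment the number of times that word has been seen.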
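A minimal sketch of that counting step, assuming words are separated by whitespace and compared case-sensitively, with no punctuation stripping. `count_words` is a hypothetical helper name, and Python's `collections.Counter` stands in for the hash table. It reads the input one line at a time, so passing an open file object avoids holding all of B in memory as a single split list.

```python
from collections import Counter
from typing import Iterable

def count_words(lines: Iterable[str]) -> Counter:
    """Count occurrences of each word, reading the input one line at a time.

    Assumption: words are whitespace-separated and compared case-sensitively;
    punctuation is not stripped.
    """
    counts: Counter = Counter()
    for line in lines:  # a file object also works here, so B is never loaded whole
        for word in line.split():
            # A missing key reads as 0, so this either adds the word or increments it.
            counts[word] += 1
    return counts

b_counts = count_words(["his book and", "her book"])
print(b_counts["book"])  # 2
print(b_counts["pen"])   # 0 (never seen)
```

With a real file this would be used as `with open(path) as f: counts = count_words(f)`, where `path` is whatever file holds article B.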

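Then you just scan A, looking at each group of 3 words with a rolling (sliding) window. That way you can later increase the window length without rewriting anything.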
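A hedged sketch of the sliding-window scan. One assumption goes beyond the answer: a count of single words cannot tell whether three words appear *consecutively* in B, so here the hash table's keys are tuples of `n` consecutive words taken from B rather than individual words. `windows` and `count_sequences` are hypothetical names, and tokenization is plain whitespace splitting.

```python
from collections import Counter, deque
from typing import Dict, Iterable, Iterator, Tuple

def windows(words: Iterable[str], n: int = 3) -> Iterator[Tuple[str, ...]]:
    """Yield every run of n consecutive words using a sliding window."""
    window: deque = deque(maxlen=n)  # appending past maxlen drops the oldest word
    for word in words:
        window.append(word)
        if len(window) == n:
            yield tuple(window)

def count_sequences(a_text: str, b_text: str, n: int = 3) -> Dict[Tuple[str, ...], int]:
    """For each distinct n-word sequence in A, count how often it occurs in B."""
    # Hash table of B, keyed by n-word tuples instead of single words (assumption).
    b_counts = Counter(windows(b_text.split(), n))
    # Looking up a missing key in a Counter returns 0 without inserting it.
    return {seq: b_counts[seq] for seq in set(windows(a_text.split(), n))}

result = count_sequences("his book her", "his book her friend and his book her pen")
print(result)  # {('his', 'book', 'her'): 2}
```

Because the window length is just the parameter `n`, counting four- or five-word runs only means passing a different `n`, which is the benefit the answer attributes to the rolling window. For truly huge inputs, the table of B's word tuples can itself become large, so the memory caveat from the splitting step still applies.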

For future reference, you really should tag homework questions as such.

Regarding "algorithm - Finding successive words from A in article B", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10117377/