php - Google-like search algorithm

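I'm trying to implement a search algorithm on my simple data structure. However, this isn't a "how do I do this?" question but rather a "how can I optimize the algorithm?" question.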

I'm trying to keep an index of files, and each file can be associated with any number of tags (like a category).

Here is my data structure:

Entries:

 ------------------------------------
|  id  | description | short | score | 
 ------------------------------------

Tags:

 -------------
|  id  | text |
 -------------

EntryTags:

 -------------------
| entry_id | tag_id |
 -------------------
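Because the search repeatedly asks "which entries carry this tag?", one option is to load the EntryTags rows once into an inverted index (tag_id to set of entry_ids). This is a minimal in-memory sketch, not something from the question; `build_tag_index` and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_tag_index(entry_tags):
    """Map each tag_id to the set of entry_ids that carry it (an inverted index).

    entry_tags: iterable of (entry_id, tag_id) pairs, i.e. rows of EntryTags.
    """
    index = defaultdict(set)
    for entry_id, tag_id in entry_tags:
        index[tag_id].add(entry_id)
    return dict(index)


# Hypothetical EntryTags rows: (entry_id, tag_id)
entry_tags_rows = [(1, 10), (1, 11), (2, 10), (3, 12)]
tag_index = build_tag_index(entry_tags_rows)
print(tag_index)  # {10: {1, 2}, 11: {1}, 12: {3}}
```

With this index, looking up the entries for one tag is a single dictionary access instead of a scan over EntryTags.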

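In the search field, the search request will always be turned into single words separated by a plus sign (+).

In the example below, I will search for "blue+website+simple+layout".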
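Splitting the query and mapping words to tag ids (the first two steps below) might look like this. In PHP the split would be `explode('+', $query)`; this Python sketch assumes, beyond what the question says, that empty pieces and repeated words are dropped and that words with no matching tag are skipped. `search_terms_to_tag_ids` and the sample tags are hypothetical.

```python
def search_terms_to_tag_ids(query, tag_ids_by_text):
    """Split a '+'-separated query and map each word to its Tags.id.

    Assumed behaviour: empty pieces (e.g. from "blue++website") and repeated
    words are dropped, and words that are not in the Tags table are skipped.
    """
    words = dict.fromkeys(w for w in query.split("+") if w)  # ordered de-duplication
    return [tag_ids_by_text[w] for w in words if w in tag_ids_by_text]


# Hypothetical contents of the Tags table: text -> id
tags = {"blue": 1, "website": 2, "simple": 3, "layout": 4}
print(search_terms_to_tag_ids("blue+website+simple+layout", tags))  # [1, 2, 3, 4]
```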

- split the search term into an array named t
- convert each word in array t into a number, using its id from the "Tags" table
- for each element in array t, build a new array of the entries whose "EntryTags" rows match that tag
- generate array A, containing the entries that appear in all 4 arrays
- generate array B, containing the entries that appear in 3 of the 4 arrays
- generate array C, containing the entries that appear in 2 of the 4 arrays
- generate array D with the remaining entries (those that appear in only 1 of the 4 arrays)
- sort arrays A, B, C and D by the score column from the Entries table
- output array A, then B, then C, then D
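The steps above can be collapsed into one pass plus one sort: count how many of the query tags each entry matches, then sort on (match count, score), both descending. That ordering is the same as printing A, then B, then C, then D, each sorted by score. This is a sketch, not the asker's code; it assumes a higher score should rank first (the question does not say which direction), and `ranked_search` and the sample data are hypothetical.

```python
from collections import Counter


def ranked_search(tag_ids, tag_index, scores):
    """Order entries like arrays A, B, C, D: most query tags matched first,
    then by score within each group (assumed: higher score first).

    tag_ids:   Tags.id values for the search words
    tag_index: tag_id -> set of entry_ids (built from EntryTags)
    scores:    entry_id -> score (from Entries)
    """
    matches = Counter()
    for tag_id in set(tag_ids):  # ignore repeated search words
        for entry_id in tag_index.get(tag_id, ()):
            matches[entry_id] += 1  # how many of the query tags this entry has
    # A single sort on (match count, score), both descending, replaces
    # building A, B, C and D separately and concatenating them.
    return sorted(matches, key=lambda e: (-matches[e], -scores[e]))


# Hypothetical data
tag_index = {1: {10, 11, 12}, 2: {10, 11}, 3: {10}, 4: {10, 13}}
scores = {10: 5, 11: 9, 12: 7, 13: 8}
print(ranked_search([1, 2, 3, 4], tag_index, scores))  # [10, 11, 13, 12]
```

Entry 10 matches all four tags (array A), 11 matches two (array C), and 13 and 12 match one each (array D), ordered by score.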

Of course, this isn't optimized at all, but my lack of experience with more complex SQL is holding me back :(

In the end, all of this will be written in PHP with the mysqli library (of course, I'll keep this thread updated as I make progress).
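Most of the steps can be pushed into a single SQL query: join EntryTags to Tags, keep only rows whose tag text is one of the search words, GROUP BY entry and COUNT the matches, then ORDER BY that count and the score. The sketch below uses Python's built-in `sqlite3` so it runs on its own; the table and column names follow the question, while the column types, keys and the index are assumptions. The SELECT is plain SQL and should carry over to MySQL through a mysqli prepared statement with one `?` per search word, with little or no change.

```python
import sqlite3

# Tables from the question; types, keys and the index are assumptions.
SCHEMA = """
CREATE TABLE Entries (id INTEGER PRIMARY KEY, description TEXT, short TEXT, score INTEGER);
CREATE TABLE Tags (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
CREATE TABLE EntryTags (entry_id INTEGER, tag_id INTEGER, PRIMARY KEY (entry_id, tag_id));
CREATE INDEX idx_entrytags_tag ON EntryTags (tag_id, entry_id);
"""


def search(conn, query):
    """Return (entry_id, matched_tags, score) rows, best matches first."""
    words = list(dict.fromkeys(w for w in query.split("+") if w))
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)  # one placeholder per search word
    sql = f"""
        SELECT e.id, COUNT(*) AS matched, e.score
        FROM Entries e
        JOIN EntryTags et ON et.entry_id = e.id
        JOIN Tags t ON t.id = et.tag_id
        WHERE t.text IN ({placeholders})
        GROUP BY e.id, e.score
        ORDER BY matched DESC, e.score DESC
    """
    return conn.execute(sql, words).fetchall()


# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Entries VALUES (?, ?, ?, ?)",
                 [(1, "a", "a", 5), (2, "b", "b", 9), (3, "c", "c", 7), (4, "d", "d", 8)])
conn.executemany("INSERT INTO Tags VALUES (?, ?)",
                 [(1, "blue"), (2, "website"), (3, "simple"), (4, "layout"), (5, "red")])
conn.executemany("INSERT INTO EntryTags VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 4), (4, 5)])
print(search(conn, "blue+website+simple+layout"))
# [(1, 4, 5), (2, 2, 9), (4, 1, 8), (3, 1, 7)]
```

A `matched` value of 4 corresponds to array A, 3 to B, 2 to C and 1 to D. The index on EntryTags (tag_id, entry_id) is there so the database can find entries by tag without scanning the whole table.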

Best answer

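You could use a kind of Bloom filter (at least that is part of Google's strategy). First, look up the entries that have all of the input tags. If you find nothing, try every combination with one tag missing, then with two tags missing, and so on, until you have enough matches. Lookups in a Bloom filter are very fast, so a large number of lookups is not a problem.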
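One possible reading of this answer, sketched in Python: each entry gets a small Bloom filter of its tags, and the search tries the full set of query tags, then every combination missing one tag, then two, stopping once enough entries have been found. A Bloom filter can report false positives but never false negatives, so each hit is confirmed against the entry's real tags (standing in for a costlier database check). `BloomFilter`, `bloom_search`, the size m=1024 and k=3 hash positions are illustrative choices, not taken from the answer.

```python
import hashlib
from itertools import combinations


class BloomFilter:
    """Small Bloom filter: m bits, k hash positions per item; no false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0  # bit array stored in a Python int

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_filters(tags_by_entry):
    """One Bloom filter per entry, holding that entry's tag texts."""
    filters = {}
    for entry_id, tags in tags_by_entry.items():
        bf = BloomFilter()
        for tag in tags:
            bf.add(tag)
        filters[entry_id] = bf
    return filters


def bloom_search(query_tags, filters, tags_by_entry, wanted):
    """Try all query tags, then combinations missing 1 tag, 2 tags, ...
    until at least `wanted` entries have been found."""
    found, seen = [], set()
    for size in range(len(query_tags), 0, -1):
        for combo in combinations(query_tags, size):
            for entry_id, bf in filters.items():
                if entry_id in seen:
                    continue
                # Cheap Bloom check first; confirm with the real tags to rule out false positives.
                if all(bf.might_contain(t) for t in combo) and set(combo) <= tags_by_entry[entry_id]:
                    seen.add(entry_id)
                    found.append(entry_id)
        if len(found) >= wanted:
            break
    return found


# Hypothetical data: entry_id -> set of tag texts
tags_by_entry = {
    10: {"blue", "website", "simple", "layout"},
    11: {"blue", "website"},
    12: {"red"},
    13: {"blue", "simple", "layout"},
}
filters = build_filters(tags_by_entry)
query = ["blue", "website", "simple", "layout"]
print(bloom_search(query, filters, tags_by_entry, wanted=2))  # [10, 13]
```

The number of combinations tried grows as 2^n - 1 for n search words, so this approach is only practical for a handful of terms, such as the four-word example in the question.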

A similar question about php - Google-like search algorithm can be found on Stack Overflow: https://stackoverflow.com/questions/6232396/
