mysql - Challenge! Complex MySQL query

We are writing a small search engine. The database tables:

Documents (DocumentID, Title, Abstract, Author, ...)
InvertedIndex (DocumentID, Word, Count)
Stopwords (Word)

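InvertedIndex has one entry for each word in each document, along with the number of times that word appears. Stopwords is just a list of words I don't care about. The engine is queried with a list of terms separated by or. For example:

term1 term2
term1 or term2
term1 term2 or term3

...and so on. Search results are ranked by relevance, computed for each document using the extended boolean model. And-ed terms (all those not separated by or) are multiplied, and the or-ed groups are summed. For example, given the query term1 term2 or term3, if the terms appear in a document 3, 4, and 5 times respectively, the document's relevance is (3*4)+5 = 17. Also, terms that appear in Stopwords are ignored.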

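OK, now... my professor told us that the relevance of all documents can be computed in a single query. That is where I need help.

I have prepared some pseudocode for the example query term1 term2 or term3. This is how I currently compute the relevance of each document, but I would like to run a single MySQL query instead. I am including it only to illustrate the relevance formula.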

foreach document
    relevance = 0
    foreach term_set // where (term1 term2) would be a term_set and (term3) would be the other
        product = 1
        foreach term
            if term not in stopwords
                SELECT Count FROM InvertedIndex WHERE Word=term AND DocumentID=document
                product *= Count
        relevance += product
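For reference, the pseudocode above can be sketched as runnable Python. The function name `relevance` and its parameters are hypothetical, and two details the pseudocode leaves open are assumptions here: a term that does not occur in the document counts as 0, and a term set left empty after removing stopwords contributes nothing to the sum.

```python
from math import prod

def relevance(term_sets, counts, stopwords):
    """Relevance of one document under the formula in the pseudocode.

    term_sets: list of lists of terms; terms inside one list are and-ed
               (multiplied), and the lists themselves are or-ed (summed).
    counts:    dict mapping word -> occurrences in this document.
    Assumptions: a missing term counts as 0, and a term set that is empty
    after stopword removal is skipped.
    """
    total = 0
    for term_set in term_sets:
        kept = [t for t in term_set if t not in stopwords]
        if not kept:
            continue
        total += prod(counts.get(t, 0) for t in kept)
    return total

# Example query: term1 term2 or term3
query = [["term1", "term2"], ["term3"]]
doc_counts = {"term1": 3, "term2": 4, "term3": 5}
score = relevance(query, doc_counts, stopwords={"the"})
print(score)  # 17, i.e. (3*4)+5
```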

EXP(SUM(LOG(COALESCE(Column, 1)))) is apparently one way to perform aggregate multiplication.
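A small sketch of that trick, run against an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example is self-contained). `LOG` and `EXP` are registered from Python because SQLite's math functions are optional and its own `log()` is base 10, whereas MySQL's one-argument `LOG()` is the natural logarithm. The sample rows are made up. Note that the trick only works for counts greater than 0, and the result is a float that may need rounding.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register natural-log LOG and EXP so the SQL reads like the MySQL version.
conn.create_function("LOG", 1, math.log)
conn.create_function("EXP", 1, math.exp)

conn.execute(
    "CREATE TABLE InvertedIndex (DocumentID INTEGER, Word TEXT, Count INTEGER)"
)
conn.executemany(
    "INSERT INTO InvertedIndex VALUES (?, ?, ?)",
    [(1, "term1", 3), (1, "term2", 4), (2, "term1", 2), (2, "term2", 5)],
)

# The sum of logarithms is the logarithm of the product, so EXP recovers it.
rows = conn.execute(
    """
    SELECT DocumentID, EXP(SUM(LOG(Count))) AS product
    FROM InvertedIndex
    WHERE Word IN ('term1', 'term2')
    GROUP BY DocumentID
    ORDER BY DocumentID
    """
).fetchall()

# Round away floating-point error introduced by LOG/EXP.
products = {doc_id: round(p) for doc_id, p in rows}
print(products)
```

A document that contains only one of the two terms would get that single count as its product here, so missing terms still need separate handling.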

Any help would be greatly appreciated. Sorry if this is a chore. It's 2 a.m. and I may not have explained it clearly.

Best answer

If I understand your problem, this might help you get started (but you will have to check the syntax, because my MySQL is rusty):

Select InvertedIndex.DocumentID, InvertedIndex.Word, InvertedIndex.Count
From Documents
Inner Join InvertedIndex On Documents.DocumentID = InvertedIndex.DocumentID
Where InvertedIndex.Word In ('term1', 'term2', 'term3')

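This query gives you a list of DocumentIDs, the search words, and the count of each search word for every document that contains one. You can use it as a starting point: aggregate on DocumentID with Group By DocumentID, then compute your aggregate multiplication function (that part is left to you).

I don't know MySQL well enough to say how best to exclude the words in the Stopwords table (in SQL Server you could use EXCEPT), but something like this might work: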
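One possible way to finish the aggregation for the example query term1 term2 or term3 is conditional aggregation: pick out each term's count with a CASE expression, multiply the and-ed terms, and add the or-ed group. The sketch below runs it on an in-memory SQLite database standing in for MySQL; the sample rows and the `d`/`i` aliases are made up, and the SQL uses only SUM, CASE, and GROUP BY, which MySQL should also accept. A term that is absent from a document is treated as 0, which is an assumption about the intended formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Documents (DocumentID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE InvertedIndex (DocumentID INTEGER, Word TEXT, Count INTEGER);
INSERT INTO Documents VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO InvertedIndex VALUES
    (1, 'term1', 3), (1, 'term2', 4), (1, 'term3', 5),
    (2, 'term1', 2), (2, 'term3', 1),
    (3, 'other', 7);
""")

# (term1 * term2) + term3, computed per document in one query.
rows = conn.execute("""
SELECT d.DocumentID,
       SUM(CASE WHEN i.Word = 'term1' THEN i.Count ELSE 0 END)
     * SUM(CASE WHEN i.Word = 'term2' THEN i.Count ELSE 0 END)
     + SUM(CASE WHEN i.Word = 'term3' THEN i.Count ELSE 0 END) AS relevance
FROM Documents d
INNER JOIN InvertedIndex i ON d.DocumentID = i.DocumentID
WHERE i.Word IN ('term1', 'term2', 'term3')
GROUP BY d.DocumentID
ORDER BY relevance DESC
""").fetchall()
print(rows)  # [(1, 17), (2, 1)]
```

Because the expression is written out per term, the SQL has to be generated from the parsed query; documents containing none of the terms are simply not returned.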

Select InvertedIndex.DocumentID, InvertedIndex.Word, InvertedIndex.Count
From Documents
Inner Join InvertedIndex On Documents.DocumentID = InvertedIndex.DocumentID
Where InvertedIndex.Word In ('term1', 'term2', 'term3')
And Not Exists (
    -- correlated on the current row's word, so only stopword rows are dropped
    Select 1
    From Stopwords
    Where Stopwords.Word = InvertedIndex.Word
)
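Filtering rows only removes a stopword's counts from the result; an and-ed stopword term would then have no count to multiply by. An alternative, sketched here under the assumption that the query has already been parsed into term sets, is to drop stopword terms from the query itself before building the SQL. The helper name `strip_stopwords` is hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

def strip_stopwords(conn, term_sets):
    """Drop query terms found in Stopwords; drop term sets left empty."""
    terms = sorted({t for term_set in term_sets for t in term_set})
    if not terms:
        return []
    # One placeholder per term keeps the lookup parameterized.
    placeholders = ", ".join("?" for _ in terms)
    found = conn.execute(
        f"SELECT Word FROM Stopwords WHERE Word IN ({placeholders})", terms
    ).fetchall()
    stop = {word for (word,) in found}
    cleaned = [[t for t in term_set if t not in stop] for term_set in term_sets]
    return [term_set for term_set in cleaned if term_set]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Stopwords (Word TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Stopwords VALUES (?)", [("the",), ("of",)])

# Query: the term1 term2 or term3 or of
cleaned = strip_stopwords(conn, [["the", "term1", "term2"], ["term3"], ["of"]])
print(cleaned)  # [['term1', 'term2'], ['term3']]
```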

Good luck with your assignment. Let us know how it turns out!

Regarding mysql - Challenge! Complex MySQL query, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/5879246/
