sql-server - Noise Words in Sql Server 2005 Full Text Search

Tags: sql-server full-text-search

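I am trying to run a full-text search over a list of names in my database. This is my first attempt at using full-text search. Currently I take the search string the user entered and put a NEAR condition between each term (i.e. the entered phrase "Kings of Leon" becomes "Kings NEAR of NEAR Leon").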
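For reference, the term-joining step described above might look roughly like the following. This is only a sketch of the approach as described; the class and method names (`NearQueryBuilder`, `BuildNearCondition`) are made up for illustration and are not from the original post.

```csharp
using System;

public static class NearQueryBuilder
{
    // Joins the space-separated terms of a phrase with NEAR,
    // e.g. "Kings of Leon" becomes "Kings NEAR of NEAR Leon".
    public static string BuildNearCondition(string phrase)
    {
        string[] terms = phrase.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" NEAR ", terms);
    }
}
```

The resulting string would then be passed, presumably as a parameter, as the search condition of a CONTAINS predicate.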

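Unfortunately, I have found that this tactic produces false negatives: "of" is a noise word, so SQL Server drops it when it builds the index. Thus "Kings Leon" matches correctly, but "Kings of Leon" does not.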

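My co-worker suggests taking all the noise words defined in MSSQL\FTData\noiseENG.txt and putting them in the .Net code, so that the noise words can be stripped before the full-text search is executed.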
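A rough sketch of what that suggestion could look like, reading the words from the file at startup instead of hard-coding them. It assumes the noise-word file is plain text with entries separated by whitespace or line breaks; `NoiseWordFilter`, `LoadNoiseWords`, and `StripNoiseWords` are hypothetical names, and the path to the file is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class NoiseWordFilter
{
    // Reads a noise-word file and returns its whitespace-separated entries.
    // Assumption: the file is plain text, one or more words per line.
    public static HashSet<string> LoadNoiseWords(string path)
    {
        string[] words = File.ReadAllText(path).Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries);
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }

    // Drops every term of the query that appears in the noise-word set.
    public static string StripNoiseWords(string query, ISet<string> noiseWords)
    {
        var kept = new List<string>();
        foreach (string term in query.Split(
            (char[])null, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!noiseWords.Contains(term))
            {
                kept.Add(term);
            }
        }
        return string.Join(" ", kept);
    }
}
```

The stripped string would be computed before the NEAR conditions are inserted, so that "Kings of Leon" is searched as "Kings NEAR Leon".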

Is this the best solution? Is there some automagic setting I can change in SQL Server to do this for me? Or is there simply a better solution that doesn't feel so hacky?

Best answer

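Full-text search works off the search criteria that you give it. You can remove the noise words from the noise-word file, but doing so really risks bloating your index size. Robert Cain has a lot of good information about this on his blog: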

http://arcanecode.com/2008/05/29/creating-and-customizing-noise-words-in-sql-server-2005-full-text-search/

To save some time, you can look at how this method strips them out and copy the code and the words:

    // Strips noise words from a search string before it is sent to full-text search.
    // Matching is case-sensitive and only hits words delimited by single spaces.
    public string PrepSearchString(string sOriginalQuery)
    {
        string strNoiseWords = @" 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | $ | ! | @ | # | $ | % | ^ | & | * | ( | ) | - | _ | + | = | [ | ] | { | } | about | after | all | also | an | and | another | any | are | as | at | be | because | been | before | being | between | both | but | by | came | can | come | could | did | do | does | each | else | for | from | get | got | has | had | he | have | her | here | him | himself | his | how | if | in | into | is | it | its | just | like | make | many | me | might | more | most | much | must | my | never | now | of | on | only | or | other | our | out | over | re | said | same | see | should | since | so | some | still | such | take | than | that | the | their | them | then | there | these | they | this | those | through | to | too | under | up | use | very | want | was | way | we | well | were | what | when | where | which | while | who | will | with | would | you | your | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z ";

        string[] arrNoiseWord = strNoiseWords.Split('|');

        // Pad with spaces so noise words at the start or end of the query also match.
        sOriginalQuery = " " + sOriginalQuery + " ";

        foreach (string noiseword in arrNoiseWord)
        {
            sOriginalQuery = sOriginalQuery.Replace(noiseword, " ");
        }

        // Collapse any runs of spaces left behind by the replacements.
        while (sOriginalQuery.Contains("  "))
        {
            sOriginalQuery = sOriginalQuery.Replace("  ", " ");
        }
        return sOriginalQuery.Trim();
    }

However, I would probably use Regex.Replace for this, which should be much faster than looping. I just don't have a quick example to post.
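One way the Regex.Replace variant might look is sketched below. It is not the answerer's code: the noise-word list is deliberately abbreviated for illustration, the class name `SearchStringCleaner` is made up, and the speed claim has not been measured here. The pattern uses lookarounds so that a word is removed only when it stands alone between whitespace or the ends of the string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchStringCleaner
{
    // Abbreviated, illustrative list; substitute the full noise-word list.
    private static readonly string[] NoiseWords =
        { "a", "an", "and", "of", "the", "to", "$", "&" };

    // Matches any noise word that stands alone between whitespace or string ends.
    private static readonly Regex NoiseRegex = new Regex(
        @"(?<!\S)(?:"
            + string.Join("|", NoiseWords.Select(w => Regex.Escape(w)))
            + @")(?!\S)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string PrepSearchString(string originalQuery)
    {
        string withoutNoise = NoiseRegex.Replace(originalQuery, " ");
        // Collapse whitespace runs left behind by the removed words.
        return Regex.Replace(withoutNoise, @"\s+", " ").Trim();
    }
}
```

Unlike the loop version, this sketch matches case-insensitively and handles noise words at the start or end of the query without padding, because the lookarounds do not consume the surrounding spaces.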

Regarding sql-server - Noise Words in Sql Server 2005 Full Text Search, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/938175/
