mysql - A good algorithm for searching a database for a given string

Tags: mysql database search text

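I am developing a web application (PHP + MySQL) in which users can search for other users by entering a search string.

I need to match the user's input string against 2 columns (username and full name) of the "users" table in my database and return the most relevant matches (20 or 50). Ideally, I would also like to account for typos.

How should I approach this? I don't want to reinvent the wheel here.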

Best answer

You can do this with MySQL full-text search:

See this, this, and this article.

I will explain Boolean Full-Text Search here, but I recommend that you also read about Full-Text Search using Query Expansion.

Let's look at the example table given on dev.mysql.com:

mysql> select * from articles;
+----+-----------------------+------------------------------------------+
| id | title                 | body                                     |
+----+-----------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial   | DBMS stands for DataBase ...             |
|  2 | How To Use MySQL Well | After you went through a ...             |
|  3 | Optimizing MySQL      | In this tutorial we will show ...        |
|  4 | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
|  5 | MySQL vs. YourSQL     | In the following database comparison ... |
|  6 | MySQL Security        | When configured properly, MySQL ...      |
+----+-----------------------+------------------------------------------+
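
The MATCH() queries that follow only work if the table has a FULLTEXT index covering exactly the columns named in MATCH(). A minimal setup sketch to reproduce the table above; the column types and the InnoDB engine are assumptions modeled on the MySQL manual's example (InnoDB supports FULLTEXT indexes from MySQL 5.6 on), and the trailing "..." in each body is literal text copied from the table, not real article content:

```sql
-- Assumed definition; the FULLTEXT index must cover (title, body)
-- because the queries below use MATCH (title,body).
CREATE TABLE articles (
  id    INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  title VARCHAR(200),
  body  TEXT,
  FULLTEXT (title, body)
) ENGINE=InnoDB;

-- Same six rows as shown in the table above.
INSERT INTO articles (title, body) VALUES
  ('PostgreSQL Tutorial',   'DBMS stands for DataBase ...'),
  ('How To Use MySQL Well', 'After you went through a ...'),
  ('Optimizing MySQL',      'In this tutorial we will show ...'),
  ('1001 MySQL Tricks',     '1. Never run mysqld as root. 2. ...'),
  ('MySQL vs. YourSQL',     'In the following database comparison ...'),
  ('MySQL Security',        'When configured properly, MySQL ...');
```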

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('"database comparison"' IN BOOLEAN MODE);

+----+-------------------+------------------------------------------+
| id | title             | body                                     |
+----+-------------------+------------------------------------------+
|  5 | MySQL vs. YourSQL | In the following database comparison ... |
+----+-------------------+------------------------------------------+

When the words are quoted, their order matters:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('"comparison database"' IN BOOLEAN MODE);

Empty set (0.01 sec)

When we remove the quotes, it searches for rows that contain the word "database" or the word "comparison":

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('database comparison' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+

Now the order does not matter:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('comparison database' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+

If we want the rows that contain either the word "PostgreSQL" or the phrase "database comparison", we should use this query:

mysql> SELECT * FROM articles WHERE MATCH (title,body)
     AGAINST ('PostgreSQL "database comparison"' IN BOOLEAN MODE);

+----+---------------------+------------------------------------------+
| id | title               | body                                     |
+----+---------------------+------------------------------------------+
|  1 | PostgreSQL Tutorial | DBMS stands for DataBase ...             |
|  5 | MySQL vs. YourSQL   | In the following database comparison ... |
+----+---------------------+------------------------------------------+
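
The queries above use no operators, so each word or phrase is optional and a row matches if it contains any of them. Boolean mode also accepts operators that tighten this. A short sketch against the same articles table, using the standard + (must be present), - (must not be present) and trailing * (prefix wildcard) operators; the search words are chosen only for illustration:

```sql
-- Rows that must contain "mysql" and must not contain "yoursql".
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('+mysql -yoursql' IN BOOLEAN MODE);

-- Rows containing any word that starts with "data"
-- (the trailing * acts as a prefix wildcard).
SELECT id, title FROM articles
WHERE MATCH (title,body) AGAINST ('data*' IN BOOLEAN MODE);
```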


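Make sure the words you search for are not in the list of stopwords, because stopwords are ignored.
(Words such as 'is' and 'the' are obvious stopwords, and they are ignored.)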
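
To check whether a word will be ignored, you can inspect the stopword list and the minimum word length on your own server. A small sketch, assuming MySQL 5.6 or later; which of the settings applies depends on the storage engine of the table:

```sql
-- Default stopword list used by InnoDB FULLTEXT indexes.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

-- Words shorter than the minimum length are not indexed,
-- so they are skipped just like stopwords.
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';  -- InnoDB tables
SHOW VARIABLES LIKE 'ft_min_word_len';           -- MyISAM tables
```

MyISAM uses its own, longer built-in stopword list, so the same search word can behave differently depending on the engine.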

To improve the ordering of the results in boolean mode, you can use the following query.

(Assuming the user's input string contains 2 words in total:)

SELECT column_names, MATCH (text) AGAINST ('word1 word2')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2' in boolean mode) 
order by col1 desc;

(If the user's input string contains 3 words:)

SELECT column_names, MATCH (text) AGAINST ('word1 word2 word3')
AS col1 FROM table1
WHERE MATCH (text) AGAINST ('+word1 +word2 +word3' in boolean mode) 
order by col1 desc;

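With the first MATCH() we get the score in non-boolean search mode (which is more distinctive). The second MATCH() ensures that we really get back only the results we want (rows containing all 3 words).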
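
Applied to the question, the same pattern might look like the sketch below. The table name users, the column names username and fullname, the index name, and the sample input "john smi" are all assumptions for illustration; they do not come from the original post.

```sql
-- Hypothetical schema: users(id, username, fullname).
ALTER TABLE users ADD FULLTEXT INDEX ft_users_name (username, fullname);

-- The user typed "john smi". Every word is required (+) and the last
-- word is treated as a prefix (*) so that partial input still matches.
-- The score comes from natural language mode, which only counts whole
-- words, so the prefix term affects filtering but not the ranking.
SELECT id, username, fullname,
       MATCH (username, fullname) AGAINST ('john smi') AS score
FROM users
WHERE MATCH (username, fullname) AGAINST ('+john +smi*' IN BOOLEAN MODE)
ORDER BY score DESC
LIMIT 20;
```

In the PHP application, both search strings would be built from the user's input and passed as bound parameters rather than concatenated into the SQL. Note that full-text search matches whole words or prefixes, so on its own it does not catch misspellings.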

Regarding "mysql - A good algorithm for searching a database for a given string", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29467667/
