php - How to handle multiple search conditions and priority in full-text search

Tags: php mysql full-text-search

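Is there any way to reduce the number of queries executed? The way I am doing it now works, but later I might end up with 30 queries, which does not look good to me.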

My script

$string = 'new movie stars';
$words =  preg_split('/(\/|\s+)/', $string);
print_r($words);

Array ( [0] => new [1] => movie [2] => stars )

$sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +$words[1] +$words[2]' IN BOOLEAN MODE)";
$query_name = $this->db->query($sql);

if ($query_name->num_rows < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('+$words[0] +($words[1] $words[2])' IN BOOLEAN MODE)";
    $query_name_two = $this->db->query($sql);
}

// num_rows is already an integer, so count() is not needed; ?? 0 covers the case where the second query never ran
if ($query_name->num_rows + ($query_name_two->num_rows ?? 0) < 20) {
    $sql = "SELECT * FROM movie WHERE MATCH(name) AGAINST('$words[0] $words[1] $words[2]' IN BOOLEAN MODE)";
    $query_name_three = $this->db->query($sql);
}

Best answer

Your code is open to SQL injection attacks. Even real_escape_string cannot secure it completely. Please learn to use prepared statements instead.
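As a rough illustration of the prepared-statement advice, the sketch below sends the boolean search expression as a bound parameter instead of interpolating it into the SQL text. It assumes a PDO connection (the question does not say which driver sits behind `$this->db`), and the helper names `requireAllWords` and `searchMovies` are made up for this example.

```php
<?php
declare(strict_types=1);

/**
 * Build the "+word1 +word2 ..." expression used by the first query in the question.
 * Hypothetical helper, not part of the original code.
 */
function requireAllWords(array $words): string
{
    return implode(' ', array_map(fn(string $w): string => '+' . $w, $words));
}

/**
 * Run the search with the expression passed as a bound parameter.
 * Assumption: $pdo is an open PDO connection to the MySQL database
 * that holds the movie table with a FULLTEXT index on name.
 */
function searchMovies(PDO $pdo, string $expression): array
{
    $stmt = $pdo->prepare(
        'SELECT * FROM movie WHERE MATCH(name) AGAINST(:expr IN BOOLEAN MODE)'
    );
    $stmt->execute([':expr' => $expression]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Same split as in the question.
$words = preg_split('/(\/|\s+)/', 'new movie stars');
echo requireAllWords($words), PHP_EOL; // prints: +new +movie +stars
```

Binding keeps the user input out of the SQL text. Operator characters inside the words are still interpreted by the full-text parser, so the input should also be cleaned before the expression is built.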

Now, apart from the above suggestion, there are two further possible fixes:

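Fix #1 The PHP code you are using to tokenize the input string into FTS words is insufficient. Some time back, I wrote a function that handles this requirement in a more robust manner. You can use the following: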

/**
 * Method to take an input string and tokenize it into an array of words for Full Text Searching (FTS).
 * This method is used when an input string can be made up of multiple words (let's say, separated by space characters),
 * and we need to use different Boolean operators on each of the words. The tokenizing process is similar to extraction
 * of words by FTS parser in MySQL. The operators used for matching in Boolean condition are removed from the input $phrase.
 * These characters as of latest version of MySQL (8+) are: +-><()~*:""&|
 * We can also execute the following query to get updated list: show variables like 'ft_boolean_syntax';
 * Afterwards, the modified string is split into individual words on whitespace, comma (,), or period (.) characters.
 * Details at: https://dev.mysql.com/doc/refman/8.0/en/fulltext-natural-language.html
 * @param string $phrase Input statement/phrase consisting of words
 * @return array Tokenized words
 * @author Madhur, 2019
 */
function tokenizeStringIntoFTSWords(string $phrase) : array {
    $phrase_mod = trim(preg_replace('/[><()~*:"&|+-]/', '', trim($phrase)));
    // -1 means "no limit"; passing null for the integer $limit parameter is deprecated since PHP 8.1
    return preg_split('/[\s,.]/', $phrase_mod, -1, PREG_SPLIT_NO_EMPTY);
}
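A quick way to see what the function returns is to feed it a string that contains boolean operators and punctuation. The input below is a made-up sample; the function body is repeated (with the limit written as -1) only so that the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Same function as above, repeated so this snippet is self-contained.
function tokenizeStringIntoFTSWords(string $phrase): array
{
    $phrase_mod = trim(preg_replace('/[><()~*:"&|+-]/', '', trim($phrase)));
    return preg_split('/[\s,.]/', $phrase_mod, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical user input with operators, quotes, and punctuation.
$input = '+new -movie, "stars" (2019).';

print_r(tokenizeStringIntoFTSWords($input));
// Array ( [0] => new [1] => movie [2] => stars [3] => 2019 )
```

Note that the hyphen is removed rather than treated as a separator, so an input such as `full-text` comes back as the single token `fulltext`.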

Fix #2 It seems that you are trying to rank the search results by giving priority in the following order:

All words in the text > the first word AND either of the other two words > at least one of the three words.

However, if you read the Full Text Search Documentation, you will see that you can sort by relevance using MATCH(), because it also returns a relevance score.

When MATCH() is used in a WHERE clause, the rows returned are automatically sorted with the highest relevance first (Unfortunately, this works only in NATURAL mode, not BOOLEAN mode). Relevance values are nonnegative floating-point numbers. Zero relevance means no similarity. Relevance is computed based on the number of words in the row (document), the number of unique words in the row, the total number of words in the collection, and the number of rows that contain a particular word.

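So basically, a row that contains all the words already gets a higher relevance than a row that contains only some of them. If you also need to give the first word higher priority, you simply use the > operator on the first word. Because boolean mode does not sort the rows automatically, the query has to order by the relevance score explicitly. So all you need is the following single query: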

SELECT * FROM movie
WHERE
  MATCH(name)
  AGAINST('>:first_word :second_word :third_word ..and so on' IN BOOLEAN MODE)
ORDER BY
  MATCH(name)
  AGAINST('>:first_word :second_word :third_word ..and so on' IN BOOLEAN MODE)
  DESC
LIMIT 20
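One possible way to run this single query from PHP is sketched below. It assumes a PDO connection and word tokens that have already been stripped of boolean operators; `buildRankedExpression` and `searchMoviesRanked` are hypothetical names chosen for this example. The whole expression is bound as one string, and two separate placeholders are used because a named placeholder cannot be reused within a statement when PDO's emulated prepares are turned off.

```php
<?php
declare(strict_types=1);

/**
 * Boost the first word with ">" and leave the remaining words as optional terms.
 * Assumption: $words contains plain tokens without boolean operators.
 */
function buildRankedExpression(array $words): string
{
    $words = array_values($words);
    if ($words === []) {
        return '';
    }
    $words[0] = '>' . $words[0];
    return implode(' ', $words);
}

/**
 * Fetch up to 20 rows ordered by relevance, highest first.
 * Assumption: $pdo is an open PDO connection to the MySQL database.
 */
function searchMoviesRanked(PDO $pdo, array $words): array
{
    $expression = buildRankedExpression($words);
    if ($expression === '') {
        return []; // nothing to search for
    }
    $stmt = $pdo->prepare(
        'SELECT * FROM movie
         WHERE MATCH(name) AGAINST(:expr_where IN BOOLEAN MODE)
         ORDER BY MATCH(name) AGAINST(:expr_order IN BOOLEAN MODE) DESC
         LIMIT 20'
    );
    $stmt->execute([
        ':expr_where' => $expression,
        ':expr_order' => $expression,
    ]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

echo buildRankedExpression(['new', 'movie', 'stars']), PHP_EOL; // prints: >new movie stars
```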

Regarding php - How to handle multiple search conditions and priority in full-text search, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57968929/
