regex - Neo4j regex string matching does not return expected results

Tags: regex neo4j cypher

I am trying to use regex matching in Cypher with Neo4j 2.1.5, but I have run into a problem.

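I need to implement full-text search on specific fields that a user has access to. The access requirement is the key point: it prevents me from dumping everything into a Lucene instance and querying it that way. The access system is dynamic, so I need to query for the set of nodes a particular user can access and then run the search within those nodes. I would really like to match the set of nodes against a Lucene query, but I can't figure out how to do that, so for now I am using basic regex matching. My problem is that Neo4j does not always return the expected results.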

For example, I have about 200 nodes, one of which looks like this:

( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})

This query produces one result:

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.name =~ "(?i).*mosaic.*")
  RETURN i

> Returned 1 row in 569 ms

But this query produces zero results, even though the description property matches the expression:

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.description=~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 601 ms

This query also produces zero results, even though it includes the name property that returned a result before:

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  WHERE (searchText =~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 487 ms

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  RETURN searchText

>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...

mosaic

Even stranger, if I search for a different term, it returns all of the expected results without a problem.

MATCH (p)-->(:group)-->(i:node)
  WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
  WHERE (searchText =~ "(?i).*plumbing.*")
  RETURN i

> Returned 8 rows in 522 ms

I then tried caching the search text on the node and adding an index to see whether that would change anything, but it still produced no results.

CREATE INDEX ON :node(searchText)

MATCH (p)-->(:group)-->(i:node)
  WHERE (i.searchText =~ "(?i).*mosaic.*")
  RETURN i

> Returned 0 rows in 3182 ms

I then tried to simplify the data to reproduce the problem, but in this simple case it works as expected:

MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})

WITH i, (
  i.name + " " + COALESCE(i.description, "")
) AS searchText

WHERE searchText =~ "(?i).*mosaic.*"
RETURN i

> Returned 1 rows in 630 ms

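I also tried the CYPHER 2.1.EXPERIMENTAL tag, but that did not change any of the results. Am I making wrong assumptions about how the regex support works? Is there something else I should try, or some other way to debug the problem?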

Additional information

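Here is a sample call that I make to the Cypher Transactional REST API when creating nodes. This is the actual plain text that is sent when adding nodes to the database (apart from some formatting for readability). Any string encoding is just the standard URL encoding that Go performs when creating a new HTTP request.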

{"statements":[
    {
    "parameters":
        {
        "p01":"lsF30nP7TsyFh",
        "p02":
            {
            "description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
            "id":"lsF3BxzFdn0kj",
            "name":"Linear Glass Mosaic Tiles",
            "object":"material"
            }
        },
    "resultDataContents":["row"],
    "statement":
        "MATCH (p:project { id: { p01 } })
        WITH p

        CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material  { p02 })"
    }
]}

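If this is an encoding issue, why does searching on name work, while searching on description does not, and neither does name + description? Is there any way to inspect the database to see whether and how the data is encoded? When I run a search, the returned text displays correctly.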

Best answer

Just a few notes:

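  • You could possibly replace CREATE UNIQUE with MERGE (which works slightly differently).
  • For full-text search I would go for a Lucene legacy index for performance, if your group restriction is not enough to keep the response time under a few milliseconds.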

I just tried your exact JSON statement, and it works perfectly.

Insert:

curl -H accept:application/json -H content-type:application/json -d @insert.json \
     -XPOST http://localhost:7474/db/data/transaction/commit

json:

{"statements":[
    {
    "parameters":
        {
        "p01":"lsF30nP7TsyFh",
        "p02":
            {
            "description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
            "id":"lsF3BxzFdn0kj",
            "name":"Linear Glass Mosaic Tiles",
            "object":"material"
            }
        },
    "resultDataContents":["row"],
    "statement":
        "MERGE (p:project { id: { p01 } })
        WITH p

        CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material  { p02 }) RETURN m"
    }
]}

Query:

MATCH (p)-->(:group)-->(i:material)
 WHERE (i.description=~ "(?i).*mosaic.*")
 RETURN i

Returns:

name:   Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description:    Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material

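One way to check your data is to look at the JSON or CSV dumps that the browser offers (the small download icons on the result and table-result views).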

Or you can use neo4j-shell with my shell-import-tools to export CSV or GraphML and inspect those files.

Or use a bit of Java (or Groovy) code to check your data.
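As a rough illustration of that kind of check, here is a minimal sketch in Python rather than Java or Groovy. It assumes you have already pulled a property value out of one of the exports mentioned above; the function name `find_hidden_characters` and the sample string are hypothetical and are not taken from the question's data. It lists characters that do not show up when the text is displayed, such as line breaks or zero-width characters.

```python
import unicodedata

def find_hidden_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, escaped form, Unicode category) for each invisible character."""
    findings = []
    for index, char in enumerate(text):
        category = unicodedata.category(char)
        # Cc = control characters (newline, carriage return, tab, ...)
        # Cf = format characters (for example zero-width spaces)
        # Zl / Zp = Unicode line and paragraph separators
        if category in {"Cc", "Cf", "Zl", "Zp"}:
            findings.append((index, repr(char), category))
    return findings

# Hypothetical sample value with a Windows-style line break in the middle.
sample = "glass mosaic tiles.\r\nSOLD OUT"
hidden = find_hidden_characters(sample)
for index, escaped, category in hidden:
    print(index, escaped, category)
```

An empty result for a property suggests that invisible characters are not the cause; any line break reported here is worth comparing against the regex flags discussed further down.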

There is also a consistency checker that ships with the neo4j-enterprise download. Here is a blog post on how to run it.

java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo

I added a Groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30

Another idea: regex flags

Sometimes there is a multi-line issue going on. There are two more flags:

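  • multiline (?m) makes ^ and $ match at the start and end of every line, not only at the start and end of the whole string
  • dotall (?s) lets the dot also match line terminators such as newlines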

So could you try (?ism).*mosaic.* ? If the description contains a line break, (?s) is the flag that lets .* span it.
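To make the effect of these flags concrete, here is a small sketch using Python's `re` module as a stand-in for the Java regex engine that Neo4j runs on. It assumes that `=~` has to match the entire string (which is why the patterns are wrapped in `.*`), and it uses a hypothetical description with an embedded newline; whether the real data contains one is exactly what needs checking.

```python
import re

# Hypothetical description with an embedded newline; the real data may differ.
description = (
    "Introducing our new Rip Curl linear glass mosaic tiles.\n"
    "SOLD OUT: Back in stock end of August."
)

def matches(pattern: str, text: str) -> bool:
    """Whole-string match, standing in for the assumed behaviour of =~."""
    return re.fullmatch(pattern, text) is not None

# Without (?s) the dot stops at the newline, so the whole string cannot match.
plain = matches(r"(?i).*mosaic.*", description)
# With (?s) the dot also matches the newline.
dotall = matches(r"(?is).*mosaic.*", description)
# (?m) alone only changes ^ and $, so it does not help here.
multiline_only = matches(r"(?im).*mosaic.*", description)
# A value without any line break matches with (?i) alone.
single_line = matches(r"(?i).*mosaic.*", "Linear Glass Mosaic Tiles")

print(plain, dotall, multiline_only, single_line)
```

If the stored description does contain a line break, this would be consistent with name matching while description and name + description do not, and with other search terms working on nodes whose text has no line breaks.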

Source: a similar question on Stack Overflow, "regex - Neo4j regex string matching not returning expected results": https://stackoverflow.com/questions/26571379/
