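I am trying to use regular expression matching in Cypher on Neo4j 2.1.5 and have run into a problem.
I need to implement full-text search over specific fields that a user has access to. The access requirement is the key point: it prevents me from dumping everything into a Lucene instance and querying it that way. The access system is dynamic, so I need to query for the set of nodes a particular user has access to and then run the search within those nodes. I would really like to match the node set against a Lucene query, but I can't figure out how to do that, so for now I am using basic regular expression matching. My problem is that Neo4j doesn't always return the expected results.
For example, I have about 200 nodes, one of which is the following: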
( i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
This query produces one result:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.name =~ "(?i).*mosaic.*")
RETURN i
> Returned 1 row in 569 ms
But this query produces zero results, even though the description property matches the expression:
MATCH (p)-->(:group)-->(i:node)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 601 ms
This query also produces zero results, even though it includes the name property that returned a result earlier:
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 487 ms
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
RETURN searchText
>
...
SotoLinear Glass Mosaic Tiles Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
...
Even stranger, if I search for a different term, it returns all the expected results without a problem.
MATCH (p)-->(:group)-->(i:node)
WITH i, (p.name + i.name + COALESCE(i.description, "")) AS searchText
WHERE (searchText =~ "(?i).*plumbing.*")
RETURN i
> Returned 8 rows in 522 ms
I then tried caching the search text on the node and adding an index to see whether that would change anything, but it still produced no results.
CREATE INDEX ON :node(searchText)
MATCH (p)-->(:group)-->(i:node)
WHERE (i.searchText =~ "(?i).*mosaic.*")
RETURN i
> Returned 0 rows in 3182 ms
Then I tried simplifying the data to reproduce the problem, but in this simple case it works as expected:
MERGE (i:node {name: "Linear Glass Mosaic Tiles", description: "Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!"})
WITH i, (
i.name + " " + COALESCE(i.description, "")
) AS searchText
WHERE searchText =~ "(?i).*mosaic.*"
RETURN i
> Returned 1 rows in 630 ms
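I also tried the CYPHER 2.1.EXPERIMENTAL tag, but that didn't change any of the results. Am I making incorrect assumptions about how regular expression support works? Is there something else I should try, or another way to debug the problem?
Additional information
Here is a sample call I make to the Cypher Transactional REST API when creating nodes. This is the actual plain text sent when adding a node to the database (apart from some formatting for readability). Any string encoding is just the standard URL encoding that Go performs when creating a new HTTP request.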
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MATCH (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 })"
}
]}
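If this is an encoding problem, why does searching name work, searching description not work, and searching name + description not work either? Is there a way to inspect the database to see whether and how the data is encoded? When I run the search, the returned text displays correctly.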
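Best answer
Just a few notes:
- You could probably replace create unique with merge (which works slightly differently)
- For full-text search I would go with the Lucene legacy index for performance, if your group restrictions are not limiting enough to keep response times below a few milliseconds
I just tried your exact JSON statement and it works perfectly.
Insert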
curl -H accept:application/json -H content-type:application/json -d @insert.json \
-XPOST http://localhost:7474/db/data/transaction/commit
json:
{"statements":[
{
"parameters":
{
"p01":"lsF30nP7TsyFh",
"p02":
{
"description":"Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!",
"id":"lsF3BxzFdn0kj",
"name":"Linear Glass Mosaic Tiles",
"object":"material"
}
},
"resultDataContents":["row"],
"statement":
"MERGE (p:project { id: { p01 } })
WITH p
CREATE UNIQUE (p)-[:MATERIAL]->(:materials:group {name: \"Materials\"})-[:MATERIAL]->(m:material { p02 }) RETURN m"
}
]}
Query:
MATCH (p)-->(:group)-->(i:material)
WHERE (i.description=~ "(?i).*mosaic.*")
RETURN i
Returns:
name: Linear Glass Mosaic Tiles
id: lsF3BxzFdn0kj
description: Introducing our new Rip Curl linear glass mosaic tiles. This Caribbean color combination of greens and blues brings a warm inviting feeling to a kitchen backsplash or bathroom. The colors work very well with white cabinetry or larger tiles. We also carry this product in a small subway mosaic to give you some options! SOLD OUT: Back in stock end of August. Call us to pre-order and save 10%!
object: material
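The same search could presumably also be sent through the transactional endpoint used for the insert, with the pattern passed as a parameter instead of being inlined in the statement. The following is only a sketch of how such a request body might be assembled: `build_search_payload` is a hypothetical helper, the `{pattern}` placeholder assumes the Neo4j 2.x parameter syntax seen in the insert statement, and nothing is actually sent to the server.

```python
import json

# Endpoint taken from the curl call above; this sketch only builds the body
# and does not send the request.
ENDPOINT = "http://localhost:7474/db/data/transaction/commit"


def build_search_payload(pattern):
    """Build a transactional-endpoint body for a regex search (hypothetical helper)."""
    statement = (
        "MATCH (p)-->(:group)-->(i:material) "
        "WHERE i.description =~ {pattern} "
        "RETURN i"
    )
    return {
        "statements": [
            {
                "statement": statement,
                "parameters": {"pattern": pattern},
                "resultDataContents": ["row"],
            }
        ]
    }


# json.dumps takes care of escaping quotes and control characters.
body = json.dumps(build_search_payload("(?i).*mosaic.*"))
print(body)
```

Saving the printed body to a file and posting it with the same curl command would be one way to try it.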
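What you can do to check your data is look at the JSON or CSV dump that the browser offers (the small download icons on the result and table-result views).
Or use neo4j-shell with my shell-import-tools to actually output CSV or GraphML and inspect those files.
Or use some Java (or Groovy) code to check your data.
The neo4j-enterprise download also ships with a consistency checker. Here is a blog post on how to run it.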
java -cp 'lib/*:system/lib/*' org.neo4j.consistency.ConsistencyCheckTool /tmp/foo
I added a Groovy test script here: https://gist.github.com/jexp/5a183c3501869ee63d30
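In the same spirit, once a property value has been exported (for example from the JSON or CSV dump), a few lines of code can reveal characters that do not show up when the text is displayed. This is a minimal sketch in Python rather than Groovy; `find_suspect_chars` and the `sample` string are made up for illustration and are not taken from the actual database.

```python
import unicodedata


def find_suspect_chars(value):
    """Return (index, code point, Unicode name) for each non-printable or non-ASCII character."""
    suspects = []
    for index, char in enumerate(value):
        if not char.isprintable() or ord(char) > 127:
            # Control characters have no Unicode name, so fall back to a marker.
            name = unicodedata.name(char, "UNNAMED")
            suspects.append((index, "U+%04X" % ord(char), name))
    return suspects


# Hypothetical value with a Windows line break hidden between name and description.
sample = "Linear Glass Mosaic Tiles\r\nIntroducing our new tiles"
for hit in find_suspect_chars(sample):
    print(hit)
```

An empty result for a stored description would suggest that hidden line breaks or unusual whitespace are not the cause.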
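Another idea: regex flags
Sometimes line breaks are involved. There are two more flags:
multiline (?m), which makes ^ and $ match at line boundaries,
and dotall (?s), which allows the dot to also match line terminators such as newlines.
So you could try (?ism).*mosaic.*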
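To illustrate why these flags can matter, here is a small sketch. It assumes that Cypher's =~ has to match the whole string and follows Java regular expression rules; Python's `re.fullmatch` is used only as a stand-in, since it treats `.`, (?m) and (?s) the same way for this case. The `text` value is a made-up description containing a line break, not the real data.

```python
import re

# Hypothetical stand-in for a description that contains a hidden line break.
text = "Introducing our new Rip Curl linear glass\nmosaic tiles."


def full_match(pattern, value):
    """Whole-string match: the pattern has to cover the entire value."""
    return re.fullmatch(pattern, value) is not None


no_flags = full_match(r"(?i).*mosaic.*", text)    # '.' cannot cross the newline
multiline = full_match(r"(?im).*mosaic.*", text)  # (?m) only affects ^ and $
dotall = full_match(r"(?is).*mosaic.*", text)     # '.' may now match the newline

print(no_flags, multiline, dotall)  # prints: False False True
```

If the stored description does contain a line break, it is the (?s) part of (?ism) that would make the difference; (?m) on its own would not.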
Regarding "regex - Neo4j regex string matching not returning expected results", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26571379/