neo4j - Neo.TransientError.General.OutOfMemoryError

Tags: neo4j cypher

I am trying to load about 1 million rows from a CSV file using LOAD CSV. I am running Neo4j Enterprise 3.2.2 on a Windows machine. I have increased the heap to 7g, but I still get

Neo.TransientError.General.OutOfMemoryError

Any suggestions on how to load this CSV with my current Cypher query?

Query:

    using periodic commit 200 load csv with headers from "file:///LabsTab.Txt" as csvLine fieldterminator '\t'
    with csvLine where csvLine.ObservationName <> "Cancellation Reason"
    optional match (visit:Visit {VisitID: csvLine.VisitID})
    merge (provider:Provider {ProviderName: csvLine.ProviderName})
        on create set provider.ProviderID = csvLine.OrderingProviderID
    merge (vlo:VisitLabOrder)
        on create set vlo.ProviderID = csvLine.OrderingProviderID
        on create set vlo.FillerOrderNo = csvLine.FillerOrderNo
        on create set vlo.OrderStartDtTm = apoc.date.parse(csvLine.OrderStartDtTm, "s", "yyyy/mm/dd hh:mm")
        on create set vlo.OrderStart = csvLine.OrderStartDtTm
    merge (lab:Lab {FillerOrderNo: csvLine.FillerOrderNo, OrderingProviderID: csvLine.OrderingProviderID, OrderingProvider: csvLine.ProviderName})
        on create set lab.SpecimenCollectionDtTm = apoc.date.parse(csvLine.SpecimenCollectionDtTm, "s", "yyyy/mm/dd hh:mm")
        on create set lab.SpecimentReceivedDtTm = apoc.date.parse(csvLine.SpecimenReceivedDtTm, "s", "yyyy/mm/dd hh:mm")
        on create set lab.AnalysisDtTm = apoc.date.parse(csvLine.AnalysisDtTm, "s", "yyyy/mm/dd hh:mm")
    merge (vlr:VisitLabResult {FillerOrderNo: csvLine.FillerOrderNo, ProviderID: csvLine.ProviderID})
        on create set vlr.ResultStatusChangeDtTm = apoc.date.parse(csvLine.ResultStatusChangeDtTm, "s", "yyyy/mm/dd hh:mm")
        on create set vlr.ResultStatusChange = csvLine.ResultStatusChangeDtTm
    merge (labobs:LabObservation {UniversalServiceName: csvLine.UniversalServiceName, UniversalServiceID: csvLine.UniversalServiceID, ObservationName: csvLine.ObservationName, ObservationValue: csvLine.ObservationValue, Units: csvLine.Units})
    //Merge (visit)-[r:Lab_tested]->(vlo)-[:Lab_tested]->(lab)-[:Observation_result]->(labobs)
    //merge (lab)-[:Lab_resulted]->(vlr)-[:Lab_resulted]->(visit)
    //merge (vlr)<-[:Ordered]-(provider)-[:Ordered]->(vlo)

Best answer

You should use LOAD CSV together with USING PERIODIC COMMIT.

From the docs:

If the CSV file contains a significant number of rows (approaching hundreds of thousands or millions), USING PERIODIC COMMIT can be used to instruct Neo4j to perform a commit after a number of rows. This reduces the memory overhead of the transaction state. By default, the commit will happen every 1000 rows.

You can change the default by specifying the desired number of rows after USING PERIODIC COMMIT, for example:

USING PERIODIC COMMIT 500
LOAD CSV FROM 'https://neo4j.com/docs/developer-manual/3.2/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
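Periodic commits only bound memory if the statement can be processed row by row. As a hedged sketch that is not part of the original answer: prefixing a LOAD CSV statement with EXPLAIN shows its plan without executing it, and an Eager operator in that plan would mean rows are pulled in before anything is written, which may undercut the benefit of committing in batches. The file name and column come from the question; the reduced query body and the omission of USING PERIODIC COMMIT are simplifications made for illustration.

```cypher
// Sketch: show the execution plan without running the load.
// Replace the single MERGE with the full query body, then look
// for an "Eager" operator in the resulting plan.
EXPLAIN
LOAD CSV WITH HEADERS FROM "file:///LabsTab.Txt" AS csvLine FIELDTERMINATOR '\t'
MERGE (provider:Provider {ProviderName: csvLine.ProviderName})
```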

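Also, ON CREATE SET only needs to be specified once per MERGE, with the individual assignments separated by commas. I don't know whether these changes will make a difference, but give it a try :)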

using periodic commit 200 load csv with headers from "file:///LabsTab.Txt" as csvLine fieldterminator '\t'
with csvLine where csvLine.ObservationName <> "Cancellation Reason"
optional match (visit:Visit {VisitID: csvLine.VisitID})

merge (provider:Provider {ProviderName: csvLine.ProviderName})
    on create set provider.ProviderID = csvLine.OrderingProviderID

// In the date pattern, MM is the month (lowercase mm means minutes).
// hh is the 12-hour clock; HH would be needed if the file uses 24-hour times.
merge (vlo:VisitLabOrder)
    on create set vlo.ProviderID = csvLine.OrderingProviderID,
    vlo.FillerOrderNo = csvLine.FillerOrderNo,
    vlo.OrderStartDtTm = apoc.date.parse(csvLine.OrderStartDtTm, "s", "yyyy/MM/dd hh:mm"),
    vlo.OrderStart = csvLine.OrderStartDtTm

merge (lab:Lab {FillerOrderNo: csvLine.FillerOrderNo, OrderingProviderID: csvLine.OrderingProviderID, OrderingProvider: csvLine.ProviderName})
    on create set lab.SpecimenCollectionDtTm = apoc.date.parse(csvLine.SpecimenCollectionDtTm, "s", "yyyy/MM/dd hh:mm"),
    lab.SpecimentReceivedDtTm = apoc.date.parse(csvLine.SpecimenReceivedDtTm, "s", "yyyy/MM/dd hh:mm"),
    lab.AnalysisDtTm = apoc.date.parse(csvLine.AnalysisDtTm, "s", "yyyy/MM/dd hh:mm")

merge (vlr:VisitLabResult {FillerOrderNo: csvLine.FillerOrderNo, ProviderID: csvLine.ProviderID})
    on create set vlr.ResultStatusChangeDtTm = apoc.date.parse(csvLine.ResultStatusChangeDtTm, "s", "yyyy/MM/dd hh:mm"),
    vlr.ResultStatusChange = csvLine.ResultStatusChangeDtTm

merge (labobs:LabObservation {UniversalServiceName: csvLine.UniversalServiceName, UniversalServiceID: csvLine.UniversalServiceID, ObservationName: csvLine.ObservationName, ObservationValue: csvLine.ObservationValue, Units: csvLine.Units})
//Merge (visit)-[r:Lab_tested]->(vlo)-[:Lab_tested]->(lab)-[:Observation_result]->(labobs)
//merge (lab)-[:Lab_resulted]->(vlr)-[:Lab_resulted]->(visit)
//merge (vlr)<-[:Ordered]-(provider)-[:Ordered]->(vlo)
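
MERGE has to look up each pattern before deciding whether to create it, and without an index that lookup scans every node carrying the label. One possible preparatory step, sketched here as an assumption rather than something the original answer proposed, is to index the properties used as MATCH and MERGE keys before running the load. The statements use the Neo4j 3.x index syntax; which properties count as keys is a guess based on the query, and whether this resolves the OutOfMemoryError is untested.

```cypher
// Assumed key properties, taken from the MATCH/MERGE patterns in the query.
// Run each statement on its own, before the LOAD CSV.
CREATE INDEX ON :Visit(VisitID);
CREATE INDEX ON :Provider(ProviderName);
CREATE INDEX ON :Lab(FillerOrderNo);
CREATE INDEX ON :VisitLabResult(FillerOrderNo);
CREATE INDEX ON :LabObservation(UniversalServiceID);
```

Note that merge (vlo:VisitLabOrder) has no properties in its pattern, so no index applies to it; it matches any existing VisitLabOrder node.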

Regarding neo4j - Neo.TransientError.General.OutOfMemoryError, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45467625/
