I'm trying to load roughly 1 million rows from a CSV using LOAD CSV. I'm running Neo4j Enterprise 3.2.2 on a Windows machine. I've already increased the heap to 7g, but I still get
Neo.TransientError.General.OutOfMemoryError
Any suggestions on how to load this CSV with my current Cypher query?
Query:
using periodic commit 200
load csv with headers from "file:///LabsTab.Txt" as csvLine fieldterminator '\t'
with csvLine
where csvLine.ObservationName <> "Cancellation Reason"
optional match (visit:Visit {VisitID: csvLine.VisitID})
merge (provider:Provider {ProviderName: csvLine.ProviderName})
on create set provider.ProviderID = csvLine.OrderingProviderID
merge (vlo:VisitLabOrder)
on create set vlo.ProviderID = csvLine.OrderingProviderID
on create set vlo.FillerOrderNo = csvLine.FillerOrderNo
on create set vlo.OrderStartDtTm = apoc.date.parse(csvLine.OrderStartDtTm, "s", "yyyy/mm/dd hh:mm")
on create set vlo.OrderStart = csvLine.OrderStartDtTm
merge (lab:Lab {FillerOrderNo: csvLine.FillerOrderNo, OrderingProviderID: csvLine.OrderingProviderID, OrderingProvider: csvLine.ProviderName})
on create set lab.SpecimenCollectionDtTm = apoc.date.parse(csvLine.SpecimenCollectionDtTm, "s", "yyyy/mm/dd hh:mm")
on create set lab.SpecimentReceivedDtTm = apoc.date.parse(csvLine.SpecimenReceivedDtTm, "s", "yyyy/mm/dd hh:mm")
on create set lab.AnalysisDtTm = apoc.date.parse(csvLine.AnalysisDtTm, "s", "yyyy/mm/dd hh:mm")
merge (vlr:VisitLabResult {FillerOrderNo: csvLine.FillerOrderNo, ProviderID: csvLine.ProviderID})
on create set vlr.ResultStatusChangeDtTm = apoc.date.parse(csvLine.ResultStatusChangeDtTm, "s", "yyyy/mm/dd hh:mm")
on create set vlr.ResultStatusChange = csvLine.ResultStatusChangeDtTm
merge (labobs:LabObservation {UniversalServiceName: csvLine.UniversalServiceName, UniversalServiceID: csvLine.UniversalServiceID, ObservationName: csvLine.ObservationName, ObservationValue: csvLine.ObservationValue, Units: csvLine.Units})
//Merge (visit)-[r:Lab_tested]->(vlo)-[:Lab_tested]->(lab)-[:Observation_result]->(labobs)
//merge (lab)-[:Lab_resulted]->(vlr)-[:Lab_resulted]->(visit)
//merge (vlr)<-[:Ordered]-(provider)-[:Ordered]->(vlo)
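For reference, the 7g heap mentioned above is configured in neo4j.conf. A minimal sketch of the relevant settings, assuming default file locations; the page cache value is an assumption and should be sized to the machine's available RAM:

```
# neo4j.conf -- memory settings for a large LOAD CSV import
# (7g matches the heap increase described above; keep initial == max
# to avoid heap resizing during the load)
dbms.memory.heap.initial_size=7g
dbms.memory.heap.max_size=7g
# the page cache holds the store files; an assumed value, tune to fit
dbms.memory.pagecache.size=4g
```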
Best answer
You should combine LOAD CSV with USING PERIODIC COMMIT.
From the docs:
If the CSV file contains a significant number of rows (approaching hundreds of thousands or millions),
USING PERIODIC COMMIT
can be used to instruct Neo4j to perform a commit after a number of rows. This reduces the memory overhead of the transaction state. By default, the commit will happen every 1000 rows.
You can change that default by specifying the desired number of rows after USING PERIODIC COMMIT, for example:
USING PERIODIC COMMIT 500
LOAD CSV FROM 'https://neo4j.com/docs/developer-manual/3.2/csv/artists.csv' AS line
CREATE (:Artist { name: line[1], year: toInt(line[2])})
Also, ON CREATE SET only needs to appear once per MERGE; the individual assignments can be separated by commas. I don't know whether these changes will make a difference, but give them a try :)
// Note: the apoc.date.parse patterns are Java date formats, where MM = month
// and HH = 24-hour clock; the original "yyyy/mm/dd hh:mm" would read the
// month position as minutes. The "Speciment" typo is also corrected below.
using periodic commit 200
load csv with headers from "file:///LabsTab.Txt" as csvLine fieldterminator '\t'
with csvLine where csvLine.ObservationName <> "Cancellation Reason"
optional match (visit:Visit {VisitID: csvLine.VisitID})
merge (provider:Provider {ProviderName: csvLine.ProviderName})
on create set provider.ProviderID = csvLine.OrderingProviderID
merge (vlo:VisitLabOrder)
on create set vlo.ProviderID = csvLine.OrderingProviderID,
    vlo.FillerOrderNo = csvLine.FillerOrderNo,
    vlo.OrderStartDtTm = apoc.date.parse(csvLine.OrderStartDtTm, "s", "yyyy/MM/dd HH:mm"),
    vlo.OrderStart = csvLine.OrderStartDtTm
merge (lab:Lab {FillerOrderNo: csvLine.FillerOrderNo, OrderingProviderID: csvLine.OrderingProviderID, OrderingProvider: csvLine.ProviderName})
on create set lab.SpecimenCollectionDtTm = apoc.date.parse(csvLine.SpecimenCollectionDtTm, "s", "yyyy/MM/dd HH:mm"),
    lab.SpecimenReceivedDtTm = apoc.date.parse(csvLine.SpecimenReceivedDtTm, "s", "yyyy/MM/dd HH:mm"),
    lab.AnalysisDtTm = apoc.date.parse(csvLine.AnalysisDtTm, "s", "yyyy/MM/dd HH:mm")
merge (vlr:VisitLabResult {FillerOrderNo: csvLine.FillerOrderNo, ProviderID: csvLine.ProviderID})
on create set vlr.ResultStatusChangeDtTm = apoc.date.parse(csvLine.ResultStatusChangeDtTm, "s", "yyyy/MM/dd HH:mm"),
    vlr.ResultStatusChange = csvLine.ResultStatusChangeDtTm
merge (labobs:LabObservation {UniversalServiceName: csvLine.UniversalServiceName, UniversalServiceID: csvLine.UniversalServiceID, ObservationName: csvLine.ObservationName, ObservationValue: csvLine.ObservationValue, Units: csvLine.Units})
//Merge (visit)-[r:Lab_tested]->(vlo)-[:Lab_tested]->(lab)-[:Observation_result]->(labobs)
//merge (lab)-[:Lab_resulted]->(vlr)-[:Lab_resulted]->(visit)
//merge (vlr)<-[:Ordered]-(provider)-[:Ordered]->(vlo)
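Two further points may matter for memory and speed, sketched below as hypothetical Cypher (the key-property and index choices are assumptions, not from the original post). First, `merge (vlo:VisitLabOrder)` with no properties binds every row to one shared node. Second, MERGE on a non-indexed property forces a label scan per row, which slows large loads and inflates transaction state:

```cypher
// Create indexes before the load so each MERGE is an index lookup,
// not a label scan (Neo4j 3.2 index syntax; properties are assumptions):
CREATE INDEX ON :Provider(ProviderName);
CREATE INDEX ON :Lab(FillerOrderNo);

// MERGE with no properties matches a single shared node for ALL rows;
// merging on a key (FillerOrderNo is assumed to identify an order)
// produces one VisitLabOrder per order instead:
MERGE (vlo:VisitLabOrder {FillerOrderNo: csvLine.FillerOrderNo})
ON CREATE SET vlo.ProviderID = csvLine.OrderingProviderID
```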
Regarding neo4j - Neo.TransientError.General.OutOfMemoryError, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/45467625/