java - 如何提高 Neo4j 2.0 cypher/ExecutionResult 在重负载下的性能?

标签 java multithreading cypher neo4j

背景: 我们注意到,随着并发线程数量的增加,从 ExecutionResult 检索数据时性能会下降。我们的生产应用程序有 200 个工作线程,在嵌入式模式下使用 Neo4j 2.0.0 Community。 例如以毫秒为单位。

  1. 线程:1,加密时间:0,提取时间:188
  2. 线程:10,加密时间:1,提取时间:188
  3. 线程:50,加密时间:1,提取时间:2481
  4. 线程:100,加密时间:1,提取时间:4466

程序输出示例(过滤 1 个线程的结果):

2013-12-23 14:39:31,137 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 1
2013-12-23 14:39:31,137 [main] INFO  net.ahm.graph.CypherLab  - >>>> STARTED GRAPHDB
2013-12-23 14:39:39,203 [main] INFO  net.ahm.graph.CypherLab  - >>>> CREATED NODES
2013-12-23 14:39:43,510 [main] INFO  net.ahm.graph.CypherLab  - >>>> WARMED UP
2013-12-23 14:39:43,510 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 0 m-secs
2013-12-23 14:39:43,698 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 188 m-secs
2013-12-23 14:39:43,698 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
2013-12-23 14:39:43,698 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!



2013-12-23 14:40:10,470 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 10
...
2013-12-23 14:40:23,985 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
2013-12-23 14:40:25,219 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 188 m-secs
2013-12-23 14:40:25,219 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
2013-12-23 14:40:25,234 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!


2013-12-23 14:41:28,850 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 50
...
2013-12-23 14:41:41,781 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
2013-12-23 14:41:45,720 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 2481 m-secs
2013-12-23 14:41:45,720 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
2013-12-23 14:41:46,855 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!


2013-12-23 14:44:09,267 [main] INFO  net.ahm.graph.CypherLab  - >>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: 100
...
2013-12-23 14:44:22,077 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER TOOK: 1 m-secs
2013-12-23 14:44:30,915 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> GETTING RESULTS TOOK: 4466 m-secs
2013-12-23 14:44:30,915 [pool-1-thread-1] INFO  net.ahm.graph.CypherLab  - >>>> CYPHER RETURNED ROWS: 50000
2013-12-23 14:44:31,680 [Thread-4] INFO  net.ahm.graph.CypherLab  - ### GRAPHDB SHUTDOWNHOOK INVOKED !!!

测试程序:

package net.ahm.graph;

import java.io.File;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.log4j.Logger;
import org.neo4j.cypher.javacompat.ExecutionEngine;
import org.neo4j.cypher.javacompat.ExecutionResult;
import org.neo4j.graphdb.DynamicLabel;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;
import org.neo4j.graphdb.schema.IndexDefinition;
import org.neo4j.graphdb.schema.Schema;
import org.neo4j.kernel.impl.util.FileUtils;
import org.neo4j.kernel.impl.util.StringLogger;

public class CypherLab {
    private static final Logger LOG = Logger.getLogger(CypherLab.class);
    private final static int CONCURRENCY = 100;

    public static void main(String[] args) throws Exception {
        FileUtils.deleteRecursively(new File("graphdb"));
        final GraphDatabaseService graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder("graphdb")
                .setConfig(GraphDatabaseSettings.use_memory_mapped_buffers, "true").setConfig(GraphDatabaseSettings.cache_type, "strong")
                .newGraphDatabase();
        registerShutdownHook(graphDb);
        LOG.info(">>>>>>>>>>>>>>>>>>>>>>>>>>>>> NUMBER OF PARALLEL CYPHER EXECUTIONS: " + CONCURRENCY);
        LOG.info(">>>> STARTED GRAPHDB");
        createIndex("Parent", "name", graphDb);
        createIndex("Child", "name", graphDb);
        try (Transaction tx = graphDb.beginTx()) {
            Node parent = graphDb.createNode(DynamicLabel.label("Parent"));
            parent.setProperty("name", "parent");
            for (int i = 0; i < 50000; i++) {
                Node child = graphDb.createNode(DynamicLabel.label("Child"));
                child.setProperty("name", "child" + i);
                parent.createRelationshipTo(child, RelationshipTypes.PARENT_CHILD);
            }
            tx.success();
        }
        LOG.info(">>>> CREATED NODES");
        final ExecutionEngine engine = new ExecutionEngine(graphDb, StringLogger.SYSTEM);
        for (int i = 0; i < 10; i++) {
            try (Transaction tx = graphDb.beginTx()) {
                ExecutionResult result = engine.execute("match (n:Parent)-[:PARENT_CHILD]->(m:Child) return n.name, m.name");
                for (Map<String, Object> row : result) {
                    assert ((String) row.get("n.name") != null);
                    assert ((String) row.get("m.name") != null);
                }
                tx.success();
            }
        }
        LOG.info(">>>> WARMED UP");
        ExecutorService es = Executors.newFixedThreadPool(CONCURRENCY);
        final CountDownLatch cdl = new CountDownLatch(CONCURRENCY);
        for (int i = 0; i < CONCURRENCY; i++) {
            es.execute(new Runnable() {
                @Override
                public void run() {
                    try (Transaction tx = graphDb.beginTx()) {
                        long time = System.currentTimeMillis();
                        ExecutionResult result = engine.execute("match (n:Parent)-[:PARENT_CHILD]->(m:Child) return n.name, m.name");
                        LOG.info(">>>> CYPHER TOOK: " + (System.currentTimeMillis() - time) + " m-secs");
                        int count = 0;
                        time = System.currentTimeMillis();
                        for (Map<String, Object> row : result) {
                            assert ((String) row.get("n.name") != null);
                            assert ((String) row.get("m.name") != null);
                            count++;
                        }
                        LOG.info(">>>> GETTING RESULTS TOOK: " + (System.currentTimeMillis() - time) + " m-secs");
                        tx.success();
                        LOG.info(">>>> CYPHER RETURNED ROWS: " + count);
                    } catch (Throwable t) {
                        LOG.error(t);
                    } finally {
                        cdl.countDown();
                    }
                }
            });
        }
        cdl.await();
        es.shutdown();
    }

    private static void createIndex(String label, String propertyName, GraphDatabaseService graphDb) {
        IndexDefinition indexDefinition;
        try (Transaction tx = graphDb.beginTx()) {
            Schema schema = graphDb.schema();
            indexDefinition = schema.indexFor(DynamicLabel.label(label)).on(propertyName).create();
            tx.success();
        }
        try (Transaction tx = graphDb.beginTx()) {
            Schema schema = graphDb.schema();
            schema.awaitIndexOnline(indexDefinition, 10, TimeUnit.SECONDS);
            tx.success();
        }
    }

    private static void registerShutdownHook(final GraphDatabaseService graphDb) {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                LOG.info("### GRAPHDB SHUTDOWNHOOK INVOKED !!!");
                graphDb.shutdown();
            }
        });
    }

    private enum RelationshipTypes implements RelationshipType {
        PARENT_CHILD
    }
}

最佳答案

当这个 commit is merged in 时应该会更好。将作为 2.0.1 的一部分发布 还有一些其他较小的瓶颈。

您可以尝试将您的网络服务器线程限制为核心数倍(或核心数 * 2)吗?看看这是否有帮助?

我的理解是,在预热并将热数据集放入缓存后,它仅受 CPU 限制,而不再受 I/O 限制以进行读取。因此,线程过多会导致 CPU 和工作人员挨饿。

如果我使用 8 个和 100 个内核运行测试,我会得到这些用于执行查询并获取 50k 结果的分布:

  • 8 个线程:50% 百分位为 500 毫秒,90% 为 650 毫秒
  • 100 个线程:2600 毫秒的 50% 百分位数和 6000 毫秒的 90%

代码和详细直方图:https://gist.github.com/jexp/a164f6cf9686b8125872

关于java - 如何提高 Neo4j 2.0 cypher/ExecutionResult 在重负载下的性能?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/20750741/

相关文章:

neo4j - 当属性名称是参数时如何查询属性值?

java - 使用 SQLiteOpenHelper 时进行 JOIN 查询的最佳方法是什么

java - 如何将PEM公钥转换为DER公钥?

java - Base64 Java 编码和解码字符串

ruby - 通过 ruby​​ 进程共享变量

java - Java 中具有并行化的链式过滤器

java - 我如何有效地处理来自 Executor Service 的多个结果

neo4j - 如何找到neo4j中两个节点之间的最短关系?

java - Spring Tool Suite - Pivotal tc Server Developer Edition v3.0 所需的8080端口已被占用

java - 具有两个集合的计算字段