hadoop - JanusGraph with OneTimeBulkLoader in hadoop-gremlin throws "Graph does not support adding vertices"

Tags: hadoop graph gremlin vertices janusgraph

My goal: use SparkGraphComputer to bulk-load local data into JanusGraph, then build a mixed index on top of HBase and ES (Elasticsearch).

My problem:

Caused by: java.lang.UnsupportedOperationException: Graph does not support adding vertices
    at org.apache.tinkerpop.gremlin.structure.Graph$Exceptions.vertexAdditionsNotSupported(Graph.java:1133)
    at org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph.addVertex(HadoopGraph.java:187)
    at org.apache.tinkerpop.gremlin.process.traversal.step.map.AddVertexStartStep.processNextStart(AddVertexStartStep.java:91)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:128)
    at org.apache.tinkerpop.gremlin.process.traversal.step.util.AbstractStep.next(AbstractStep.java:38)
    at org.apache.tinkerpop.gremlin.process.traversal.util.DefaultTraversal.next(DefaultTraversal.java:200)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.OneTimeBulkLoader.getOrCreateVertex(OneTimeBulkLoader.java:49)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.executeInternal(BulkLoaderVertexProgram.java:210)
    at org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram.execute(BulkLoaderVertexProgram.java:197)
    at org.apache.tinkerpop.gremlin.spark.process.computer.SparkExecutor.lambda$null$4(SparkExecutor.java:118)
    at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$3.next(IteratorUtils.java:247)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
    at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:191)
    at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    ... 3 more

Dependencies:

janusgraph-all-0.3.1 janusgraph-es-0.3.1 hadoop-gremlin-3.3.3

The configuration is as follows:

  1. janusgraph-hbase-es.properties

    storage.backend=hbase
    gremlin.graph=XXX.XXX.XXX.gremlin.hadoop.structure.HadoopGraph
    storage.hostname=<ip>
    storage.hbase.table=hadoop-test-3
    storage.batch-loading=true
    schema.default = none
    cache.db-cache = true
    cache.db-cache-clean-wait = 20
    cache.db-cache-time = 180000
    cache.db-cache-size = 0.5
    index.search.backend=elasticsearch
    index.search.hostname=<ip>
    index.search.index-name=hadoop_test_3
    
  2. hadoop-graphson.properties

    gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
    gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONInputFormat
    gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat
    gremlin.hadoop.inputLocation=data/tinkerpop-modern.json
    gremlin.hadoop.outputLocation=output
    gremlin.hadoop.jarsInDistributedCache=true
    
    giraph.minWorkers=2
    giraph.maxWorkers=2
    giraph.useOutOfCoreGraph=true
    giraph.useOutOfCoreMessages=true
    mapred.map.child.java.opts=-Xmx1024m
    mapred.reduce.child.java.opts=-Xmx1024m
    giraph.numInputThreads=4
    giraph.numComputeThreads=4
    giraph.maxMessagesInMemory=100000
    
    spark.master=local[*]
    spark.serializer=org.apache.spark.serializer.KryoSerializer
    
  3. schema.groovy

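    import org.apache.tinkerpop.gremlin.structure.Vertex
    import org.janusgraph.core.PropertyKey
    import org.janusgraph.core.VertexLabel
    import org.janusgraph.core.schema.JanusGraphIndex
    import org.janusgraph.core.schema.JanusGraphManagement

    // Wrapped in a class so the Java code can call new JanusgraphSchema()
    class JanusgraphSchema {
        def defineGratefulDeadSchema(janusGraph) {
            JanusGraphManagement m = janusGraph.openManagement()
            VertexLabel person = m.makeVertexLabel("person").make()
            // Uncomment the next line when importing with IncrementalBulkLoader
            // PropertyKey blid = m.makePropertyKey("bulkLoader.vertex.id").dataType(Long.class).make()
            PropertyKey birth = m.makePropertyKey("birth").dataType(Date.class).make()
            PropertyKey age = m.makePropertyKey("age").dataType(Integer.class).make()
            PropertyKey name = m.makePropertyKey("name").dataType(String.class).make()
            // Alternative: a unique composite index on name
            // JanusGraphIndex index = m.buildIndex("nameCompositeIndex", Vertex.class).addKey(name).unique().buildCompositeIndex()
            // Mixed index: uniqueness is not supported; "search" is the index name used in index.search.backend
            JanusGraphIndex index = m.buildIndex("mixedIndex", Vertex.class).addKey(name).buildMixedIndex("search")
            // Uncomment the next line when importing with IncrementalBulkLoader
            // JanusGraphIndex bidIndex = m.buildIndex("byBulkLoaderVertexId", Vertex.class).addKey(blid).indexOnly(person).buildCompositeIndex()
            m.commit()
        }
    }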
    
  4. Related code

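    import java.util.List;
    import java.util.Map;

    import org.apache.tinkerpop.gremlin.process.computer.bulkloading.BulkLoaderVertexProgram;
    import org.apache.tinkerpop.gremlin.process.computer.bulkloading.OneTimeBulkLoader;
    import org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer;
    import org.apache.tinkerpop.gremlin.structure.Graph;
    import org.apache.tinkerpop.gremlin.structure.util.GraphFactory;
    import org.janusgraph.core.JanusGraph;
    import org.janusgraph.core.JanusGraphFactory;

    public class BulkLoadJob { // class name is illustrative
        public static void main(String[] args) throws Exception {
            // 1. Define the schema (label, property keys, mixed index) before loading,
            //    since schema.default=none disables automatic schema creation
            JanusGraph janusGraph = JanusGraphFactory.open("config/janusgraph-hbase-es.properties");
            JanusgraphSchema janusgraphSchema = new JanusgraphSchema();
            janusgraphSchema.defineGratefulDeadSchema(janusGraph);
            janusGraph.close();

            // 2. Read the GraphSON input through HadoopGraph and bulk-load it with Spark
            //    into the graph described by the writeGraph properties file
            Graph graph = GraphFactory.open("config/hadoop-graphson.properties");
            BulkLoaderVertexProgram blvp = BulkLoaderVertexProgram.build()
                    .bulkLoader(OneTimeBulkLoader.class)
                    .writeGraph("config/janusgraph-hbase-es.properties")
                    .create(graph);
            graph.compute(SparkGraphComputer.class).program(blvp).submit().get();
            graph.close();

            // 3. Reopen JanusGraph and count the loaded vertices
            JanusGraph janusGraph1 = JanusGraphFactory.open("config/janusgraph-hbase-es.properties");
            List<Map<String, Object>> list = janusGraph1.traversal().V().valueMap().toList();
            System.out.println("size: " + list.size());
            janusGraph1.close();
        }
    }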
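The code uses OneTimeBulkLoader, while the commented-out lines in schema.groovy are for IncrementalBulkLoader. The stack trace shows that OneTimeBulkLoader.getOrCreateVertex simply adds a new vertex on every call; IncrementalBulkLoader instead looks up an existing vertex by its original id stored in the bulkLoader.vertex.id property, which is why that property key and the byBulkLoaderVertexId composite index are needed. The following is only a toy in-memory model of that difference using plain Java collections, not the TinkerPop API; the names LoaderModel, loadOneTime, and loadIncremental are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, NOT the TinkerPop API: it only mimics how the two
// bulk loaders map an input vertex id to a vertex in the target graph.
class LoaderModel {
    // Target "graph": each created vertex remembers its input id,
    // standing in for the bulkLoader.vertex.id property.
    final List<Long> targetVertices = new ArrayList<>();
    // Stand-in for the byBulkLoaderVertexId composite index.
    final Map<Long, Integer> byBulkLoaderVertexId = new HashMap<>();

    // OneTimeBulkLoader-style: always add a new vertex.
    int loadOneTime(long inputId) {
        targetVertices.add(inputId);
        return targetVertices.size() - 1;
    }

    // IncrementalBulkLoader-style: look up by original id, create only if missing.
    int loadIncremental(long inputId) {
        Integer existing = byBulkLoaderVertexId.get(inputId);
        if (existing != null) {
            return existing;
        }
        targetVertices.add(inputId);
        int created = targetVertices.size() - 1;
        byBulkLoaderVertexId.put(inputId, created);
        return created;
    }

    public static void main(String[] args) {
        long[] input = {1L, 2L, 3L};

        // Load the same input twice with each strategy.
        LoaderModel oneTime = new LoaderModel();
        LoaderModel incremental = new LoaderModel();
        for (int run = 0; run < 2; run++) {
            for (long id : input) {
                oneTime.loadOneTime(id);
                incremental.loadIncremental(id);
            }
        }

        System.out.println("one-time: " + oneTime.targetVertices.size());
        System.out.println("incremental: " + incremental.targetVertices.size());
    }
}
```

Running main prints `one-time: 6` and `incremental: 3`: in this model, re-running a one-time load duplicates every vertex, while the incremental strategy reuses vertices it finds through the index.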
    

Result:

The data was imported into HBase successfully, but the index was not built in ES.

Best answer

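After I reset gremlin.graph in janusgraph-hbase-es.properties to its default value, gremlin.graph=org.janusgraph.core.JanusGraphFactory, the error above no longer appeared. BulkLoaderVertexProgram opens the writeGraph configuration through GraphFactory, which instantiates whatever class gremlin.graph names; with a HadoopGraph there, the vertex addition in OneTimeBulkLoader fails, exactly as the stack trace shows.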
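Because the read graph and the write graph are both chosen by the gremlin.graph key, a cheap pre-flight check before submitting the Spark job can catch this mistake. The sketch below assumes you want to validate the two properties files with plain java.util.Properties; the class ConfigCheck and its methods are hypothetical helpers, not part of TinkerPop or JanusGraph. Inline strings stand in for the real files to keep it self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of TinkerPop or JanusGraph: reads the
// gremlin.graph key that GraphFactory uses to pick the Graph implementation.
class ConfigCheck {
    static final String HADOOP_GRAPH = "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph";
    static final String JANUS_FACTORY = "org.janusgraph.core.JanusGraphFactory";

    // Returns the gremlin.graph value, or null if the key is absent.
    static String gremlinGraph(Reader config) {
        Properties props = new Properties();
        try {
            props.load(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("gremlin.graph");
    }

    // True if the write graph would be opened through JanusGraphFactory.
    static boolean writeGraphOk(Reader writeConfig) {
        return JANUS_FACTORY.equals(gremlinGraph(writeConfig));
    }

    public static void main(String[] args) {
        // In a real job these would be FileReaders over config/hadoop-graphson.properties
        // and config/janusgraph-hbase-es.properties.
        String readConfig = "gremlin.graph=" + HADOOP_GRAPH + "\n";
        String brokenWrite = "storage.backend=hbase\ngremlin.graph=XXX.XXX.XXX.gremlin.hadoop.structure.HadoopGraph\n";
        String fixedWrite = "storage.backend=hbase\ngremlin.graph=" + JANUS_FACTORY + "\n";

        System.out.println("read graph is HadoopGraph: "
                + HADOOP_GRAPH.equals(gremlinGraph(new StringReader(readConfig))));
        System.out.println("broken write config ok: " + writeGraphOk(new StringReader(brokenWrite)));
        System.out.println("fixed write config ok: " + writeGraphOk(new StringReader(fixedWrite)));
    }
}
```

This prints `true`, `false`, `true` for the three checks. The read side must stay a HadoopGraph (it drives SparkGraphComputer), while only the write side should point at JanusGraphFactory.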

A similar question, "hadoop - JanusGraph with OneTimeBulkLoader in hadoop-gremlin throws 'Graph does not support adding vertices'", can be found on Stack Overflow: https://stackoverflow.com/questions/55990565/
