java - Loading a Lucene index previously written to HDFS into a RAMDirectory

Tags: java apache hadoop lucene

This is the error message:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in RAMDirectory@1cff1d4a lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@2ddf0c3: files: [/prod/hdfs/LUCENE/index/140601/_0.cfe, /prod/hdfs/LUCENE/index/140601/segments_2, /prod/hdfs/LUCENE/index/140601/_0.si, /prod/hdfs/LUCENE/index/140601/segments.gen, /prod/hdfs/LUCENE/index/140601/_0.cfs]
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:801)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)

I have committed and closed the index writer correctly.

Here is the searcher code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SearchFiles {

    private SearchFiles() {}

    public static void main(String[] args) throws Exception {

        String filenm = "";
        // Creating a FileSystem object, to be able to work with HDFS.
        Configuration config = new Configuration();
        config.set("fs.defaultFS", "hdfs://127.0.0.1:9000/");
        config.addResource(new Path("/usr/local/Cellar/hadoop/2.4.0/libexec/etc/hadoop/core-site.xml"));
        FileSystem dfs = FileSystem.get(config);
        FileStatus[] status = dfs.listStatus(new Path("/prod/hdfs/LUCENE/index/140601"));

        // Creating a RAMDirectory (memory) object, to be able to build the index in memory.
        RAMDirectory rdir = new RAMDirectory();

        FSDataInputStream filereader = null;

        // Iterating over the index files present in the HDFS directory.
        for (int i = 0; i < status.length; i++) {

            // Reading data from an index file in the HDFS directory.
            filereader = dfs.open(status[i].getPath());
            int size = filereader.available();

            // Reading data from the file into a byte array.
            byte[] bytarr = new byte[size];
            filereader.read(bytarr, 0, size);

            // Creating files in the RAM directory with the same names as the
            // index files present in the HDFS directory.
            filenm = status[i].getPath().toString();
            String sSplitValue = filenm.substring(21, filenm.length());
            System.out.println(sSplitValue);

            IndexOutput indxout = rdir.createOutput(sSplitValue, null);

            // Writing data from the byte array to the file in the RAM directory.
            indxout.writeBytes(bytarr, bytarr.length);
            indxout.flush();
            indxout.close();
        }
        filereader.close();

        IndexReader indexReader = DirectoryReader.open(rdir);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        QueryParser parser = new QueryParser(Version.LUCENE_47, "FUNDG_SRCE_CD", analyzer);
        Query query = parser.parse("D");
        TopDocs results = searcher.search(query, 1000);

        int numTotalHits = results.totalHits;
        ScoreDoc[] hits = results.scoreDocs;

        // Printing the number of documents or entries that match the search query.
        System.out.println("Total Hits = " + numTotalHits);
        for (int j = 0; j < hits.length; j++) {
            int docId = hits[j].doc;
            Document d = searcher.doc(docId);
            System.out.println(d.get("FUNDG_SRCE_CD") + " " + d.get("ACCT_NUM"));
        }
    }
}
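One fragile spot in the copy loop above: a single `filereader.read(bytarr, 0, size)` call is not guaranteed to fill the buffer, so large index files could be silently truncated (Hadoop's `FSDataInputStream` also provides `readFully` for exactly this reason). A minimal stdlib sketch of a safe read loop (`ReadFullyDemo` and `readFully` are illustrative names, not from the original code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {

    // InputStream.read(buf, off, len) may return fewer bytes than requested;
    // loop until the buffer is full or the stream ends prematurely.
    static byte[] readFully(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        int off = 0;
        while (off < size) {
            int n = in.read(buf, off, size - off);
            if (n < 0) {
                throw new IOException("unexpected end of stream at offset " + off);
            }
            off += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "segments_2 contents".getBytes();
        byte[] copy = readFully(new ByteArrayInputStream(data), data.length);
        System.out.println(new String(copy));
    }
}
```

The same loop shape works for any `InputStream`, including the HDFS one.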

Best Answer

I don't think you should be passing null as the IOContext argument to createOutput. Try IOContext.DEFAULT instead. I can't say for sure that this will fix it, but it may be a step in the right direction.

Why not make it easy on yourself, though? You can use the RAMDirectory constructor that copies an existing index:

public static void main(String[] args) throws Exception {
    Directory fsDirectory = FSDirectory.open(new File("/prod/hdfs/LUCENE/index/140601"));
    Directory rdir = new RAMDirectory(fsDirectory, IOContext.DEFAULT);
    IndexReader indexReader = DirectoryReader.open(rdir);
    // etc.
}
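Worth noting, given the error message: the files listed inside the RAMDirectory carry full paths (`/prod/hdfs/LUCENE/index/140601/_0.cfe`, and so on), while Lucene looks for a file literally named `segments_N`. The `substring(21, ...)` in the question keeps the directory portion of the HDFS path, so the names never match. A small sketch of stripping a path down to its bare file name (Hadoop's `Path#getName()` does the same; `BareName` is an illustrative class name, not from the original):

```java
import java.nio.file.Paths;

public class BareName {

    // Returns only the final path segment, e.g. "segments_2" for
    // "/prod/hdfs/LUCENE/index/140601/segments_2". Passing this bare name
    // to rdir.createOutput(...) lets Lucene find its segments file.
    static String bareName(String path) {
        return Paths.get(path).getFileName().toString();
    }

    public static void main(String[] args) {
        System.out.println(bareName("/prod/hdfs/LUCENE/index/140601/segments_2"));
    }
}
```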

This question about loading a Lucene index previously written to HDFS into a RAMDirectory is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/24636212/
