java - Extracting the contents of a large zip file

Tags: java truezip

I am trying to extract the contents of a zip file of about 500MB that contains roughly 250K files.

This is what I'm trying to do:

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

import de.schlichtherle.truezip.file.TFile;
import de.schlichtherle.truezip.file.TFileInputStream;

public class ArchiveReaderExecutor {

    private final ExecutorService pool;

    public ArchiveReaderExecutor() {
        pool = Executors.newFixedThreadPool(8);
    }

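    /**
     * Splits the archive file into a list of batches (lists) so that each
     * batch can be handed to a separate task
     * 
     * @param archive the zip file to read
     * 
     * @return the leaf files of the archive, grouped into fixed-size batches
     */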
    public List<List<TFile>> splitArchiveFile(final File archive) {
        final TFile tFile = new TFile(archive.getAbsolutePath());
        final ArrayList<TFile> individualFiles = new ArrayList<TFile>();
        recursivelyReadLeafnodes(tFile, individualFiles);
        final List<List<TFile>> returnList = new ArrayList<List<TFile>>();

        /*
         * Splitting the entire list into list of objects for batch processing
         */
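        final int batchSize = 100; // number of entries handed to each task
        int count = 0;
        List<TFile> innerList = null;

        for (TFile splitFile : individualFiles) {
            if (count == 0) {
                innerList = new ArrayList<TFile>();
                returnList.add(innerList);
            }
            innerList.add(splitFile);
            // Start a new batch once the current one holds batchSize entries
            if (++count == batchSize) {
                count = 0;
            }
        }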
        return returnList;
    }

    public List<TFile> recursivelyReadLeafnodes(TFile inputTFile,
            ArrayList<TFile> individualFiles) {
        TFile[] tfiles = null;

        if (inputTFile.isArchive() || inputTFile.isDirectory()) {
            tfiles = inputTFile.listFiles();
        } else {
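            // Not a container: treat the file itself as the only leaf
            // (a zero-length array would throw ArrayIndexOutOfBoundsException)
            tfiles = new TFile[] { inputTFile };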
        }

        for (final TFile tFile : tfiles) {
            if (tFile.isFile() && !tFile.getName().startsWith(".")) {
                individualFiles.add(tFile);
            } else if (tFile.isDirectory()) {
                recursivelyReadLeafnodes(tFile, individualFiles);
            }
        }

        return individualFiles;
    }

    public void runExtraction() {

        File src = new File("Really_Big_File.zip");
        List<List<TFile>> files = splitArchiveFile(src);
        for (List<TFile> list : files) {
            pool.execute(new FileExtractorSavor(list));
        }
        pool.shutdown();

    }


    class FileExtractorSavor implements Runnable{
        List<TFile> files;
        public FileExtractorSavor(List<TFile> files) {
            this.files = files;
        }
        @Override
        public void run() {
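            for (TFile tFile : files) {
                TFileInputStream in = null;
                try {
                    in = new TFileInputStream(tFile);
                    // File(parent, child) inserts the path separator that
                    // "Target_Location" + name would leave out
                    File file = new File("Target_Location", tFile.getName());
                    // Copy raw bytes (Commons IO 2.0+); decoding the entry with
                    // IOUtils.toString would corrupt binary files
                    FileUtils.copyInputStreamToFile(in, file);
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    IOUtils.closeQuietly(in);
                }
            }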

        }

    }

    public static void main(String[] args) {
        new ArchiveReaderExecutor().runExtraction();
    }
}
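Grouping the leaf files into fixed-size batches, as splitArchiveFile does, can also be written with List.subList, which keeps the batch size in a single parameter instead of a hand-maintained counter. This is a minimal sketch; Batches and partition are hypothetical names, not part of TrueZIP or Commons IO.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Batches {

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * Hypothetical helper; not part of TrueZIP or Commons IO.
     */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // subList returns a view; copy it so each task owns an independent list
            batches.add(new ArrayList<T>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        System.out.println(partition(items, 3)); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

In splitArchiveFile this would amount to something like partition(individualFiles, 100). Batching only controls how work is handed to the pool; it does not change how TrueZIP accesses the archive.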

When I run this code concurrently, many threads end up in a WAITING/BLOCKED state. Here is the thread dump:

"pool-1-thread-7" prio=5 tid=7fd8093dd000 nid=0x11d3f3000 waiting for monitor entry [11d3f2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at de.schlichtherle.truezip.socket.ConcurrentInputShop$SynchronizedConcurrentInputStream.close(ConcurrentInputShop.java:223)
    - waiting to lock <785460200> (a de.schlichtherle.truezip.fs.archive.FsDefaultArchiveController$Input)
    at de.schlichtherle.truezip.io.DecoratingInputStream.close(DecoratingInputStream.java:79)
    at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:178)
    at ArchiveReaderExecutor$FileExtractorSavor.run(ArchiveReaderExecutor.java:136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)

   Locked ownable synchronizers:
    - <79ed370e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
"pool-1-thread-5" prio=5 tid=7fd8093db800 nid=0x11d1ed000 waiting for monitor entry [11d1ec000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at de.schlichtherle.truezip.socket.ConcurrentInputShop$SynchronizedConcurrentInputStream.close(ConcurrentInputShop.java:223)
    - waiting to lock <785460200> (a de.schlichtherle.truezip.fs.archive.FsDefaultArchiveController$Input)
    at de.schlichtherle.truezip.io.DecoratingInputStream.close(DecoratingInputStream.java:79)
    at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:178)
    at ArchiveReaderExecutor$FileExtractorSavor.run(ArchiveReaderExecutor.java:136)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:695)

   Locked ownable synchronizers:
    - <79ed46468> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

I have also tried using:

TFile.cp_r(src, dst, TArchiveDetector.NULL, TArchiveDetector.NULL);

Because it runs on a single thread, it took even longer.

My question is: what is the fastest and best way to extract the contents of a zip file in Java using TrueZIP?

Best answer

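Nothing is wrong here. TrueZIP/TrueVFS maintains a single file descriptor per mounted archive file. When multiple threads read the contents of an archive file concurrently, the TrueZIP/TrueVFS kernel serializes all access so that only one thread at a time uses the file descriptor and updates its position. Meanwhile, all other threads are blocked. That is what the thread dump shows: the pool threads are BLOCKED waiting to lock the same monitor (<785460200>, the archive controller's Input).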
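The answer's point about one descriptor with one file position can be seen in plain Java I/O. In this JDK-only sketch (not TrueZIP code; class and method names are hypothetical, Java 7+ assumed), readWithSharedPosition must hold one lock across seek() and readFully(), because another thread could otherwise move the shared position in between. That is the same kind of serialization the answer describes. readAt instead uses FileChannel.read(ByteBuffer, long), which takes an explicit offset and does not change the channel's position, so readers need no common lock. Whether the OS actually overlaps those reads is implementation-dependent.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedDescriptorDemo {

    /** One descriptor, one position: seek + read must happen under a single lock. */
    public static byte[] readWithSharedPosition(RandomAccessFile raf, long offset, int len)
            throws IOException {
        byte[] buf = new byte[len];
        synchronized (raf) {
            raf.seek(offset);   // moves the shared file position
            raf.readFully(buf); // reads from that position; another thread must not seek meanwhile
        }
        return buf;
    }

    /** Positional read: the offset is passed explicitly and the channel position is untouched. */
    public static byte[] readAt(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) {
                throw new IOException("Unexpected end of file");
            }
        }
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r");
             FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            String a = new String(readWithSharedPosition(raf, 3, 4), StandardCharsets.US_ASCII);
            String b = new String(readAt(ch, 3, 4), StandardCharsets.US_ASCII);
            System.out.println(a + " " + b); // prints "3456 3456"
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This does not make TrueZIP itself read in parallel. It only illustrates why a library built around one positioned descriptor per mounted archive ends up making its readers take turns.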

A similar question about extracting the contents of a large zip file in Java can be found on Stack Overflow: https://stackoverflow.com/questions/23620318/
