c# - How can I improve my algorithm for storing data on the hard drive?

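I want to process a large amount of text data and then save it to the hard drive as zip archives. The task is complicated by the fact that the processing has to run on multiple threads.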

...
ZipSaver saver = new ZipSaver(10000); // 10000 is the number of items at which an archive is saved to the hard drive
Parallel.ForEach(source, item => {
    string workResult = ModifyItem(item);
    saver.AddItem(workResult);
});

Part of the ZipSaver class (using the Ionic ZipFile library):

private ConcurrentQueue<ZipFile> _pool;

public void AddItem(string src){
    ZipFile currentZipFile;
    // If the pool has no available archive, create a new one.
    if(!_pool.TryDequeue(out currentZipFile)){
        currentZipFile = InitNewZipFile();
    }
    // path is the entry name; its definition is not shown in the question.
    currentZipFile.AddEntry(path, src);
    // If the archive has now reached the maximum number of entries
    // specified in the constructor, save it to the hard drive;
    // otherwise return it to the shared pool.
    if(currentZipFile.Entries.Count >= _maxEntries){
        SaveZip(currentZipFile);
    }else{
        _pool.Enqueue(currentZipFile);
    }
}

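Of course, I can play with the maximum number of items per archive, but the right value depends on the size of the output file, which ideally should be the thing that is configured. As it stands, a collection with many items processed in the loop creates many threads, and practically each of them ends up with its "own" ZipFile instance, which exhausts the RAM. How can I improve the saving mechanism? Sorry for my English =)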

Best answer

How about limiting the number of concurrent threads? That in turn limits the number of ZipFile instances you have in the queue. For example:

Parallel.ForEach(source, 
    new ParallelOptions { MaxDegreeOfParallelism = 3 },
    item => 
    {
        string workResult = ModifyItem(item);
        saver.AddItem(workResult);
    });
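
To see why this caps the number of archives: in the question's AddItem, a new ZipFile is created only when the pool is empty, so the number of live archives cannot exceed the number of loop bodies running at once. The following is a minimal sketch that measures that peak; ParallelismDemo and MeasurePeakConcurrency are hypothetical names, and Thread.Sleep merely stands in for ModifyItem and AddItem.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismDemo
{
    // Runs a dummy workload and returns the highest number of loop bodies
    // that were observed executing at the same time.
    public static int MeasurePeakConcurrency(int itemCount, int maxDegree)
    {
        int current = 0; // bodies running right now
        int peak = 0;    // highest value of 'current' seen so far

        Parallel.ForEach(
            Enumerable.Range(0, itemCount),
            new ParallelOptions { MaxDegreeOfParallelism = maxDegree },
            item =>
            {
                int now = Interlocked.Increment(ref current);

                // Lock-free "peak = max(peak, now)".
                int seen;
                while (now > (seen = Volatile.Read(ref peak)) &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen)
                {
                }

                Thread.Sleep(5); // stand-in for ModifyItem + AddItem
                Interlocked.Decrement(ref current);
            });

        return peak;
    }

    public static void Main()
    {
        int peak = MeasurePeakConcurrency(itemCount: 200, maxDegree: 3);
        Console.WriteLine($"Peak concurrent workers: {peak}");
    }
}
```

With maxDegree set to 3, the reported peak should never exceed 3, which is also the upper bound on pooled ZipFile instances under the assumption above.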

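It might also be that 10,000 items is simply too many. If the files you are adding are 1 megabyte each, then 10,000 of them will create a 10 gigabyte file. That is likely to make you run out of memory.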

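You need to limit the zip file by size rather than by number of files. I don't know whether DotNetZip lets you see how many bytes are currently in the output buffer. If nothing else, you can estimate your compression ratio and use it to limit the size by counting uncompressed bytes. That is, if you expect a 50% compression ratio and want to limit the output file size to 1 GB, then you need to limit the total input to 2 GB (i.e. 1 GB / 0.5 = 2 GB).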

It would be best if you could see the current output size. I'm not familiar with DotNetZip, so I can't say whether it has that capability.
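
The estimate-based limit described above could be sketched roughly as follows. ArchiveSizeBudget is a hypothetical helper, not part of DotNetZip; it assumes the entries are strings written as UTF-8, that the expected ratio is a guess you supply, and that one budget belongs to one archive and is only touched by the thread currently holding that archive.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a DotNetZip type): counts the uncompressed bytes
// added to one archive and signals when the estimated output size is reached.
public sealed class ArchiveSizeBudget
{
    private readonly long _maxInputBytes;
    private long _inputBytes;

    // maxOutputBytes: desired cap on the zip file size.
    // expectedRatio: assumed compressed/uncompressed ratio, e.g. 0.5 for 50%.
    public ArchiveSizeBudget(long maxOutputBytes, double expectedRatio)
    {
        if (maxOutputBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(maxOutputBytes));
        if (!(expectedRatio > 0))
            throw new ArgumentOutOfRangeException(nameof(expectedRatio));

        // Same arithmetic as in the answer: 1 GB / 0.5 = 2 GB of input.
        _maxInputBytes = (long)(maxOutputBytes / expectedRatio);
    }

    public long MaxInputBytes => _maxInputBytes;

    // Records one entry; returns true when the archive should be saved.
    public bool Add(string content)
    {
        _inputBytes += Encoding.UTF8.GetByteCount(content);
        return _inputBytes >= _maxInputBytes;
    }

    // Call after the archive has been saved and a fresh one is started.
    public void Reset() => _inputBytes = 0;
}

public static class BudgetDemo
{
    public static void Main()
    {
        var budget = new ArchiveSizeBudget(1_000_000_000L, 0.5);
        Console.WriteLine(budget.MaxInputBytes); // prints 2000000000
    }
}
```

In AddItem, the check on the entry count could then be replaced by the result of Add(src), with Reset called once the archive has been saved. The real output size will drift from the estimate whenever the data compresses better or worse than assumed.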

Regarding "c# - How can I improve my algorithm for storing data on the hard drive?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/22179443/
