c# - Why does the C# MemoryStream reserve so much memory?

Tags: c# memory memory-management memorystream gzipstream

Our software decompresses certain byte data through a GZipStream, which reads from a MemoryStream. The data is decompressed in 4 KB blocks and written to another MemoryStream.

We have realized that the memory the process allocates is much higher than the actual decompressed data.

Example: a compressed byte array of 2,425,536 bytes is decompressed to 23,050,718 bytes. The memory profiler we use shows that the method MemoryStream.set_Capacity(Int32 value) allocated 67,104,936 bytes. That is a factor of 2.9 between reserved and actually written memory.

Note: MemoryStream.set_Capacity is called from MemoryStream.EnsureCapacity, which in turn is called from MemoryStream.Write in our function.

Why does the MemoryStream reserve so much capacity, even though it only appends blocks of 4 KB?

Here is the code snippet that decompresses the data:

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream())
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}

Note: here is the system configuration, in case it is relevant:

  • Windows XP 32-bit
  • .NET 3.5
  • Compiled with Visual Studio 2008

Best Answer

Because this is the algorithm it uses to expand its capacity:

public override void Write(byte[] buffer, int offset, int count) {

    //... Removed Error checking for example

    int i = _position + count;
    // Check for overflow
    if (i < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));

    if (i > _length) {
        bool mustZero = _position > _length;
        if (i > _capacity) {
            bool allocatedNewArray = EnsureCapacity(i);
            if (allocatedNewArray)
                mustZero = false;
        }
        if (mustZero)
            Array.Clear(_buffer, _length, i - _length);
        _length = i;
    }

    //... 
}

private bool EnsureCapacity(int value) {
    // Check for overflow
    if (value < 0)
        throw new IOException(Environment.GetResourceString("IO.IO_StreamTooLong"));
    if (value > _capacity) {
        int newCapacity = value;
        if (newCapacity < 256)
            newCapacity = 256;
        if (newCapacity < _capacity * 2)
            newCapacity = _capacity * 2;
        Capacity = newCapacity;
        return true;
    }
    return false;
}

public virtual int Capacity 
{
    //...

    set {
         //...

        // MemoryStream has this invariant: _origin > 0 => !expandable (see ctors)
        if (_expandable && value != _capacity) {
            if (value > 0) {
                byte[] newBuffer = new byte[value];
                if (_length > 0) Buffer.InternalBlockCopy(_buffer, 0, newBuffer, 0, _length);
                _buffer = newBuffer;
            }
            else {
                _buffer = null;
            }
            _capacity = value;
        }
    }
}

So every time the capacity limit is hit, the capacity is doubled. The reason for this is that the Buffer.InternalBlockCopy operation is slow for large arrays, so if it had to resize on every Write call, performance would drop significantly.
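The doubling policy also explains the profiler figure from the question. The following is a minimal simulation of the growth rule (a sketch of the policy shown above, not the actual BCL code), fed with the question's numbers: the final backing array grows to 33,554,432 bytes for 23,050,718 bytes of data, and the sum of all arrays allocated along the way comes to 67,104,768 bytes, which matches the 67,104,936 bytes the profiler attributed to set_Capacity almost exactly.

```csharp
using System;

static class CapacityGrowthDemo
{
    // Simulates MemoryStream's growth policy for sequential fixed-size writes.
    // Returns { finalCapacity, totalBytesAllocatedByResizes }.
    public static long[] Simulate(long totalBytes, int blockSize)
    {
        long capacity = 0, written = 0, totalAllocated = 0;
        while (written < totalBytes)
        {
            long count = Math.Min(blockSize, totalBytes - written);
            long needed = written + count;
            if (needed > capacity)
            {
                // Mirrors EnsureCapacity: at least 256 bytes, and at least
                // double the current capacity.
                long newCapacity = Math.Max(needed, Math.Max(256, capacity * 2));
                totalAllocated += newCapacity; // each resize allocates a fresh array
                capacity = newCapacity;
            }
            written += count;
        }
        return new long[] { capacity, totalAllocated };
    }

    static void Main()
    {
        long[] r = Simulate(23050718, 4096); // sizes from the question
        Console.WriteLine("Final capacity:  " + r[0]); // 33554432
        Console.WriteLine("Total allocated: " + r[1]); // 67104768, ~ the 67,104,936 B the profiler reported
    }
}
```

Note that the profiler counts the *cumulative* allocations of all the intermediate arrays (4096, 8192, 16384, ... doubling up to 33,554,432), which is why the total is roughly twice the final capacity and 2.9 times the payload.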

There are a couple of things you can do to improve this. You can set the initial capacity to at least the size of the compressed array, and you can grow by a factor smaller than 2.0 to reduce the amount of memory you are using.

const double ResizeFactor = 1.25;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * ResizeFactor))) //Set the initial size to the compressed size + 25%.
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            if(resultStream.Capacity < resultStream.Length + iCount)
               resultStream.Capacity = (int)(resultStream.Capacity * ResizeFactor); //Grow by 25% instead of 100%

            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}

If you want, you can use even fancier algorithms, such as resizing based on the current compression ratio:

const double MinResizeFactor = 1.05;

private byte[] Decompress(byte[] data)
{
    using (MemoryStream compressedStream = new MemoryStream(data))
    using (GZipStream zipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
    using (MemoryStream resultStream = new MemoryStream((int)(data.Length * MinResizeFactor))) //Set the initial size to the compressed size + the minimum resize factor.
    {
        byte[] buffer = new byte[4096];
        int iCount = 0;

        while ((iCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            if(resultStream.Capacity < resultStream.Length + iCount)
            {
               double sizeRatio = ((double)resultStream.Position + iCount) / (compressedStream.Position + 1); //The +1 is to prevent divide by 0 errors, it may not be necessary in practice.

               //Resize to minimum resize factor of the current capacity or the 
               // compressed stream length times the compression ratio + min resize 
               // factor, whichever is larger.
               resultStream.Capacity = (int)Math.Max(resultStream.Capacity * MinResizeFactor,
                                                     (sizeRatio + (MinResizeFactor - 1)) * compressedStream.Length);
             }

            resultStream.Write(buffer, 0, iCount);
        }
        return resultStream.ToArray();
    }
}
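Since the question has the entire compressed payload available as a byte[], there is also the option of sizing the result stream exactly up front: per RFC 1952, the last four bytes of a gzip stream (the ISIZE trailer field) hold the uncompressed length modulo 2^32, little-endian. A sketch, assuming a single-member gzip stream whose uncompressed size is below 4 GB (the helper name GetUncompressedSize is ours, not part of the framework):

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class GzipSizeHint
{
    // Reads the ISIZE field from the gzip trailer (RFC 1952): the last
    // 4 bytes are the uncompressed length modulo 2^32, little-endian.
    // Only a reliable size for single-member gzip streams under 4 GB.
    public static int GetUncompressedSize(byte[] gzipData)
    {
        return BitConverter.ToInt32(gzipData, gzipData.Length - 4);
    }

    static void Main()
    {
        byte[] original = new byte[123456];
        new Random(42).NextBytes(original);

        byte[] compressed;
        using (MemoryStream ms = new MemoryStream())
        {
            using (GZipStream gz = new GZipStream(ms, CompressionMode.Compress))
            {
                gz.Write(original, 0, original.Length);
            }
            compressed = ms.ToArray(); // ToArray is valid after the stream is closed
        }

        Console.WriteLine(GetUncompressedSize(compressed)); // 123456
    }
}
```

With that, resultStream could be created as new MemoryStream(GetUncompressedSize(data)) and no intermediate resizes would occur at all. Treat the value as a hint rather than a guarantee if the input may contain concatenated gzip members or exceed 4 GB.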

This answer to "c# - Why does the C# MemoryStream reserve so much memory?" is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/24636259/
