c# - string.split() "Out of memory exception" when reading a tab-separated file

Tags: c# out-of-memory

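I am using string.split() in my C# code to read a tab-separated file. I am getting the "OutOfMemory exception" at the line marked in the code sample below.

I would like to know why the problem occurs for a file that is only 16 MB in size.

Is this the right approach?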

using (StreamReader reader = new StreamReader(_path))
{
  //...........Load the first line of the file................
  string headerLine = reader.ReadLine();

  MeterDataIPValueList objMeterDataList = new MeterDataIPValueList();
  string[] seperator = new string[1];   // used to separate the lines of the file

  seperator[0] = "\r\n";
  //.............Load Records of file into string array and remove all empty lines of file.................
  string[] line = reader.ReadToEnd().Split(seperator, StringSplitOptions.RemoveEmptyEntries);
  int noOfLines = line.Count();
  if (noOfLines == 0)
  {
    mFileValidationErrors.Append(ConstMsgStrings.headerOnly + Environment.NewLine);
  }
  //...............If file contains records also with header line..............
  else
  {
    string[] headers = headerLine.Split('\t');
    int noOfColumns = headers.Count();

    //.........Create table structure.............
    objValidateRecordsTable.Columns.Add("SerialNo");
    objValidateRecordsTable.Columns.Add("SurveyDate");
    objValidateRecordsTable.Columns.Add("Interval");
    objValidateRecordsTable.Columns.Add("Status");
    objValidateRecordsTable.Columns.Add("Consumption");

    //........Fill objValidateRecordsTable table by string array contents ............

    int recordNumber;  // used for log
    #region ..............Fill objValidateRecordsTable.....................
    seperator[0] = "\t";
    for (int lineNo = 0; lineNo < noOfLines; lineNo++)
    {
      recordNumber = lineNo + 1;
      string[] recordFields = line[lineNo].Split(seperator, StringSplitOptions.RemoveEmptyEntries); // the error is thrown here, when splitting the columns
      if (recordFields.Count() == noOfColumns)
      {
        //Do processing
      }

Best answer

Split is poorly implemented and has serious performance issues when applied to huge strings. Please refer to this article for details on the memory requirements of the Split function:

What happens when you do a split on a string containing 1355049 comma separated strings of 16 characters each, having total character length of 25745930 ?

  1. An Array of pointers to string object: Contiguous virtual address space of 4 (address pointer)*1355049 = 5420196 (arrays size) + 16 (for book keeping) = 5420212.

  2. Non-contiguous virtual address space for 1355049 strings, each of 54 bytes. This does not mean all those 1.3 million strings would be scattered across the heap, but they will not be allocated on the LOH. The GC will allocate them in bunches on the Gen0 heap.

  3. The Split function will create an internal System.Int32[] array of size 25745930, consuming 102983736 bytes (~98 MB) of the LOH, which is very expensive.
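
One way to avoid the large intermediate allocations listed above is to stop calling ReadToEnd().Split(...) on the whole file and instead read and split one line at a time. The following is only a minimal sketch under stated assumptions, not the asker's actual code: the class name TabFileReader and the method ReadRecords are hypothetical, and rows whose field count differs from the header are simply skipped here. Note that ReadLine also accepts "\n" and "\r" as line endings, not only "\r\n".

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, introduced for illustration only.
static class TabFileReader
{
    // Streams the file line by line, so neither a file-sized string
    // nor a file-sized string[] is ever allocated.
    public static IEnumerable<string[]> ReadRecords(string path)
    {
        char[] tab = { '\t' };

        using (var reader = new StreamReader(path))
        {
            // The first line is the header; it defines the expected column count.
            string headerLine = reader.ReadLine();
            if (headerLine == null)
                yield break; // empty file

            int noOfColumns = headerLine.Split('\t').Length;

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0)
                    continue; // skip empty lines

                // Split only the current (short) line.
                string[] fields = line.Split(tab, StringSplitOptions.RemoveEmptyEntries);

                // Assumption: rows with an unexpected field count are skipped.
                if (fields.Length == noOfColumns)
                    yield return fields;
            }
        }
    }
}
```

A caller could then fill the table with something like `foreach (string[] fields in TabFileReader.ReadRecords(_path)) { /* add a row */ }`, so that only one line's worth of split data is alive at any time.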

Regarding c# - string.split() "Out of memory exception" when reading a tab-separated file, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/1404435/
