c# - 使用 FileStream.Seek

标签 c# file search filestream seek

我正在尝试使用 FileStream。寻求快速跳转到一行并阅读它。

但是,我没有得到正确的结果。我试图看这个有一段时间了,但不明白我做错了什么。

环境:
操作系统:Windows 7
架构:.NET 4.0
IDE:Visual C# Express 2010

文件位置中的示例数据:C:\Temp\Temp.txt

0001|100!2500
0002|100!2500
0003|100!2500
0004|100!2500
0005|100!2500
0006|100!2500
0007|100!2500
0008|100!2500
0009|100!2500
0010|100!2500

The code:

class PaddedFileSearch
{
    private int LineLength { get; set; }
    private string FileName { get; set; }

    public PaddedFileSearch()
    {
        FileName = @"C:\Temp\Temp.txt";     // This is a padded file.  All lines are of the same length.

        FindLineLength();
        Debug.Print("File Line length: {0}", LineLength);

        // TODO: This purely for testing.  Move this code out.
        SeekMethod(new int[] { 5, 3, 4 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  4           15              0004|100!2500
         *  5           15              0005|100!2500 -- This was updated after the initial request.
         */

        /* THIS DOES NOT GIVE THE EXPECTED RESULTS */
        SeekMethod(new int[] { 5, 3 });
        /*  Expected Results:
         *  Line No     Position        Line
         *  -------     --------        -----------------
         *  3           30              0003|100!2500
         *  5           30              0005|100!2500
         */
    }

    private void FindLineLength()
    {
        string line;

        // Add check for FileExists

        using (StreamReader reader = new StreamReader(FileName))
        {
            if ((line = reader.ReadLine()) != null)
            {
                LineLength = line.Length + 2;
                // The 2 is for NewLine(\r\n)
            }
        }

    }

    public void SeekMethod(int[] lineNos)
    {
        long position = 0;
        string line = null;

        Array.Sort(lineNos);

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        using (FileStream fs = new FileStream(FileName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNos)
                {
                    position = (lineNo - 1) * LineLength - position;
                    fs.Seek(position, SeekOrigin.Current);

                    if ((line = reader.ReadLine()) != null)
                    {
                        Debug.Print("{0}\t\t\t{1}\t\t\t\t{2}", lineNo, position, line);
                    }
                }
            }
        }
    }
}

我得到的输出:

File Line length: 15

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
4           15              0004|100!2500
5           45              0005|100!2500

Line No     Position        Line
-------     --------        -----------------
3           30              0003|100!2500
5           30              0004|100!2500

My problem is with the following output:

Line No     Position        Line
-------     --------        -----------------
5           30              0004|100!2500

The output for Line should be: 0005|100!2500

I don't understand why this is happening.

Am I doing something wrong? Is there a workaround? Also are there faster ways to do this using something like seek?
(I am looking for code based options and NOT Oracle or SQL Server. For the sake of argument lets also say that the file size 1 GB.)

Any help is greatly appreciated.

Thanks.

UPDATE:
I found 4 great answers here. Thanks a lot.

Sample Timings:
Based on a few runs the following are the methods from best to good. Even the good is very close to best.
In a file that contains 10K lines, 2.28 MB. I searched for same 5000 random lines using all the options.

  1. Seek4: Time elapsed: 00:00:00.0398530 ms -- Ritch Melton
  2. Seek3: Time elapsed: 00:00:00.0446072 ms -- Valentin Kuzub
  3. Seek1: Time elapsed: 00:00:00.0538210 ms -- Jake
  4. Seek2: Time elapsed: 00:00:00.0889589 ms -- bitxwise

Shown below is the code. After saving the code you can simply call it by typing TestPaddedFileSeek.CallPaddedFileSeek();. You will also have to specify the namespace and the "using references".

`

/// <summary>
/// This class multiple options of reading a by line number in a padded file (all lines are the same length).
/// The idea is to quick jump to the file.
/// Details about the discussions is available at: http://stackoverflow.com/questions/5201414/having-a-problem-while-using-filestream-seek-in-c-solved
/// </summary>
class PaddedFileSeek
{
    public FileInfo File {get; private set;}
    public int LineLength { get; private set; }

    #region Private methods
    private static int FindLineLength(FileInfo fileInfo)
    {
        using (StreamReader reader = new StreamReader(fileInfo.FullName))
        {
            string line;
            if ((line = reader.ReadLine()) != null)
            {
                int length = line.Length + 2;   // The 2 is for NewLine(\r\n)
                return length;
            }
        }
        return 0;
    }

    private static void PrintHeader()
    {
       /*
        Debug.Print("");
        Debug.Print("Line No\t\tLine");
        Debug.Print("-------\t\t--------------------------");
       */ 
    }

    private static void PrintLine(int lineNo, string line)
    {
        //Debug.Print("{0}\t\t\t{1}", lineNo, line);
    }

    private static void PrintElapsedTime(TimeSpan elapsed)
    {
        Debug.WriteLine("Time elapsed: {0} ms", elapsed);
    }
    #endregion

    public PaddedFileSeek(FileInfo fileInfo)
    {
        // Possibly might have to check for FileExists
        int length = FindLineLength(fileInfo);
        //if (length == 0) throw new PaddedProgramException();
        LineLength = length;
        File = fileInfo;
    }

    public void CallAll(int[] lineNoArray, List<int> lineNoList)
    {
        Stopwatch sw = new Stopwatch();

        #region Seek1
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek1: ");
        // Print Header
        PrintHeader();

        Seek1(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek2
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek2: ");
        // Print Header
        PrintHeader();

        Seek2(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek3
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek3: ");
        // Print Header
        PrintHeader();

        Seek3(lineNoArray);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

        #region Seek4
        // Create new stopwatch
        sw.Start();

        Debug.Write("Seek4: ");

        // Print Header
        PrintHeader();

        Seek4(lineNoList);

        // Stop timing
        sw.Stop();

        // Print Elapsed Time
        PrintElapsedTime(sw.Elapsed);

        sw.Reset();
        #endregion

    }

    /// <summary>
    /// Option by Jake
    /// </summary>
    /// <param name="lineNoArray"></param>
    public void Seek1(int[] lineNoArray)
    {
        long position = 0;
        string line = null;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNoArray)
                {
                    position = (lineNo - 1) * LineLength;
                    fs.Seek(position, SeekOrigin.Begin);

                    if ((line = reader.ReadLine()) != null)
                    {
                        PrintLine(lineNo, line);
                    }

                    reader.DiscardBufferedData();
                }
            }
        }

    }

    /// <summary>
    /// option by bitxwise
    /// </summary>
    public void Seek2(int[] lineNoArray)
    {
        string line = null;
        long step = 0;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            // using (StreamReader reader = new StreamReader(fs))
            // If you put "using" here you will get WRONG results.
            // I would like to understand why this is.
            {
                foreach (int lineNo in lineNoArray)
                {
                    StreamReader reader = new StreamReader(fs);
                    step = (lineNo - 1) * LineLength - fs.Position;
                    fs.Position += step;

                    if ((line = reader.ReadLine()) != null)
                    {
                        PrintLine(lineNo, line);
                    }
                }
            }
        }
    }

    /// <summary>
    /// Option by Valentin Kuzub
    /// </summary>
    /// <param name="lineNoArray"></param>
    #region Seek3
    public void Seek3(int[] lineNoArray)
    {
        long position = 0; // totalPosition = 0;
        string line = null;
        int oldLineNo = 0;

        Array.Sort(lineNoArray);

        using (FileStream fs = new FileStream(File.FullName, FileMode.Open, FileAccess.Read, FileShare.None))
        {
            using (StreamReader reader = new StreamReader(fs))
            {
                foreach (int lineNo in lineNoArray)
                {
                    position = (lineNo - oldLineNo - 1) * LineLength;
                    fs.Seek(position, SeekOrigin.Current);
                    line = ReadLine(fs, LineLength);
                    PrintLine(lineNo, line);
                    oldLineNo = lineNo;

                }
            }
        }

    }

    #region Required Private methods
    /// <summary>
    /// Currently only used by Seek3
    /// </summary>
    /// <param name="stream"></param>
    /// <param name="length"></param>
    /// <returns></returns>
    private static string ReadLine(FileStream stream, int length)
    {
        byte[] bytes = new byte[length];
        stream.Read(bytes, 0, length);
        return new string(Encoding.UTF8.GetChars(bytes));
    }
    #endregion
    #endregion

    /// <summary>
    /// Option by Ritch Melton
    /// </summary>
    /// <param name="lineNoArray"></param>
    #region Seek4
    public void Seek4(List<int> lineNoList)
    {
        lineNoList.Sort();

        using (var fs = new FileStream(File.FullName, FileMode.Open))
        {
            lineNoList.ForEach(ln => OutputData(fs, ln));
        }

    }

    #region Required Private methods
    private void OutputData(FileStream fs, int lineNumber)
    {
        var offset = (lineNumber - 1) * LineLength;

        fs.Seek(offset, SeekOrigin.Begin);

        var data = new byte[LineLength];
        fs.Read(data, 0, LineLength);

        var text = DecodeData(data);
        PrintLine(lineNumber, text);
    }

    private static string DecodeData(byte[] data)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(data);
    }

    #endregion

    #endregion
}



static class TestPaddedFileSeek
{
    public static void CallPaddedFileSeek()
    {
        const int arrayLenght = 5000;
        int[] lineNoArray = new int[arrayLenght];
        List<int> lineNoList = new List<int>();
        Random random = new Random();
        int lineNo;
        string fileName;


        fileName = @"C:\Temp\Temp.txt";

        PaddedFileSeek seeker = new PaddedFileSeek(new FileInfo(fileName));

        for (int n = 0; n < 25; n++)
        {
            Debug.Print("Loop no: {0}", n + 1);

            for (int i = 0; i < arrayLenght; i++)
            {
                lineNo = random.Next(1, arrayLenght);

                lineNoArray[i] = lineNo;
                lineNoList.Add(lineNo);
            }

            seeker.CallAll(lineNoArray, lineNoList);

            lineNoList.Clear();

            Debug.Print("");
        }
    }
}

`

最佳答案

我对您的预期位置感到困惑,第 5 行在位置 30 和 45,第 4 行在 15,而第 3 行在 30?

这是读取逻辑的核心:

    var offset = (lineNumber - 1) * LineLength;

    fs.Seek(offset, SeekOrigin.Begin);

    var data = new byte[LineLength];
    fs.Read(data, 0, LineLength);

    var text = DecodeData(data);
    Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);

完整示例在这里:

class PaddedFileSearch
{
    public int LineLength { get; private set; }
    public FileInfo File { get; private set; }

    public PaddedFileSearch(FileInfo fileInfo)
    {
        var length = FindLineLength(fileInfo);
        //if (length == 0) throw new PaddedProgramException();
        LineLength = length;
        File = fileInfo;
    }

    private static int FindLineLength(FileInfo fileInfo)
    {
        using (var reader = new StreamReader(fileInfo.FullName))
        {
            string line;
            if ((line = reader.ReadLine()) != null)
            {
                var length = line.Length + 2;
                return length;
            }
        }

        return 0;
    }

    public void SeekMethod(List<int> lineNumbers)
    {

        Debug.Print("");
        Debug.Print("Line No\t\tPosition\t\tLine");
        Debug.Print("-------\t\t--------\t\t-----------------");

        lineNumbers.Sort();

        using (var fs = new FileStream(File.FullName, FileMode.Open))
        {
            lineNumbers.ForEach(ln => OutputData(fs, ln));
        }
    }

    private void OutputData(FileStream fs, int lineNumber)
    {
        var offset = (lineNumber - 1) * LineLength;

        fs.Seek(offset, SeekOrigin.Begin);

        var data = new byte[LineLength];
        fs.Read(data, 0, LineLength);

        var text = DecodeData(data);
        Debug.Print("{0,-12}{1,-16}{2}", lineNumber, offset, text);
    }

    private static string DecodeData(byte[] data)
    {
        var encoding = new UTF8Encoding();
        return encoding.GetString(data);
    }
}

class Program
{
    static void Main(string[] args)
    {
        var seeker = new PaddedFileSearch(new FileInfo(@"D:\Desktop\Test.txt"));

        Debug.Print("File Line length: {0}", seeker.LineLength);

        seeker.SeekMethod(new List<int> { 5, 3, 4 });
        seeker.SeekMethod(new List<int> { 5, 3 });
    }
}

关于c# - 使用 FileStream.Seek,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5201414/

相关文章:

C#:如何让 Hacker/Cracker 更难绕过或绕过许可检查?

python - 使用 dictConfig 进行日志记录并写入控制台和文件

php - 特殊搜索表单

c# - MongoDB 在连接字符串中指定数据库

c# - 无法通过https访问WCF服务

c# - 使用 LongListSelector 控件进行导航

java - 将文件夹复制到具有相同名称的目标

c - 打开/关闭文件时出现段错误?

c - 搜索输入字符串的最佳搜索算法

java - 在 HashSet 中搜索字符串数组中的任意元素