c# - Storing a word's line numbers and frequency, keyed by word

I'm working on a problem where I have to read a text file and work out the frequency and the line numbers of specific words.

For example, a txt file that reads

"Hi my name is
Bob. This is
Cool"

should return (count, word, line numbers):

1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3

I can't decide how to store the line numbers along with the word frequency. I've tried a few different things, and this is where I am so far.

Any help?

        Dictionary<string, int> countDictionary = new Dictionary<string,int>();
        Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();

        List<string> lines = new List<string>();


        System.IO.StreamReader file =
                new System.IO.StreamReader("Sample.txt");

        //Creates a List of lines
        string x;
        while ((x = file.ReadLine()) != null)
        {
            lines.Add(x);
        }

        foreach (var y in Enumerable.Range(0, lines.Count))
        {
            foreach (var word in lines[y].Split())
            {
                // Normalize once so the same key is used for the lookup and the update
                string key = word.ToLower();
                if (!countDictionary.ContainsKey(key) && !lineDictionary.ContainsKey(key))
                {
                    countDictionary.Add(key, 1);
                    //lineDictionary.Add(key, /*what to put here*/);
                }
                else
                {
                    countDictionary[key] += 1;
                    //ADD line to dictionary???
                }
            }
        }

        foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both
        {
            Console.WriteLine("{0}  {1}", pair.Value, pair.Key);
        }

        file.Close();


        System.Console.ReadLine();
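One way to fill in the two placeholders, sketched under a few assumptions: words are split on whitespace only (so the period in "Bob." stays attached), keys are lower-cased, and lines are numbered from 1. `BuildIndex` is a hypothetical helper, not something from the question, and it takes the lines directly instead of opening `Sample.txt` so it can be tried without a file. The idea is that `lineDictionary` gets a new `List<int>` the first time a word is seen, and the current line number is appended on every later occurrence.

```csharp
using System;
using System.Collections.Generic;

// Usage: index the three-line sample and print "count word line-numbers" per word.
var (counts, lineNumbers) = BuildIndex(new[] { "Hi my name is", "Bob. This is", "Cool" });
foreach (var pair in counts)
{
    Console.WriteLine("{0} {1} {2}", pair.Value, pair.Key, string.Join(" ", lineNumbers[pair.Key]));
}

// Hypothetical helper: fills both dictionaries from the question in a single pass.
// Line numbers are 1-based; words are lower-cased, punctuation is left attached.
static (Dictionary<string, int> Counts, Dictionary<string, List<int>> Lines) BuildIndex(IEnumerable<string> lines)
{
    var countDictionary = new Dictionary<string, int>();
    var lineDictionary = new Dictionary<string, List<int>>();

    int lineNumber = 0;
    foreach (string line in lines)
    {
        lineNumber++;
        // An empty separator array splits on whitespace; RemoveEmptyEntries drops the
        // empty strings that blank lines and repeated spaces would otherwise produce.
        foreach (string word in line.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries))
        {
            string key = word.ToLower();
            if (!countDictionary.ContainsKey(key))
            {
                countDictionary.Add(key, 1);
                // First occurrence: start a new list holding this line number.
                lineDictionary.Add(key, new List<int> { lineNumber });
            }
            else
            {
                countDictionary[key] += 1;
                // Later occurrence: record the line it appeared on.
                lineDictionary[key].Add(lineNumber);
            }
        }
    }
    return (countDictionary, lineDictionary);
}
```

Because every occurrence adds exactly one entry to the word's list, `lineDictionary[key].Count` always equals `countDictionary[key]`, so a single `Dictionary<string, List<int>>` would be enough on its own.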

Best answer

You can do this in what is almost a single LINQ statement:

using System;
using System.IO;
using System.Linq;

var processed =
  //get the lines of text as IEnumerable<string> 
  File.ReadLines(@"myFilePath.txt")
    //get a word and a line number for every word
    //so you'll have a sequence of objects with 2 properties
    //word and lineNumber
    .SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
    //group these objects by their "word" property
    .GroupBy(x => x.word)
    //select what you need
    .Select(g => new{
        //number of objects in the group
        //i.e. the frequency of the word
        Count = g.Count(), 
        //the actual word
        Word = g.Key, 
        //a sequence of line numbers of each instance of the 
        //word in the group
        Positions = g.Select(x => x.lineNumber)});

foreach(var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
                      entry.Count,
                      entry.Word,
                      string.Join(" ",entry.Positions));
}

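I like counting from 0 (the index that `SelectMany` passes to the lambda is zero-based), so you may want to add 1 where appropriate, e.g. `lineNumber + 1`.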
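Run against the question's sample as written, the answer's pipeline keeps "Bob." with its period, treats words that differ only in case as different words, and numbers lines from 0; on a file with blank lines, `Split()` also yields an empty string that gets grouped as a "word". The following is a minimal sketch of the same `SelectMany`/`GroupBy` approach with those points adjusted. The helper name `IndexWords` and the set of punctuation characters it trims are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: the three-line sample from the question, printed as "count word line-numbers".
var sample = new[] { "Hi my name is", "Bob. This is", "Cool" };
foreach (var entry in IndexWords(sample))
{
    Console.WriteLine("{0} {1} {2}", entry.Count, entry.Word, string.Join(" ", entry.Positions));
}

// The answer's SelectMany/GroupBy pipeline with a few assumed adjustments:
// empty tokens are dropped, surrounding punctuation is trimmed, words are
// lower-cased, and line numbers start at 1 instead of 0.
static IEnumerable<(int Count, string Word, List<int> Positions)> IndexWords(IEnumerable<string> lines)
{
    // Assumption: this short list covers the punctuation that matters for your input.
    char[] trimChars = { '.', ',', ';', ':', '!', '?', '"', '\'' };

    return lines
        // The second lambda parameter is the zero-based index of the line.
        .SelectMany((line, index) => line
            .Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries)
            .Select(word => new { Word = word.Trim(trimChars).ToLowerInvariant(), LineNumber = index + 1 }))
        // A token made only of punctuation becomes "" after trimming; skip it.
        .Where(x => x.Word.Length > 0)
        // Groups come out in the order each word first appears.
        .GroupBy(x => x.Word)
        .Select(g => (Count: g.Count(), Word: g.Key, Positions: g.Select(x => x.LineNumber).ToList()));
}
```

For the sample lines this prints the same rows, in the same order, as the question's expected output, except that `Hi` comes out lower-cased as `hi`. A word repeated on one line lists that line twice; adding `.Distinct()` before `.ToList()` would keep each line number only once while leaving the count unchanged.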

A similar question about c# - Storing a word's line numbers and frequency, keyed by word, can be found on Stack Overflow: https://stackoverflow.com/questions/29974117/
