c# - Storing line numbers and frequency for each word

Tags: c# list dictionary frequency word

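I'm working on a problem where I have to read a text file and, for each word, compute its frequency and the line numbers it appears on.

For example, a txt file that reads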

"Hi my name is

Bob. This is 

Cool"

should return:

1 Hi 1

1 my 1

1 name 1

2 is 1 2

1 bob 2

1 this 2

1 cool 3

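I can't decide how to store the line numbers together with the word frequencies. I've tried a few different things, and this is where I am so far.

Any help?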

        Dictionary<string, int> countDictionary = new Dictionary<string,int>();
        Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();

        List<string> lines = new List<string>();


        System.IO.StreamReader file =
                new System.IO.StreamReader("Sample.txt");

        //Creates a List of lines
        string x;
        while ((x = file.ReadLine()) != null)
        {
            lines.Add(x);
        }

        foreach(var y in Enumerable.Range(0,lines.Count()))
        {
            foreach(var word in lines[y].Split())
            {
                if(!countDictionary.Keys.Contains(word.ToLower()) && !lineDictionary.Keys.Contains(word.ToLower()))
                {
                    countDictionary.Add(word.ToLower(), 1);
                    //lineDictionary.Add(word.ToLower(), /*what to put here*/);
                }
                else
                {
                    countDictionary[word.ToLower()] += 1; // key was stored lower-cased
                    //ADD line to dictionary???
                }
            }
        }



       foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both 
       {
           Console.WriteLine("{0}  {1}", pair.Value, pair.Key);
       }

        file.Close();


        System.Console.ReadLine();

Best answer

You can do this with almost a single line of LINQ:

var processed =
  //get the lines of text as IEnumerable<string> 
  File.ReadLines(@"myFilePath.txt")
    //get a word and a line number for every word
    //so you'll have a sequence of objects with 2 properties
    //word and lineNumber
    .SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
    //group these objects by their "word" property
    .GroupBy(x => x.word)
    //select what you need
    .Select(g => new{
        //number of objects in the group
        //i.e. the frequency of the word
        Count = g.Count(), 
        //the actual word
        Word = g.Key, 
        //a sequence of line numbers of each instance of the 
        //word in the group
        Positions = g.Select(x => x.lineNumber)});

foreach(var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
                      entry.Count,
                      entry.Word,
                      string.Join(" ",entry.Positions));
}

I like my counting zero-based, so the line numbers above start at 0; you may want to add 1 in the appropriate place.
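For readers who would rather stay with the dictionary approach from the question, here is a minimal sketch of one way it could look. It assumes a single `Dictionary<string, List<int>>` is enough, because a word's frequency is simply the `Count` of its line-number list, and it records 1-based line numbers to match the expected output. The names `WordIndex`, `Build` and `Words` are hypothetical, and trimming punctuation (so that "Bob." becomes "bob") is an assumption that is not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordIndex
{
    // Hypothetical helper: split on whitespace, strip a few common
    // punctuation marks from both ends, and lower-case each word.
    private static IEnumerable<string> Words(string line) =>
        line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim('.', ',', ';', ':', '!', '?', '"').ToLowerInvariant())
            .Where(w => w.Length > 0);

    // Maps each word to the 1-based line numbers of its occurrences.
    // A word that occurs twice on one line gets that line number twice.
    public static Dictionary<string, List<int>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, List<int>>();
        int lineNumber = 0;
        foreach (var line in lines)
        {
            lineNumber++; // first line is 1, not 0
            foreach (var word in Words(line))
            {
                if (!index.TryGetValue(word, out var positions))
                {
                    positions = new List<int>();
                    index[word] = positions;
                }
                positions.Add(lineNumber);
            }
        }
        return index;
    }

    public static void Main()
    {
        // In-memory stand-in for the sample file from the question.
        var lines = new[] { "Hi my name is", "Bob. This is", "Cool" };
        foreach (var pair in Build(lines))
        {
            // frequency, word, line numbers
            Console.WriteLine("{0} {1} {2}",
                              pair.Value.Count,
                              pair.Key,
                              string.Join(" ", pair.Value));
        }
    }
}
```

To run it against a real file, `Build(File.ReadLines("Sample.txt"))` should work with `using System.IO;` added. Note that `Dictionary` does not guarantee any enumeration order, so sort the entries if the output order matters.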

Regarding "c# - Storing line numbers and frequency for each word", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/29974117/
