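I'm working on a problem where I need to read a text file and work out, for each word, how often it occurs and which line numbers it appears on.
For example, a txt file containing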
"Hi my name is
Bob. This is
Cool"
should return:
1 Hi 1
1 my 1
1 name 1
2 is 1 2
1 bob 2
1 this 2
1 cool 3
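I can't decide how to store the line numbers together with the word frequency. I've tried a few different things, and this is where I am so far.
Any help?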
Dictionary<string, int> countDictionary = new Dictionary<string,int>();
Dictionary<string, List<int>> lineDictionary = new Dictionary<string, List<int>>();
List<string> lines = new List<string>();
System.IO.StreamReader file =
    new System.IO.StreamReader("Sample.txt");
//Creates a List of lines
string x;
while ((x = file.ReadLine()) != null)
{
    lines.Add(x);
}
foreach(var y in Enumerable.Range(0,lines.Count()))
{
    foreach(var word in lines[y].Split())
    {
        if(!countDictionary.Keys.Contains(word.ToLower()) && !lineDictionary.Keys.Contains(word.ToLower()))
        {
            countDictionary.Add(word.ToLower(), 1);
            //lineDictionary.Add(word.ToLower(), /*what to put here*/);
        }
        else
        {
            countDictionary[word] += 1;
            //ADD line to dictionary???
        }
    }
}
foreach (var pair in countDictionary)//WHAT TO PUT HERE to print both
{
    Console.WriteLine("{0} {1}", pair.Value, pair.Key);
}
file.Close();
System.Console.ReadLine();
Best answer
You can do this in almost a single LINQ statement:
//needs using System, System.IO and System.Linq
var processed =
    //get the lines of text as IEnumerable<string>
    File.ReadLines(@"myFilePath.txt")
    //get a word and a line number for every word
    //so you'll have a sequence of objects with 2 properties
    //word and lineNumber
    .SelectMany((line, lineNumber) => line.Split().Select(word => new{word, lineNumber}))
    //group these objects by their "word" property
    .GroupBy(x => x.word)
    //select what you need
    .Select(g => new{
        //number of objects in the group
        //i.e. the frequency of the word
        Count = g.Count(),
        //the actual word
        Word = g.Key,
        //a sequence of line numbers of each instance of the
        //word in the group
        Positions = g.Select(x => x.lineNumber)});
foreach(var entry in processed)
{
    Console.WriteLine("{0} {1} {2}",
        entry.Count,
        entry.Word,
        string.Join(" ", entry.Positions));
}
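I like to count from 0, so the line numbers here are zero-based. If you want them to start at 1, add 1 where appropriate (for example, use lineNumber + 1 in the projection).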
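For comparison with the loop in the question, here is a minimal sketch of the same idea using a single Dictionary<string, List<int>>. The frequency does not need its own dictionary, because it is simply the length of the line-number list. The class name WordIndex and the method Build are hypothetical names chosen for this example. It also assumes that "Bob." and "bob" should count as the same word, so it trims a small, arbitrary set of punctuation characters and lowercases each word. Line numbers start at 1 here.

```csharp
using System;
using System.Collections.Generic;

public static class WordIndex
{
    // Hypothetical helper: maps each lowercased word to the 1-based line
    // numbers of its occurrences (one list entry per occurrence).
    public static Dictionary<string, List<int>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, List<int>>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++; // count lines from 1

            // A null separator array splits on whitespace;
            // RemoveEmptyEntries drops the empty strings left by repeated spaces.
            foreach (string raw in line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries))
            {
                // Assumption: strip surrounding punctuation and ignore case.
                string word = raw.Trim('.', ',', ';', ':', '!', '?', '"').ToLowerInvariant();
                if (word.Length == 0) continue;

                // Create the list the first time a word is seen.
                if (!index.TryGetValue(word, out List<int> positions))
                {
                    positions = new List<int>();
                    index[word] = positions;
                }
                positions.Add(lineNumber);
            }
        }
        return index;
    }

    public static void Main()
    {
        // In-memory copy of the sample from the question; for a real file,
        // System.IO.File.ReadLines("Sample.txt") could be passed instead.
        var lines = new[] { "Hi my name is", "Bob. This is", "Cool" };

        foreach (var pair in Build(lines))
        {
            // frequency, word, then the line numbers separated by spaces
            Console.WriteLine("{0} {1} {2}",
                pair.Value.Count, pair.Key, string.Join(" ", pair.Value));
        }
    }
}
```

For the sample text, the entry for "is" prints as `2 is 1 2`. The order in which a Dictionary enumerates its entries is not guaranteed, so sort the pairs first if the output order matters.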
Regarding c# - storing line numbers and frequency per word, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/29974117/