python - Word count of a txt file with output to a file

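I want to count how many times each word occurs in a text file, but I am not sure what is going wrong. I am also struggling to find a way to make the count include occurrences of a word regardless of capitalization.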

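The script takes two command-line arguments: the name of the input file and a threshold (an integer).
The input file contains exactly one word per line, with no whitespace before or after the word. The script does not need to validate the contents of the input file.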
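One way to read the two command-line arguments is the standard-library argparse module. This is a minimal sketch assuming the script is run as, for example, python wordcount.py number.txt 3; the script name, the helper name parse_args and the argument names are hypothetical. Declaring the threshold with type=int makes argparse reject a non-integer value with a usage message instead of failing later.

```python
import argparse


def parse_args(argv=None):
    """Return (input_file, threshold) from the command line.

    Hypothetical helper: when argv is None, argparse reads sys.argv[1:].
    """
    parser = argparse.ArgumentParser(
        description="Count case-insensitive word occurrences in a file."
    )
    parser.add_argument("input_file", help="text file with one word per line")
    # type=int converts the value and rejects non-integers with a usage error
    parser.add_argument("threshold", type=int,
                        help="minimum count for a word to be reported")
    args = parser.parse_args(argv)
    return args.input_file, args.threshold
```

In the script itself, input_file, threshold = parse_args() picks the values up from sys.argv.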

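The letter case of the words in the input file does not matter for counting. For example, the script should count "the", "The" and "THE" as the same word.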
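Because the file holds one word per line, the counting can work line by line, lower-casing each word before it is counted so that "the", "The" and "THE" land in the same entry. A minimal sketch, assuming blank lines (for example a trailing empty line) should be ignored; the helper name count_words is hypothetical. str.casefold() is a stricter alternative to str.lower() for non-English text.

```python
from collections import Counter


def count_words(lines):
    """Count words case-insensitively, one word per line.

    Hypothetical helper: strips the newline, lower-cases the word,
    and skips blank lines.
    """
    counter = Counter()
    for line in lines:
        word = line.strip().lower()
        if word:  # ignore empty lines
            counter[word] += 1
    return counter


sample = ["the\n", "The\n", "THE\n", "cat\n"]
print(count_words(sample))  # Counter({'the': 3, 'cat': 1})
```

A file object yields its lines, so with open(path, encoding="utf-8-sig") as f: counts = count_words(f) works directly on the input file.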

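After counting the words, the script writes a report (to a file, output.txt) listing the words and their counts. A word is included only if its count is greater than or equal to the threshold given on the command line.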
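The report step only has to filter the counts against the threshold and write the remaining words to output.txt. The question does not show the required layout (its example output is missing), so the word<TAB>count format and the most-frequent-first order below are assumptions, as is the helper name write_report.

```python
from collections import Counter


def write_report(counts: Counter, threshold: int, path: str = "output.txt") -> int:
    """Write one 'word<TAB>count' line per word whose count >= threshold.

    Hypothetical helper. Words are written most frequent first
    (Counter.most_common order). Returns the number of lines written.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as out:
        for word, count in counts.most_common():
            if count >= threshold:
                out.write(f"{word}\t{count}\n")
                written += 1
    return written
```

For example, write_report(Counter({"the": 3, "cat": 1}), 2) writes a single line to output.txt containing "the" and 3 separated by a tab.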

Here is my code:

from collections import Counter

# Read the whole file and split it on whitespace
with open(r"E:\number.txt", "r", encoding="utf-8-sig") as file:
    word_counter = Counter(file.read().split())

# Print each word and its count, separated by a tab
for word, count in word_counter.items():
    print(f"{word}\t{count}")

But I would like the output to look like this:

Best answer

Or, using pandas:

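import pandas as pd                                   # import pandas
text1 = pd.read_csv(r"E:\number.txt", header=None,    # raw string: in "E:\number.txt", \n would be a newline
                    dtype=str, keep_default_na=False,  # keep words like "null" or "NA" as text
                    encoding="utf-8-sig")
s = text1[0].str.lower()                              # convert the words to lowercase
frequency_list = s.value_counts()                     # count how often each distinct word occurs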
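The pandas answer stops at computing frequency_list; it does not apply the threshold or write output.txt. A possible completion, assuming a tab-separated word/count layout since the question's example output is missing; the function name report_with_pandas is hypothetical. Series.to_csv writes the index (the words) and the values (the counts), and header=False suppresses the column-name row.

```python
import pandas as pd


def report_with_pandas(input_path, threshold, output_path="output.txt"):
    """Hypothetical helper: count words case-insensitively and write the report.

    Returns the filtered counts as a Series indexed by word.
    """
    # dtype=str and keep_default_na=False keep words such as "null" or "NA"
    # as plain strings instead of turning them into missing values
    words = pd.read_csv(input_path, header=None, dtype=str,
                        keep_default_na=False, encoding="utf-8-sig")[0]
    frequency = words.str.lower().value_counts()  # most frequent first
    frequency = frequency[frequency >= threshold]  # apply the threshold
    # sep="\t" and header=False write bare "word<TAB>count" lines
    frequency.to_csv(output_path, sep="\t", header=False)
    return frequency
```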

For "python - Word count of a txt file with output to a file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52787872/
