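I am trying to compute the average length of the words in a file after reading it. However, the text in the file is not laid out as normal sentences: sometimes there are extra spaces between words, and sometimes a line break falls in the middle of a sentence.
Current code: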
def average(filename):
    with open(filename, "r") as f:
        for line in f:
            words = line.split()
            average = sum(len(words) for words in words)/len(words)
            return average
>>>4.3076923076923075
Expected
>>>4.352941176470588
File:
Here are some words there is no punctuation but there are words what
is the average length
Best answer
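When you open the file as f and then run

    for x in f:

x is each line of the file in turn, ending with its newline character. The answer you are getting is exactly right for the first line of text, because the function returns before the loop ever reaches the second line. If you want the second line counted together with the first, you need to process the text file as a whole rather than line by line.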
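To make the line-by-line behaviour concrete, here is a minimal sketch that computes the average separately for each line of the sample text. The helper name per_line_averages and the use of io.StringIO as a stand-in for the opened file are assumptions made for illustration, not part of the original code.

```python
import io

# Stand-in for the file from the question; io.StringIO can be iterated
# line by line just like an open text file.
text = (
    "Here are some words there is no punctuation but there are words what\n"
    "is the average length\n"
)

def per_line_averages(f):
    """Return the average word length of each non-empty line, one value per line."""
    averages = []
    for line in f:
        # split() with no arguments ignores extra spaces and the trailing newline.
        words = line.split()
        if words:
            averages.append(sum(len(word) for word in words) / len(words))
    return averages

line_averages = per_line_averages(io.StringIO(text))
print(line_averages)
```

The first value is the 4.3076923076923075 reported in the question, which is the average for the first line only; neither per-line value equals the expected whole-file average.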
Assuming you want the average over all the words in the file, something like the following should work better:
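def average(filename):
    with open(filename, "r") as f:
        # Collect every line first instead of returning inside the loop.
        lines = [line for line in f]
        # Join the lines and split on any whitespace to get all the words.
        words = " ".join(lines).split()
        average = sum(len(word) for word in words)/len(words)
        return average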
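As a possible variant, offered as a sketch rather than as the answer's own code, the whole file can be read in one call with f.read(), since split() with no arguments already treats newlines and repeated spaces as separators. The function name average_word_length and the choice to return 0.0 for a file with no words are assumptions made here.

```python
import os
import tempfile

def average_word_length(filename):
    """Average word length over the whole file; 0.0 if the file has no words."""
    with open(filename, "r") as f:
        # read() returns the entire file as one string; split() with no
        # arguments splits on any run of whitespace, including newlines.
        words = f.read().split()
    if not words:
        return 0.0  # assumption: treat a file with no words as average 0.0
    return sum(len(word) for word in words) / len(words)

# Usage with the sample text from the question, written to a temporary file.
sample = (
    "Here are some words there is no punctuation but there are words what\n"
    "is the average length\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name
try:
    result = average_word_length(path)
    print(result)
finally:
    os.remove(path)
```

Guarding against an empty word list avoids the ZeroDivisionError that a plain division would raise on an empty file.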
Source: a similar question on Stack Overflow about correctly splitting strings after reading a file in Python: https://stackoverflow.com/questions/33058707/