I am using this code to parse a text file and format it so that every sentence is on a new line:
import re

# open the file to be formatted
filename = open('inputfile.txt', 'r')
f = filename.read()
filename.close()

# put every sentence in a new line
# (raw string so the backslash in \. reaches the regex engine unchanged)
pat = r'(?<!Dr)(?<!Esq)\. +(?=[A-Z])'
lines = re.sub(pat, '.\n', f)
print(lines)

# write the formatted text
# into a new txt file
filename = open("outputfile.txt", "w")
filename.write(lines)
filename.close()
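But what I essentially need is to split sentences after 110 characters. If a sentence on a line is longer than 110 characters, it should be split with '...' added at the end, the next line should start with '...' followed by the rest of the split sentence, and so on.
Any suggestions? I am somewhat lost.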
Best answer
# open inputfile/read/auto-close
with open('inputfile.txt') as f:
    lines = f.readlines()  # with block auto closes file after block is executed

output = []
for line in lines:
    line = line.rstrip('\n')  # drop the newline so it is not counted in the length
    if len(line) > 110:
        output.append(line[:107] + '...')  # first chunk: 107 chars + '...' = 110
        line = line[107:]
        while len(line) > 107:  # middle chunks carry '...' on both ends: 3 + 104 + 3
            output.append('...' + line[:104] + '...')
            line = line[104:]
        output.append('...' + line)  # last chunk: '...' plus at most 107 chars
    else:
        output.append(line)

# open outputfile/write/auto-closed
with open('outputfile.txt', 'w') as f:
    for line in output:
        f.write(line + '\n')  # re-add the newline stripped above
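
Plain slicing cuts wherever the character limit falls, which can be in the middle of a word. As a rough sketch of one alternative (not part of the original answer), the splitting can be moved into a helper that breaks at whitespace using the standard library's textwrap.wrap. The function name split_with_ellipsis and its width and marker parameters are made up for illustration, and the sketch assumes words are separated by single spaces.

```python
import textwrap


def split_with_ellipsis(sentence, width=110, marker='...'):
    """Split one sentence into pieces no longer than `width` characters.

    Hypothetical helper: every piece except the last ends with `marker`,
    and every piece except the first starts with it.
    """
    if len(sentence) <= width:
        return [sentence]
    # Reserve room for a marker on both ends, so even middle pieces fit.
    chunks = textwrap.wrap(sentence, width=width - 2 * len(marker))
    last = len(chunks) - 1
    result = []
    for i, chunk in enumerate(chunks):
        prefix = marker if i > 0 else ''
        suffix = marker if i < last else ''
        result.append(prefix + chunk + suffix)
    return result
```

Calling split_with_ellipsis(line) for each sentence and writing the returned pieces one per line would give the same file layout as the loop above, at the cost of slightly shorter pieces, because space for both markers is always reserved.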
For more on splitting text into blocks of x characters in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/23165111/