Python textwrap breaks sentences in the wrong places

Tags: python, word-wrap

I have found that Python's textwrap library breaks sentences in the wrong places. I am using:

wrp = textwrap.TextWrapper(width=32, break_long_words=False, replace_whitespace=False)
out = '\n'.join(wrp.wrap(txt))

Applying this to the following passage*:

The Caterpillar and Alice looked at each other for some time in silence:
at last the Caterpillar took the hookah out of its mouth, and addressed
her in a languid, sleepy voice.

'Who are YOU?' said the Caterpillar.

This was not an encouraging opening for a conversation. Alice replied,
rather shyly, 'I--I hardly know, sir, just at present--at least I know
who I WAS when I got up this morning, but I think I must have been
changed several times since then.'

The wrapped result is:

The Caterpillar and Alice looked
at each other for some time in
silence:
at last the
Caterpillar took the hookah out
of its mouth, and addressed
her
in a languid, sleepy voice.
'Who are YOU?' said the
Caterpillar.

This was not an
encouraging opening for a
conversation. Alice replied,
rather shyly, 'I--I hardly know,
sir, just at present--at least I
know
who I WAS when I got up
this morning, but I think I must
have been
changed several times
since then.

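Some of the extra breaks are there because the original text is already hard-wrapped. But incorrect breaks are still being added, for example at "at last the | Caterpillar", and the last sentence is a complete mess. Can anyone suggest how to wrap it correctly?

  * The text was obtained with: curl https://www.gutenberg.org/cache/epub/11/pg11.txt | sed -n 960,969p > alice.txt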

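Best answer

Preprocess the text first: replace with a space any newline that is preceded by a word character or a comma and followed by a word character. Blank lines are left alone, so the paragraph structure of the text is preserved. Note that the pattern and the replacement must be raw strings; otherwise "\1" and "\2" are read as control characters rather than group references:

re.sub(r"([,\w])\n(\w)", r"\1 \2", sys.stdin.read())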

The Caterpillar and Alice looked at each other for some time in silence:
at last the Caterpillar took the hookah out of its mouth, and addressed her in a languid, sleepy voice.

'Who are YOU?' said the Caterpillar.

This was not an encouraging opening for a conversation. Alice replied, rather shyly, 'I--I hardly know, sir, just at present--at least I know who I WAS when I got up this morning, but I think I must have been changed several times since then.'
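The substitution can be checked in isolation on a short string. The following is a minimal sketch, not part of the original answer: the helper name `join_soft_breaks` and the sample string are assumptions made for illustration.

```python
import re

def join_soft_breaks(text):
    """Replace a newline between a word character (or comma) and a word
    character with a space. Any other newline is left untouched."""
    return re.sub(r"([,\w])\n(\w)", r"\1 \2", text)

# Hypothetical sample: one soft break inside a sentence, one blank line.
sample = "addressed\nher in a voice.\n\n'Who are YOU?'"
joined = join_soft_breaks(sample)
print(repr(joined))
```

Only the newline between "addressed" and "her" is replaced. The newlines after the full stop are preceded by neither a word character nor a comma, so the blank line between paragraphs survives.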

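You can then wrap each part separately:

import re, sys, textwrap

text = re.sub(r"([,\w])\n(\w)", r"\1 \2", sys.stdin.read())
for part in text.splitlines():
    # An empty line wraps to an empty list, so blank lines are printed as-is.
    print('\n'.join(textwrap.wrap(part, width=32)))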

The Caterpillar and Alice looked
at each other for some time in
silence:
at last the Caterpillar took the
hookah out of its mouth, and
addressed her in a languid,
sleepy voice.

'Who are YOU?' said the
Caterpillar.

This was not an encouraging
opening for a conversation.
Alice replied, rather shyly, 'I
--I hardly know, sir, just at
present--at least I know who I
WAS when I got up this morning,
but I think I must have been
changed several times since
then.'
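A related approach, in line with the textwrap documentation's advice to split text into paragraphs and wrap them separately when `replace_whitespace` is false, is to split on blank lines and let textwrap's default whitespace handling turn the embedded newlines into spaces. The sketch below is an alternative to the regex above, not the answer's method. The function name `wrap_paragraphs` and the sample text are assumptions. Unlike the regex, it joins every line inside a paragraph, including a line that ends in a colon.

```python
import re
import textwrap

def wrap_paragraphs(text, width=32):
    """Wrap each blank-line-separated paragraph on its own."""
    # Split on blank lines (which may contain stray spaces).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # replace_whitespace defaults to True, so newlines inside a
    # paragraph are treated as ordinary spaces before wrapping.
    wrapper = textwrap.TextWrapper(width=width, break_long_words=False)
    return "\n\n".join(wrapper.fill(p) for p in paragraphs)

# Hypothetical hard-wrapped sample with two paragraphs.
sample = (
    "This was not an encouraging opening for a conversation. Alice replied,\n"
    "rather shyly.\n"
    "\n"
    "'Who are YOU?' said the Caterpillar.\n"
)
wrapped = wrap_paragraphs(sample)
print(wrapped)
```

Because each paragraph is wrapped independently, the original hard line breaks no longer influence where the new breaks fall.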

This question about Python textwrap breaking sentences in the wrong places comes from Stack Overflow: https://stackoverflow.com/questions/34370076/
