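I'm curious whether there is a library for Python or JavaScript that splits a string of text into sentences and puts a newline after each one?
That is, from: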
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum aliquet leo in urna hendrerit placerat. Donec adipiscing dignissim adipiscing. Duis adipiscing mollis cursus. Etiam fringilla elit nec enim sagittis a auctor nisi gravida. Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat. Suspendisse a consequat turpis. Morbi eget ante leo, a dignissim mi.
to:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n
Vestibulum aliquet leo in urna hendrerit placerat.\n
Donec adipiscing dignissim adipiscing.\n
Duis adipiscing mollis cursus.\n
Etiam fringilla elit nec enim sagittis a auctor nisi gravida.\n
Nunc sollicitudin, leo sit amet consequat pharetra, mi orci vestibulum mi, a suscipit odio tellus tincidunt erat.\n
Suspendisse a consequat turpis.\n
Morbi eget ante leo, a dignissim mi.
Best answer
You are looking for a natural language processing library.
For Python, there is the Natural Language Toolkit (NLTK). For example, you could have a look at its PunktSentenceTokenizer.
The PunktSentenceTokenizer divides a text into a list of sentences, by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences. It must be trained on a large collection of plaintext in the target language before it can be used. The algorithm for this tokenizer is described in Kiss & Strunk (2006):
Kiss, Tibor and Strunk, Jan (2006): Unsupervised Multilingual Sentence Boundary Detection. Computational Linguistics 32: 485-525.
The NLTK data package includes a pre-trained Punkt tokenizer for English.
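
To make the "one sentence per line" step concrete, here is a minimal sketch. The function names `naive_split` and `newline_per_sentence` are made up for illustration, and the default splitter is a plain regular expression, not the Punkt algorithm: it simply breaks after `.`, `!` or `?` when followed by whitespace and a capital letter, so it will stumble on abbreviations such as "Dr. Smith". The splitter is a parameter so that a real sentence tokenizer can be swapped in.

```python
import re


def naive_split(text):
    """Naive stand-in for a sentence tokenizer (NOT Punkt).

    Splits on whitespace that follows '.', '!' or '?' and precedes
    an uppercase ASCII letter.
    """
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())


def newline_per_sentence(text, tokenize=naive_split):
    """Return `text` with each sentence on its own line.

    `tokenize` is any callable that maps a string to a list of sentences.
    """
    sentences = (s.strip() for s in tokenize(text))
    return "\n".join(s for s in sentences if s)


if __name__ == "__main__":
    sample = (
        "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
        "Vestibulum aliquet leo in urna hendrerit placerat. "
        "Donec adipiscing dignissim adipiscing."
    )
    print(newline_per_sentence(sample))
```

If NLTK is installed and its Punkt data has been downloaded (the resource name depends on the NLTK version), `nltk.sent_tokenize` should be usable as the splitter instead: `newline_per_sentence(text, tokenize=nltk.sent_tokenize)`.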
Source: the Stack Overflow question "Javascript or Python: Newline after each sentence": https://stackoverflow.com/questions/7895312/