How can I extract the first paragraph from a Wikipedia article using Python?
For example, for Albert Einstein, that would be:
Albert Einstein (pronounced /ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a theoretical physicist, philosopher and author who is widely regarded as one of the most influential and iconic scientists and intellectuals of all time. A German-Swiss Nobel laureate, Einstein is often regarded as the father of modern physics.[2] He received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".[3]
Best answer
I wrote a Python library that aims to make this very easy. Check it out on GitHub.
To install it, run
$ pip install wikipedia
Then, to get the first paragraph of an article, just use the wikipedia.summary function.
>>> import wikipedia
>>> print(wikipedia.summary("Albert Einstein", sentences=2))
This prints:
Albert Einstein (/ˈælbərt ˈaɪnstaɪn/; German: [ˈalbɐt ˈaɪnʃtaɪn] ( listen); 14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the general theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). While best known for his mass–energy equivalence formula E = mc2 (which has been dubbed "the world's most famous equation"), he received the 1921 Nobel Prize in Physics "for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect".
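As for how it works, wikipedia makes a request to the Mobile Frontend Extension of the MediaWiki API, which returns mobile-friendly versions of Wikipedia articles. Specifically, when you pass the parameters prop=extracts&exsectionformat=plain, the MediaWiki servers parse the wikitext and return a plain-text summary of the requested article, up to and including the entire page text. The API also accepts the parameters exchars and exsentences, which, unsurprisingly, limit the number of characters and sentences it returns.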
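The request described above can be sketched with only the standard library. This is a minimal illustration, not the library's actual implementation. The endpoint URL, the helper names (build_extract_url, parse_extract, fetch_extract), the User-Agent string, and the extra explaintext parameter (which asks the extracts API for plain text rather than HTML) are assumptions that do not appear in the answer.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the English Wikipedia API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, sentences=None, chars=None):
    """Build a query URL asking the API for an extract of `title`."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exsectionformat": "plain",
        "explaintext": 1,  # assumption: request plain text instead of HTML
        "titles": title,
    }
    # exsentences / exchars limit how much text the API returns.
    if sentences is not None:
        params["exsentences"] = sentences
    if chars is not None:
        params["exchars"] = chars
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_extract(payload):
    """Return the extract of the first page in a decoded JSON response."""
    pages = payload["query"]["pages"]  # dict keyed by page id
    first_page = next(iter(pages.values()))
    return first_page.get("extract", "")  # missing pages have no extract


def fetch_extract(title, sentences=2):
    """Send the request and return the extract (needs network access)."""
    request = urllib.request.Request(
        build_extract_url(title, sentences=sentences),
        headers={"User-Agent": "extract-sketch/0.1 (example)"},  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        return parse_extract(json.load(response))
```

Calling fetch_extract("Albert Einstein", sentences=2) should return text similar to the wikipedia.summary output shown earlier, although the exact wording depends on the current state of the article.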
Regarding python - Extract the first paragraph from a Wikipedia article (Python), we found a similar question on Stack Overflow: https://stackoverflow.com/questions/4460921/