这是我的代码,但它打印整个段落。如何只打印第一个句子,直到第一个点?
from bs4 import BeautifulSoup
import urllib.request,time
article = 'https://www.theguardian.com/science/2012/\
oct/03/philosophy-artificial-intelligence'
req = urllib.request.Request(article, headers={'User-agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
soup = BeautifulSoup(html,'lxml')
def print_intro():
if len(soup.find_all('p')[0].get_text()) > 100:
print(soup.find_all('p')[0].get_text())
此代码打印:
To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.
但我只想打印:
To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.
感谢帮助
最佳答案
在该点上分割文本;对于单个分割,使用 str.partition()
比 str.split()
快有限制:
text = soup.find_all('p')[0].get_text()
if len(text) > 100:
text = text.partition('.')[0] + '.'
print(text)
如果您只需要处理第一个 <p>
元素,使用soup.find()
相反:
text = soup.find('p').get_text()
if len(text) > 100:
text = text.partition('.')[0] + '.'
print(text)
但是,对于您给定的 URL,示例文本位于第二段落中:
>>> soup.find_all('p')[1]
<p><span class="drop-cap"><span class="drop-cap__inner">T</span></span>o state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial. The brain is the only kind of object capable of understanding that the cosmos is even there, or why there are infinitely many prime numbers, or that apples fall because of the curvature of space-time, or that obeying its own inborn instincts can be morally wrong, or that it itself exists. Nor are its unique abilities confined to such cerebral matters. The cold, physical fact is that it is the only kind of object that can propel itself into space and back without harm, or predict and prevent a meteor strike on itself, or cool objects to a billionth of a degree above absolute zero, or detect others of its kind across galactic distances.</p>
>>> text = soup.find_all('p')[1].get_text()
>>> text.partition('.')[0] + '.'
'To state that the human brain has capabilities that are, in some respects, far superior to those of all other known objects in the cosmos would be uncontroversial.'
关于Python:打印/获取每个段落的第一句,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/35292754/