python - How do I scrape all the information after id = "firstHeading" on a web page in Python?

Tags: python python-3.x beautifulsoup

I am trying to scrape (with Python) all the text that comes after the first heading of a web page. The tag for that heading is: <h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>

I don't need anything that appears before this heading. I want to scrape all the text written after it. Can I do this with BeautifulSoup in Python?

I am running the following code:

import requests
import bs4

urlpage = 'https://en.wikipedia.org/wiki/Albert_Einstein#Publications'
res = requests.get(urlpage)
# get_text() on the whole document returns every text node, including <script> contents
soup1 = bs4.BeautifulSoup(res.text, 'lxml').get_text()
print(soup1)

The output contains the following information:

Albert Einstein - Wikipedia
document.documentElement.className="client-js";RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Albert_Einstein","wgTitle":"Albert Einstein","wgCurRevisionId":920687884,"wgRevisionId":920687884,"wgArticleId":736,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages with missing ISBNs","Webarchive template wayback links","CS1 German-language sources (de)","CS1: Julian–Gregorian uncertainty","CS1 French-language sources (fr)","CS1 errors: missing periodical","CS1: long volume value","Wikipedia indefinitely semi-protected pages","Use American English from February 2019","All Wikipedia articles written in American English","Articles with short description","Good articles","Articles containing German-language text","Biography with signature","Articles with hCards","Articles with hAudio microformats","All articles with unsourced statements",
"Articles with unsourced statements from July 2019","Commons category link from Wikidata","Articles with Wikilivres links","Articles with Curlie links","Articles with Project Gutenberg links","Articles with Internet Archive links","Articles with LibriVox links","Use dmy dates from August 2019","Wikipedia articles with BIBSYS identifiers","Wikipedia articles with BNE identifiers","Wikipedia articles with BNF identifiers","Wikipedia articles with GND identifiers","Wikipedia articles with HDS identifiers","Wikipedia articles with ISNI identifiers","Wikipedia articles with LCCN identifiers","Wikipedia articles with LNB identifiers","Wikipedia articles with MGP identifiers","Wikipedia articles with NARA identifiers","Wikipedia articles with NCL identifiers","Wikipedia articles with NDL identifiers","Wikipedia articles with NKC identifiers","Wikipedia articles with NLA identifiers","Wikipedia articles with NLA-person identifiers","Wikipedia articles with NLI identifiers",
"Wikipedia articles with NLR identifiers","Wikipedia articles with NSK identifiers","Wikipedia articles with NTA identifiers","Wikipedia articles with SBN identifiers","Wikipedia articles with SELIBR identifiers","Wikipedia articles with SNAC-ID identifiers","Wikipedia articles with SUDOC identifiers","Wikipedia articles with ULAN identifiers","Wikipedia articles with VIAF identifiers","Wikipedia articles with WorldCat-VIAF identifiers","AC with 25 elements","Wikipedia articles with suppressed authority control identifiers","Pages using authority control with parameters","Articles containing timelines","Pantheists","Spinozists","Albert Einstein","1879 births","1955 deaths","20th-century American engineers","20th-century American writers","20th-century German writers","20th-century physicists","American agnostics","American inventors","American letter writers","American pacifists","American people of German-Jewish descent","American physicists","American science writers",
"American socialists","American Zionists","Ashkenazi Jews","Charles University in Prague faculty","Corresponding Members of the Russian Academy of Sciences (1917–25)","Cosmologists","Deaths from abdominal aortic aneurysm","Einstein family","ETH Zurich alumni","ETH Zurich faculty","German agnostics","German Jews","German emigrants to Switzerland","German Nobel laureates","German inventors","German physicists","German socialists","European democratic socialists","Institute for Advanced Study faculty","Jewish agnostics","Jewish American scientists","Jewish emigrants from Nazi Germany to the United States","Jews who emigrated to escape Nazism","Jewish engineers","Jewish inventors","Jewish philosophers","Jewish physicists","Jewish socialists","Leiden University faculty","Foreign Fellows of the Indian National Science Academy","Foreign Members of the Royal Society","Members of the American Philosophical Society","Members of the Bavarian Academy of Sciences","Members of the Lincean Academy"
,"Members of the Royal Netherlands Academy of Arts and Sciences","Members of the United States National Academy of Sciences","Honorary Members of the USSR Academy of Sciences","Naturalised citizens of Austria","Naturalised citizens of Switzerland","New Jersey socialists","Nobel laureates in Physics","Patent examiners","People from Berlin","People from Bern","People from Munich","People from Princeton, New Jersey","People from Ulm","People from Zürich","People who lost German citizenship","People with acquired American citizenship","Philosophers of science","Relativity theorists","Stateless people","Swiss agnostics","Swiss emigrants to the United States","Swiss Jews","Swiss physicists","Theoretical physicists","Winners of the Max Planck Medal","World federalists","Recipients of the Pour le Mérite (civil class)","Determinists","Activists from New Jersey","Mathematicians involved with Mathematische Annalen","Intellectual Cooperation","Disease-related deaths in New Jersey"],
"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRelevantPageName":"Albert_Einstein","wgRelevantArticleId":736,"wgRequestId":"XaChjApAICIAALSsYfgAAABV","wgCSPNonce":!1,"wgIsProbablyEditable":!1,"wgRelevantPageIsProbablyEditable":!1,"wgRestrictionEdit":["autoconfirmed"],"wgRestrictionMove":["sysop"],"wgMediaViewerOnClick":!0,"wgMediaViewerEnabledByDefault":!0,"wgPopupsReferencePreviews":!1,"wgPopupsConflictsWithNavPopupGadget":!1,"wgVisualEditor":{"pageLanguageCode":"en","pageLanguageDir":"ltr","pageVariantFallbacks":"en"},"wgMFDisplayWikibaseDescriptions":{"search":!0,"nearby":!0,"watchlist":!0,"tagline":
!1},"wgWMESchemaEditAttemptStepOversample":!1,"wgULSCurrentAutonym":"English","wgNoticeProject":"wikipedia","wgWikibaseItemId":"Q937","wgCentralAuthMobileDomain":!1,"wgEditSubmitButtonLabelPublish":!0};RLSTATE={"ext.globalCssJs.user.styles":"ready","site.styles":"ready","noscript":"ready","user.styles":"ready","ext.globalCssJs.user":"ready","user":"ready","user.options":"ready","user.tokens":"loading","ext.cite.styles":"ready","ext.math.styles":"ready","mediawiki.legacy.shared":"ready","mediawiki.legacy.commonPrint":"ready","jquery.makeCollapsible.styles":"ready","mediawiki.toc.styles":"ready","wikibase.client.init":"ready","ext.visualEditor.desktopArticleTarget.noscript":"ready","ext.uls.interlanguage":"ready","ext.wikimediaBadges":"ready","ext.3d.styles":"ready","mediawiki.skinning.interface":"ready","skins.vector.styles":"ready"};RLPAGEMODULES=["ext.cite.ux-enhancements","ext.cite.tracking","ext.math.scripts","ext.scribunto.logs","site","mediawiki.page.startup",
"mediawiki.page.ready","jquery.makeCollapsible","mediawiki.toc","mediawiki.searchSuggest","ext.gadget.teahouse","ext.gadget.ReferenceTooltips","ext.gadget.watchlist-notice","ext.gadget.DRN-wizard","ext.gadget.charinsert","ext.gadget.refToolbar","ext.gadget.extra-toolbar-buttons","ext.gadget.switcher","ext.centralauth.centralautologin","mmv.head","mmv.bootstrap.autostart","ext.popups","ext.visualEditor.desktopArticleTarget.init","ext.visualEditor.targetLoader","ext.eventLogging","ext.wikimediaEvents","ext.navigationTiming","ext.uls.compactlinks","ext.uls.interface","ext.cx.eventlogging.campaigns","ext.quicksurveys.init","ext.centralNotice.geoIP","ext.centralNotice.startUp","skins.vector.js"];
(RLQ=window.RLQ||[]).push(function(){mw.loader.implement("user.tokens@tffin",function($,jQuery,require,module){/*@nomin*/mw.user.tokens.set({"patrolToken":"+\\","watchToken":"+\\","csrfToken":"+\\"});
});});

Albert Einstein

From Wikipedia, the free encyclopedia

Jump to navigation Jump to search "Einstein" redirects here. For other people, see Einstein (surname). For other uses, see Albert Einstein (disambiguation) and Einstein (disambiguation).

German-born physicist and developer of the theory of relativity

Albert EinsteinEinstein in 1921Born(1879-03-14)14 March 1879Ulm, Kingdom of Württemberg, German EmpireDied18 April 1955(1955-04-18) (aged 76)Princeton, New Jersey, United StatesResidenceGermany, Italy, Switzerland, Austria (present-day Czech Republic), Belgium, United StatesCitizenship Subject of the Kingdom of Württemberg during the German Empire (1879–1896)[note 1] Stateless (1896–1901) Citizen of Switzerland (1901–1955) Austrian subject of the Austro-Hungarian Empire (1911–1912) Subject of the Kingdom of Prussia during the German Empire (1914–1918)[note 1] German citizen of the Free State of Prussia (Weimar Republic, 1918–1933) Citizen of the United States (1940–1955) Education Federal polytechnic school (1896–1900; B.A., 1900) University of Zurich (Ph.D., 1905) Known for General relativity Special relativity Photoelectric effect E=mc2 (Mass–energy equivalence) E=hf (Planck–Einstein relation) Theory of Brownian motion Einstein field equations Bose–Einstein statistics Bose–Einstein condensate Gravitational wave Cosmological constant Unified field theory EPR paradox Ensemble interpretation List of other concepts Spouse(s)Mileva Marić(m. 1903; div. 1919)Elsa Löwenthal(m. 
1919; died[1][2] 1936)Children"Lieserl" Einstein Hans Albert Einstein Eduard "Tete" EinsteinAwards Barnard Medal (1920) Nobel Prize in Physics (1921) Matteucci Medal (1921) ForMemRS (1921)[3] Copley Medal (1925)[3] Gold Medal of the Royal Astronomical Society (1926) Max Planck Medal (1929) Member of the National Academy of Sciences (1942) Time Person of the Century (1999) Scientific careerFieldsPhysics, philosophyInstitutions Swiss Patent Office (Bern) (1902–1909) University of Bern (1908–1909) University of Zurich (1909–1911) Charles University in Prague (1911–1912) ETH Zurich (1912–1914) Prussian Academy of Sciences (1914–1933) Humboldt University of Berlin (1914–1933) Kaiser Wilhelm Institute (director, 1917–1933) German Physical Society (president, 1916–1918) Leiden University (visits, 1920) Institute for Advanced Study (1933–1955) Caltech (visits, 1931–1933) University of Oxford (visits, 1931–1933) ThesisEine neue Bestimmung der Moleküldimensionen (A New Determination of Molecular Dimensions) (1905)Doctoral advisorAlfred KleinerOther academic advisorsHeinrich Friedrich WeberInfluences Arthur Schopenhauer Baruch Spinoza Bernhard Riemann David Hume Ernst Mach Hendrik Lorentz Hermann Minkowski Isaac Newton James Clerk Maxwell Michele Besso Moritz Schlick Thomas Young Influenced Virtually all modern physics

Signature Albert Einstein (/ˈaɪnstaɪn/ EYEN-styne;[4] German: [ˈalbɛʁt ˈʔaɪnʃtaɪn] (listen); 14 March 1879 – 18 April 1955) was a German-born theoretical physicist[5] who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics).[3][6]:274 His work is also known for its influence on the philosophy of science.[7][8] He is best known to the general public for his mass–energy equivalence formula . . . . .

I only want the text that comes after the first heading, "Albert Einstein".

Best answer

First find the h1 tag, then use find_next_siblings('div') and print the text of each sibling.

import requests
import bs4

urlpage = 'https://en.wikipedia.org/wiki/Albert_Einstein#Publications'
res = requests.get(urlpage)
soup1 = bs4.BeautifulSoup(res.text, 'lxml')

# Locate the first heading, then walk the <div> siblings that follow it.
h1 = soup1.find('h1', id='firstHeading')
for item in h1.find_next_siblings('div'):
    print(item.text)
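
Note that find_next_siblings only sees elements that share the heading's parent, so it can come back empty if the page nests the heading differently. As a rough alternative sketch, find_all_next(string=True) walks every text node after the heading regardless of nesting. The SAMPLE_HTML snippet and the text_after_heading helper below are made up for illustration (they are not part of the original answer), and the built-in 'html.parser' is used instead of 'lxml' only to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

# Hypothetical miniature page; the real Wikipedia markup is far larger
# and may be nested differently.
SAMPLE_HTML = """
<html><head><title>Albert Einstein - Wikipedia</title>
<script>RLCONF = {"wgPageName": "Albert_Einstein"};</script></head>
<body>
<div id="content">
<h1 id="firstHeading" class="firstHeading" lang="en">Albert Einstein</h1>
<div id="bodyContent">
<div id="siteSub">From Wikipedia, the free encyclopedia</div>
<p>Albert Einstein was a German-born theoretical physicist.</p>
<script>var inline = 1;</script>
</div>
</div>
</body></html>
"""


def text_after_heading(html, heading_id="firstHeading"):
    """Return the visible text that follows the element with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find(id=heading_id)
    if heading is None:
        return ""
    pieces = []
    # find_all_next(string=True) yields every text node after the start of
    # the heading, in document order, however deeply it is nested.
    for node in heading.find_all_next(string=True):
        if any(parent is heading for parent in node.parents):
            continue  # skip the heading's own text
        if node.parent.name in ("script", "style"):
            continue  # skip JavaScript and CSS source
        text = node.strip()
        if text:
            pieces.append(text)
    return "\n".join(pieces)


if __name__ == "__main__":
    print(text_after_heading(SAMPLE_HTML))
```

To try this on the live page, res.text from the request above could be passed in place of SAMPLE_HTML; the result would still include navigation and footer text that follows the heading, so further filtering may be needed.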

Regarding "python - How do I scrape all the information after id = "firstHeading" on a web page in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58347139/
