python - Scraping the id_str object with Selenium

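Using the Selenium library in Python, I am currently scraping content from a Twitter search results page: https://twitter.com/search?q=twinkie&src=typd&lang=en

The Selenium library provides the following functions for locating the content we want to scrape: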

find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

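The specific object I want to scrape is called id_str. It is a unique string of digits specific to each account. I have been struggling to work out how to capture this particular object.

Because of the length of each element, I won't copy all of the HTML here, but I noticed that every id_str is preceded by: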

<div class="tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable dismissible-content
   original-tweet js-original-tweet


   has-cards  has-content

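Which function would you suggest I use to get the id_str? Ideally, I would like to understand web page code well enough to identify other objects myself. Which topics should I look into to understand this better? I am still fairly new to coding.

Thank you very much for reading.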

Best answer

Assuming you want the value of the "id_str" key inside the "data-reply-to-users-json" attribute of the div element you shared, try this:

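import json

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://twitter.com/search?q=twinkie&src=typd&lang=en')
# Select the tweet container divs by the leading part of their class attribute.
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which newer Selenium 4 releases no longer provide.
tweets = driver.find_elements(By.XPATH, "//div[contains(@class, 'tweet js-stream-tweet js-actionable-tweet js-profile-popup-actionable dismissible-content')]")
for tweet in tweets:
    # The attribute value is JSON, so parse it with json.loads (ast.literal_eval fails on true/false/null).
    users = json.loads(tweet.get_attribute('data-reply-to-users-json'))
    print(users[0]['id_str'])
driver.quit()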

This should print all of the "id_str" values.
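
To see how the div, its "data-reply-to-users-json" attribute, and the "id_str" key fit together without opening a browser, here is a minimal offline sketch. It is not the Selenium approach above: it swaps in Python's standard html.parser and json modules. The class name TweetIdParser, the SAMPLE_HTML markup, and the ID 12345 are all made up for illustration; on a live page the HTML string could come from driver.page_source, and the real attribute contents may differ.

```python
import json
from html.parser import HTMLParser


class TweetIdParser(HTMLParser):
    """Hypothetical helper: collect id_str values from tweet <div> elements."""

    def __init__(self):
        super().__init__()
        self.id_strs = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attributes = dict(attrs)
        # Compare individual class tokens, so class order and extra
        # whitespace in the attribute do not affect the match.
        classes = (attributes.get("class") or "").split()
        if "js-stream-tweet" not in classes:
            return
        raw = attributes.get("data-reply-to-users-json")
        if not raw:
            return
        # The attribute is assumed to hold a JSON list of user objects.
        for user in json.loads(raw):
            if "id_str" in user:
                self.id_strs.append(user["id_str"])


# Made-up sample markup, not copied from Twitter.
SAMPLE_HTML = """
<div class="tweet js-stream-tweet  has-cards"
     data-reply-to-users-json='[{"id_str":"12345","screen_name":"example","verified":false}]'>
</div>
<div class="unrelated"></div>
"""

parser = TweetIdParser()
parser.feed(SAMPLE_HTML)
print(parser.id_strs)  # prints ['12345']
```

The "verified": false entry in the sample shows why json.loads is the safer parser here: lowercase false is valid JSON but not a Python literal, so ast.literal_eval would raise an error on it. Reading up on HTML attributes, XPath and CSS selectors, and JSON covers most of what is needed to locate other values the same way.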

Regarding "python - Scraping the id_str object with Selenium", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53604770/
