selenium - Scraping Twitter followers with Selenium

Tags: selenium twitter selenium-chromedriver

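I have links to several profiles and want to get the usernames of their followers. I can't use the API because it is very slow, and I need thousands of followers, so I'm using Selenium.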

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# Class names generated by Twitter's front end at the time; they change often
LOGIN_INPUT = ".r-30o5oe.r-1niwhzg.r-17gur6a.r-1yadl64.r-deolkf.r-homxoj.r-poiln3.r-7cikom.r-1ny4l3l.r-1inuy60.r-utggzx.r-vmopo1.r-1w50u8q.r-1lrr6ok.r-1dz5y72.r-fdjqy7.r-13qz1uu"
LOGIN_BUTTON = ".css-901oao.r-1awozwy.r-jwli3a.r-6koalj.r-18u37iz.r-16y2uox.r-1qd0xha.r-a023e6.r-vw2c0b.r-1777fci.r-eljoum.r-dnmrzs.r-bcqeeo.r-q4m81j.r-qvutc0"
FOLLOWER_CELL = ".css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg"
PROFILE_LINK = ".css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l"

driver = webdriver.Chrome()
driver.get("https://twitter.com/login")
time.sleep(2)

# The username and password fields share the same classes
login_inputs = driver.find_elements(By.CSS_SELECTOR, LOGIN_INPUT)
login_inputs[0].send_keys("Username Here")
login_inputs[1].send_keys("Password Here")

# Click the "Log in" button
driver.find_element(By.CSS_SELECTOR, LOGIN_BUTTON).click()

driver.get("Profile Link")
time.sleep(2)

# Scroll to the end of the page
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load more rows
    time.sleep(10)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

# Get the follower cells and print each one's profile URL
usernames = driver.find_elements(By.CSS_SELECTOR, FOLLOWER_CELL)
print(len(usernames))
for username in usernames:
    print(username.find_element(By.CSS_SELECTOR, PROFILE_LINK).get_attribute("href"))

I use the code above to scroll to the bottom of the page and then extract the username fields.
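The scroll loop in the question's code stops the first time `document.body.scrollHeight` fails to grow, so a single slow response can end it early. A minimal sketch of a sturdier variant, assuming you pass in two callables that wrap the same `driver.execute_script` calls: it only gives up after `patience` consecutive rounds without growth and caps the total with `max_rounds`. `scroll_until_stable`, `patience` and `max_rounds` are hypothetical names for illustration, not Selenium APIs.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    get_height: Callable[[], int],
    pause: float = 2.0,
    patience: int = 3,
    max_rounds: int = 500,
) -> int:
    """Scroll until the page height stops growing; return the rounds used.

    Hypothetical helper: `scroll` and `get_height` stand in for
    driver.execute_script calls.
    """
    last_height = get_height()
    unchanged = 0  # consecutive rounds with no height growth
    rounds = 0
    while rounds < max_rounds and unchanged < patience:
        scroll()
        time.sleep(pause)  # give the page time to load more rows
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            unchanged += 1  # maybe the end, maybe just a slow response
        else:
            unchanged = 0
            last_height = new_height
    return rounds
```

With Selenium it would be called as `scroll_until_stable(lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"), lambda: driver.execute_script("return document.body.scrollHeight"))`. This only makes the scrolling more robust; it does not change what the DOM contains once scrolling ends.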


The problem is that I only get the usernames of the first 20 or 30 followers. Can anyone help?
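One plausible explanation, assuming the followers page renders a virtualized list (only rows near the viewport are kept in the DOM and older ones are removed as you scroll): reading the DOM once after scrolling to the end returns only the last window of rows. The toy simulation below uses a hypothetical `VirtualList` class in place of the real page; the window size of 25 and step of 10 are arbitrary assumptions, not measured Twitter values.

```python
from __future__ import annotations


class VirtualList:
    """Toy model of a virtualized list: only `window` rows are 'in the DOM'."""

    def __init__(self, items: list[str], window: int = 25, step: int = 10) -> None:
        self.items = items
        self.window = window
        self.step = step  # rows revealed per scroll
        self.top = 0  # index of the first rendered row

    def rendered(self) -> list[str]:
        """Rows currently present in the simulated DOM."""
        return self.items[self.top : self.top + self.window]

    def scroll(self) -> bool:
        """Advance the window; return False once the end is reached."""
        max_top = max(len(self.items) - self.window, 0)
        if self.top >= max_top:
            return False
        self.top = min(self.top + self.step, max_top)
        return True


def collect_after_loop(page: VirtualList) -> list[str]:
    """Scroll to the end first, then read the DOM once."""
    while page.scroll():
        pass
    return page.rendered()


def collect_during_loop(page: VirtualList) -> list[str]:
    """Read the DOM after every scroll and merge new rows in order."""
    collected: list[str] = []
    while True:
        for row in page.rendered():
            if row not in collected:
                collected.append(row)
        if not page.scroll():
            return collected


followers = [f"user{i}" for i in range(200)]
print(len(collect_after_loop(VirtualList(followers))))   # 25
print(len(collect_during_loop(VirtualList(followers))))  # 200
```

In this toy model, reading once at the end yields only the final window, while reading after every scroll recovers all 200 rows. The real page may behave differently, so treat the numbers as illustrative.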

Best answer

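I modified your code slightly; give it a try. The main change is that the usernames are collected inside the scroll loop, after every scroll, instead of once at the end, since the page appears to keep only the currently visible rows in the DOM. You may need to tune the sleep timer again: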

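import time

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By

# Assumes `driver` is logged in and already on the followers page (see the question)
FOLLOWER_CELL = ".css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg"
PROFILE_LINK = ".css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l"

follower_list = []


def collect_visible():
    """Append the profile URL of every follower cell currently in the DOM."""
    for cell in driver.find_elements(By.CSS_SELECTOR, FOLLOWER_CELL):
        try:
            href = cell.find_element(By.CSS_SELECTOR, PROFILE_LINK).get_attribute("href")
        except StaleElementReferenceException:
            continue  # the cell was removed from the page while being read
        if href not in follower_list:
            follower_list.append(href)


collect_visible()  # followers visible before the first scroll
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(1)
    # Collect after every scroll so rows later removed from the DOM are not missed
    collect_visible()
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

print(len(follower_list))
print(follower_list)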
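With thousands of followers, the membership check against `follower_list` scans the whole list each time, so the loop slows down as the list grows. A minimal sketch, assuming the collected `href` values look like `https://twitter.com/<handle>`: keep a `set` alongside the list for constant-time membership tests, and reduce each URL to the handle with `urllib.parse.urlparse`. `FollowerCollector` is a hypothetical helper name, not part of Selenium.

```python
from __future__ import annotations

from urllib.parse import urlparse


class FollowerCollector:
    """Accumulates follower handles in first-seen order without duplicates."""

    def __init__(self) -> None:
        self.handles: list[str] = []  # preserves discovery order
        self._seen: set[str] = set()  # constant-time membership checks

    @staticmethod
    def handle_from_href(href: str) -> str:
        """Turn 'https://twitter.com/jack' into 'jack' (assumed URL shape)."""
        return urlparse(href).path.strip("/").split("/")[0]

    def add_batch(self, hrefs: list[str]) -> int:
        """Add one scroll's worth of hrefs; return how many were new."""
        added = 0
        for href in hrefs:
            handle = self.handle_from_href(href)
            if handle and handle not in self._seen:
                self._seen.add(handle)
                self.handles.append(handle)
                added += 1
        return added
```

Inside the scroll loop, each batch would be `collector.add_batch([link.get_attribute("href") for link in links])`, where `links` (a hypothetical name) are the profile-link elements found after a scroll. The return value could also act as an extra stop signal, for example several rounds in a row with no new handles, though that is a design choice the answer itself does not make.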

For "selenium - Scraping Twitter followers with Selenium", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/63650105/
