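I have links to several profiles and want to get the usernames of their followers. I can't use the API because it is very slow and I need thousands of followers, so I am using Selenium.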
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://twitter.com/login")
time.sleep(2)
login_id = driver.find_elements_by_class_name("r-30o5oe.r-1niwhzg.r-17gur6a.r-1yadl64.r-deolkf.r-homxoj.r-poiln3.r-7cikom.r-1ny4l3l.r-1inuy60.r-utggzx.r-vmopo1.r-1w50u8q.r-1lrr6ok.r-1dz5y72.r-fdjqy7.r-13qz1uu")[0]
login_id.send_keys("Username Here")
password = driver.find_elements_by_class_name("r-30o5oe.r-1niwhzg.r-17gur6a.r-1yadl64.r-deolkf.r-homxoj.r-poiln3.r-7cikom.r-1ny4l3l.r-1inuy60.r-utggzx.r-vmopo1.r-1w50u8q.r-1lrr6ok.r-1dz5y72.r-fdjqy7.r-13qz1uu")[1]
password.send_keys("Password Here")
driver.find_element_by_class_name("css-901oao.r-1awozwy.r-jwli3a.r-6koalj.r-18u37iz.r-16y2uox.r-1qd0xha.r-a023e6.r-vw2c0b.r-1777fci.r-eljoum.r-dnmrzs.r-bcqeeo.r-q4m81j.r-qvutc0").click()
driver.get("Profile Link")
time.sleep(2)
# Scroll to the end of the page
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load
    time.sleep(10)
    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
# Get the username elements (runs only once, after scrolling has finished)
usernames = driver.find_elements_by_class_name(
    "css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg")
print(len(usernames))
for username in usernames:
    print(username.find_element_by_class_name("css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l").get_attribute("href"))
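I use the code above to scroll to the bottom of the page and then extract the username fields.
The problem is that I only get the usernames of the first 20 or 30 followers. Can anyone help?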
Best answer
I modified your code slightly; give it a try. You may need to tune the sleep timer again:
follower_list = []
# Scroll to the end of the page
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load
    time.sleep(1)
    # Calculate the new scroll height and compare it with the last one
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    # Get the username elements currently on the page, after every scroll step
    usernames = driver.find_elements_by_class_name(
        "css-18t94o4.css-1dbjc4n.r-1ny4l3l.r-1j3t67a.r-1w50u8q.r-o7ynqc.r-6416eg")
    print(len(usernames))
    for username in usernames:
        username = username.find_element_by_class_name("css-4rbku5.css-18t94o4.css-1dbjc4n.r-1loqt21.r-1wbh5a2.r-dnmrzs.r-1ny4l3l").get_attribute("href")
        # Keep each profile link only once
        if username not in follower_list:
            follower_list.append(username)
    print(len(follower_list))
print(follower_list)
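The difference from the question's code is that the links are collected after every scroll step rather than once at the end. Here is a minimal sketch of that pattern, written so it can be run without a browser. It assumes the follower list is virtualized, meaning only a sliding window of rows is present in the page at any moment, which would explain why the original code only saw 20 or 30 entries. `collect_while_scrolling` and `FakeVirtualPage` are hypothetical names invented for this illustration and are not part of Selenium or Twitter.

```python
import time
from typing import Callable, Iterable, List


def collect_while_scrolling(
    read_visible: Callable[[], Iterable[str]],
    scroll_down: Callable[[], None],
    read_height: Callable[[], int],
    pause: float = 1.0,
    max_rounds: int = 1000,
) -> List[str]:
    """Accumulate unique items after every scroll step, in first-seen order."""
    seen = set()   # O(1) membership test instead of scanning a list
    ordered = []   # preserves the order in which items first appeared

    def harvest() -> None:
        for item in read_visible():
            if item and item not in seen:
                seen.add(item)
                ordered.append(item)

    harvest()  # rows already rendered before the first scroll
    last_height = read_height()
    for _ in range(max_rounds):  # hard cap so the loop cannot run forever
        scroll_down()
        time.sleep(pause)        # give the page time to load more rows
        harvest()
        new_height = read_height()
        if new_height == last_height:
            break                # nothing new was loaded, so we are done
        last_height = new_height
    return ordered


class FakeVirtualPage:
    """Stand-in for a virtualized list (assumption, for demonstration only)."""

    def __init__(self, total: int = 95, window: int = 30, batch: int = 20):
        self.rows = [f"https://twitter.com/user{i}" for i in range(total)]
        self.window = window              # how many rows stay "in the DOM"
        self.batch = batch                # how many rows each scroll loads
        self.loaded = min(batch, total)

    def visible(self) -> List[str]:
        start = max(0, self.loaded - self.window)
        return self.rows[start:self.loaded]

    def scroll(self) -> None:
        self.loaded = min(len(self.rows), self.loaded + self.batch)

    def height(self) -> int:
        return self.loaded


page = FakeVirtualPage()
followers = collect_while_scrolling(page.visible, page.scroll, page.height, pause=0)
print(len(followers))       # every row seen along the way
print(len(page.visible()))  # rows still present once scrolling has finished
```

With a real driver, the three callables could wrap the calls already used above: `read_height` and `scroll_down` around `driver.execute_script(...)`, and `read_visible` around the lookup that returns each `href`. Using a set for the membership check avoids rescanning the whole list for every element, which may matter once thousands of followers have been collected.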
This question about scraping Twitter followers with Selenium is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/63650105/