Python web scraping: I have a website with pick lists. How do I extract the text in those lists?

The link is: https://www.doximity.com/sign_ups/9e016f85-d589-4cdf-8240-09c356d4434f/edit?sign_up[user_attributes][firstname]=Jian&sign_up[user_attributes][lastname]=Cui

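I need to pull out each occupation together with its corresponding specialties, but my code only manages to pull the occupations.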

import requests, bs4

r = requests.get('https://www.doximity.com/sign_ups/9e016f85-d589-4cdf-8240-09c356d4434f/edit?sign_up[user_attributes][firstname]=Jian&sign_up[user_attributes][lastname]=Cui')
soup = bs4.BeautifulSoup(r.text, 'lxml')
spec = soup.find_all('select')

for sub in spec:
    print(sub.text)

Any ideas would be appreciated.
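A likely reason the requests-based attempt only returns occupations (an assumption, since the page's markup is not shown here) is that the specialty list is filled in by JavaScript after an occupation is chosen, so the HTML that requests downloads contains no specialty options. The sketch below illustrates that situation on a small, made-up HTML snippet. SAMPLE_HTML and SelectOptionParser are hypothetical names, and the standard-library html.parser is used only so the example runs without extra packages; the same check could be done with BeautifulSoup.

```python
from html.parser import HTMLParser

# Hypothetical markup: the second <select> stays empty until JavaScript fills it.
SAMPLE_HTML = """
<select id="occupation">
  <option value="">Select one</option>
  <option value="1">Physician</option>
  <option value="2">Nurse</option>
</select>
<select id="specialty"></select>
"""


class SelectOptionParser(HTMLParser):
    """Collect the text of every non-placeholder <option>, grouped by <select> id."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._select_id = None
        self._in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self._select_id = attrs.get("id")
            self.options[self._select_id] = []
        elif tag == "option" and self._select_id is not None:
            # Skip placeholder entries whose value attribute is empty.
            self._in_option = bool(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "option":
            self._in_option = False
        elif tag == "select":
            self._select_id = None

    def handle_data(self, data):
        if self._in_option and data.strip():
            self.options[self._select_id].append(data.strip())


parser = SelectOptionParser()
parser.feed(SAMPLE_HTML)
print(parser.options)
```

For this sample the "specialty" entry comes back as an empty list, which is the symptom described above: the static HTML simply has nothing to extract for the dependent list.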

Best answer

Try the code below and let me know if you run into any problems:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import time

driver = webdriver.Chrome()
url = 'https://www.doximity.com/sign_ups/9e016f85-d589-4cdf-8240-09c356d4434f/edit?sign_up[user_attributes][firstname]=Jian&sign_up[user_attributes][lastname]=Cui'

driver.get(url)
# The first drop-down holds the occupations.
spec = driver.find_element(By.ID, "sign_up_user_attributes_credential_id")
for sub in spec.find_elements(By.XPATH, './option | ./optgroup/option'):
    # Skip the placeholder entry, which has an empty value.
    if sub.get_attribute('value') != '':
        print(sub.text)
        # Select this occupation so the page loads its specialties.
        selected_spec = Select(driver.find_element(By.ID, "sign_up_user_attributes_credential_id"))
        selected_spec.select_by_visible_text(sub.text)
        time.sleep(0.5)  # give the page time to update the second drop-down
        occup = driver.find_element(By.XPATH, '//select[@id="sign_up_user_attributes_user_professional_detail_attributes_specialty_id"]')
        for oc in occup.find_elements(By.XPATH, './option'):
            if oc.text != '' and oc.get_attribute('value') != '':
                print(oc.text)

driver.quit()
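The loop above prints occupations and specialties as one flat stream. If the pairs need to be kept together, one option is to collect them into a dictionary instead. The following is only a sketch under a few assumptions: scrape_dependent_options and its parameters are hypothetical names, the driver is assumed to be a Selenium 4 WebDriver, the locator strategies are passed as the plain strings "id" and "xpath" (the values of By.ID and By.XPATH), and an occupation is selected by clicking its option element rather than through Select. It keeps the fixed pause used in the answer.

```python
import time


def scrape_dependent_options(driver, parent_id, child_id, pause=0.5):
    """Return {parent option text: [child option texts]} for two linked <select>s.

    `driver` is assumed to behave like a Selenium 4 WebDriver, i.e. it offers
    find_element(by, value) and its elements offer find_elements, text,
    get_attribute and click.
    """

    def real_options(select_id):
        # Re-locate the <select> on every call so that references do not go
        # stale if the page re-renders the element.
        select = driver.find_element("id", select_id)
        options = select.find_elements("xpath", "./option | ./optgroup/option")
        # Drop placeholder entries whose value attribute is empty.
        return [opt for opt in options if opt.get_attribute("value")]

    result = {}
    for index in range(len(real_options(parent_id))):
        option = real_options(parent_id)[index]
        name = option.text
        option.click()      # choosing the occupation triggers the page update
        time.sleep(pause)   # fixed pause, as in the answer above
        result[name] = [child.text for child in real_options(child_id)]
    return result
```

With the driver from the answer, a call might look like scrape_dependent_options(driver, "sign_up_user_attributes_credential_id", "sign_up_user_attributes_user_professional_detail_attributes_specialty_id"). A fixed pause is simple but can be too short on a slow connection; increasing pause is the easiest adjustment.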

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/42170243/
