python - Scraping JavaScript data within a grid in a webpage using Selenium and Python

Tags: python selenium selenium-webdriver webdriver webdriverwait

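My problem is that I need all the data in the grid on https://applipedia.paloaltonetworks.com for the entries that have sub-domains (the data comprising Name, Category, Subcategory, Risk, and Technology). What I need is, for example: row 5, 2ch, has 2 sub-domains, |_2ch-base and 2ch-posting. In the same way, I only want to get the list of all applications that have sub-domains.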

Whenever I try to add anything after tr in the selector on this line:

table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

I get a timeout error.

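Below is the script I have now. It gets all the data from the grid, but I only need the applications that have sub-domains [for example 2ch, 2ch-base, 2ch-posting]. By inspecting the elements I found a pattern: all applications without sub-domains have ( ), or we could go by the () field, which is common to all applications that have sub-domains. Any help in solving this would be greatly appreciated.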

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome(executable_path=r'/Users/am/Downloads/chromedriver')
driver.maximize_window()

driver.get("https://applipedia.paloaltonetworks.com/")

wait = WebDriverWait(driver, 30)

# Wait until every row of the grid body is present in the DOM
table = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'tbody#bodyScrollingTable tr')))

# Print the visible text of each row
for tab in table:
    print(tab.text)

Best answer

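For the URL https://applipedia.paloaltonetworks.com/, to get the list of all applications that have sub-domains you need to induce WebDriverWait for the desired elements to be visible. You can use the following solution:

  • Code block: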

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-gpu")
    # Selenium 3 style call; Selenium 4 takes the driver path through a Service object passed as service=
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver.get('https://applipedia.paloaltonetworks.com/')
    # Wait up to 20 seconds for the name links in rows whose ottawagroup attribute is neither '0' nor '2'
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
    for element in elements:
        # innerHTML keeps the whitespace around each name, hence the padded output below
        print(element.get_attribute("innerHTML"))
    
  • Console output:

    DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
    
                                    2ch
    
    
                                    51.com
    
    
                                    adobe-connect
    
    
                                    adobe-connectnow
    
    
                                    adobe-creative-cloud
    
    
                                    aim
    
    
                                    aim-express
    
    
                                    ali-wangwang
    
    
                                    amazon-cloud-drive
    
    
                                    amazon-music
    
    
                                    ameba-now
    
    
                                    assembla
    
    
                                    autodesk360
    
    
                                    avaya-webalive
    
    
                                    bacnet
    
    
                                    baidu-hi
    
    
                                    bebo
    
    
                                    bitbucket
    
    
                                    boxnet
    
    
                                    buddybuddy
    
    
                                    chinaren
    
    
                                    cisco-spark
    
    
                                    cloudapp
    
    
                                    cloudforge
    
    
                                    cloudinary
    
    
                                    concur
    
    
                                    confluence
    
    
                                    convo
    
    
                                    cyph
    
    
                                    daum
    
    
                                    dcinside
    
    
                                    diameter
    
    
                                    dnp3
    
    
                                    dochub
    
    
                                    docstoc
    
    
                                    docusign
    
    
                                    draw.io
    
    
                                    dropbox
    
    
                                    egnyte
    
    
                                    evernote
    
    
                                    facebook
    
    
                                    fetion
    
    
                                    filestack
    
    
                                    flickr
    
    
                                    flixwagon
    
    
                                    fuze-meeting
    
    
                                    gatherplace
    
    
                                    genesys
    
    
                                    git
    
    
                                    github
    
    
                                    gitlab
    
    
                                    glassdoor
    
    
                                    globalmeet
    
    
                                    gmail
    
    
                                    google-calendar
    
    
                                    google-cloud-storage
    
    
                                    google-docs
    
    
                                    google-hangouts
    
    
                                    google-plus
    
    
                                    google-spaces
    
    
                                    google-talk
    
    
                                    google-translate
    
    
                                    google-video
    
    
                                    gotomypc
    
    
                                    gotowebinar
    
    
                                    gtp
    
    
                                    hadoop
    
    
                                    hightail
    
    
                                    hipchat
    
    
                                    hootsuite
    
    
                                    huddle
    
    
                                    hulu
    
    
                                    hyves
    
    
                                    iccp
    
    
                                    icloud
    
    
                                    iec-60870-5-104
    
    
                                    imeet
    
    
                                    imgur
    
    
                                    instagram
    
    
                                    instan-t
    
    
                                    ip-messenger
    
    
                                    ipsec
    
    
                                    irc
    
    
                                    issuu
    
    
                                    itunes
    
    
                                    jira
    
    
                                    join-me
    
    
                                    jumpshare
    
    
                                    kaixin
    
    
                                    kaixin001
    
    
                                    kakaotalk
    
    
                                    laiwang
    
    
                                    landesk
    
    
                                    linkedin
    
    
                                    live-mesh
    
    
                                    lotus-notes
    
    
                                    lotuslive
    
    
                                    lucidpress
    
    
                                    mail.ru
    
    
                                    mail.ru-agent
    
    
                                    maytech
    
    
                                    meebo
    
    
                                    meetup
    
    
                                    mega
    
    
                                    mendeley
    
    
                                    mercurial
    
    
                                    mixi
    
    
                                    modbus
    
    
                                    ms-ds-smb
    
    
                                    ms-lync
    
    
                                    ms-office365
    
    
                                    ms-onedrive
    
    
                                    msn
    
    
                                    myspace
    
    
                                    nateon-im
    
    
                                    netease-webdisk
    
    
                                    netflix
    
    
                                    ning
    
    
                                    noteworthy
    
    
                                    now-tv
    
    
                                    odnoklassniki
    
    
                                    onehub
    
    
                                    owncloud
    
    
                                    paltalk
    
    
                                    pastebin
    
    
                                    pcanywhere
    
    
                                    pinterest
    
    
                                    pivotaltracker
    
    
                                    powow
    
    
                                    prezi
    
    
                                    proofhub
    
    
                                    qik
    
    
                                    qliksense-cloud
    
    
                                    qq
    
    
                                    quip
    
    
                                    quora
    
    
                                    rally-software
    
    
                                    readytalk
    
    
                                    reddit
    
    
                                    rediffbol
    
    
                                    renren
    
    
                                    rtp
    
    
                                    salesforce
    
    
                                    sap-jam
    
    
                                    screencast
    
    
                                    scribd
    
    
                                    second-life
    
    
                                    secure-data-space
    
    
                                    sendthisfile
    
    
                                    service-now
    
    
                                    sharefile
    
    
                                    sharepoint
    
    
                                    sharevault
    
    
                                    showmax
    
    
                                    siemens-s7
    
    
                                    signiant
    
    
                                    sina-uc
    
    
                                    sina-weibo
    
    
                                    skydrive
    
    
                                    slack
    
    
                                    slideshare
    
    
                                    smartsheet
    
    
                                    snmp
    
    
                                    softros-messenger
    
    
                                    solarwinds
    
    
                                    soundcloud
    
    
                                    sourceforge
    
    
                                    spark-im
    
    
                                    ss7-map
    
    
                                    stocktwits
    
    
                                    storify
    
    
                                    subversion
    
    
                                    surveymonkey
    
    
                                    syncplicity
    
    
                                    tableau
    
    
                                    teamdrive
    
    
                                    teamup-calendar
    
    
                                    teamviewer
    
    
                                    thwapr
    
    
                                    torch-browser
    
    
                                    trello
    
    
                                    tumblr
    
    
                                    twitter
    
    
                                    uc-yun
    
    
                                    viber
    
    
                                    vimeo
    
    
                                    vine
    
    
                                    virustotal
    
    
                                    vkontakte
    
    
                                    vnc
    
    
                                    watchdox
    
    
                                    webex
    
    
                                    wechat
    
    
                                    weiyun
    
    
                                    whatsapp
    
    
                                    windows-azure
    
    
                                    windows-defender-atp
    
    
                                    workday
    
    
                                    yahoo-im
    
    
                                    yammer
    
    
                                    youku
    
    
                                    yousendit
    
    
                                    youtube
    
    
                                    yunpan360
    
    
                                    yy-voice
    
    
                                    zalo
    
    
                                    zendesk
    
    
                                    zenefits
    
    
                                    zettahost
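
The row filter in the XPath above can be tried offline. The following is a minimal sketch using only the standard library, not the answer's method. The sample markup is hypothetical: only the ids and the ottawagroup attribute name come from the XPath, while the rows, the attribute values assigned to them, and the helper names GroupedAppParser and grouped_app_names are assumptions made for illustration. It keeps rows whose ottawagroup is neither '0' nor '2' and collapses the whitespace that innerHTML carries in the console output above.

```python
from html.parser import HTMLParser

# Hypothetical markup: the real page's rows and ottawagroup values may differ.
SAMPLE_HTML = """
<table class="btmTable" id="dataTable">
  <tbody id="bodyScrollingTable">
    <tr ottawagroup="1"><td><a href="#">
                                2ch
    </a></td></tr>
    <tr ottawagroup="2"><td><a href="#">2ch-base</a></td></tr>
    <tr ottawagroup="2"><td><a href="#">2ch-posting</a></td></tr>
    <tr ottawagroup="0"><td><a href="#">example-app</a></td></tr>
    <tr ottawagroup="1"><td><a href="#">
                                51.com
    </a></td></tr>
  </tbody>
</table>
"""


class GroupedAppParser(HTMLParser):
    """Collect link text from rows whose ottawagroup is not excluded."""

    def __init__(self, excluded=("0", "2")):
        super().__init__()
        self.excluded = excluded
        self.names = []
        self._row_kept = False
        self._in_link = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Like not(@ottawagroup='0') and not(@ottawagroup='2'):
            # a row without the attribute is kept as well.
            self._row_kept = dict(attrs).get("ottawagroup") not in self.excluded
        elif tag == "a" and self._row_kept:
            # Looser than the XPath, which requires the exact path tr/td/a.
            self._in_link = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_link:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            # Collapse the newlines and indentation around the name.
            name = " ".join("".join(self._chunks).split())
            if name:
                self.names.append(name)
        elif tag == "tr":
            self._row_kept = False


def grouped_app_names(html):
    """Return the cleaned link text of every kept row, in document order."""
    parser = GroupedAppParser()
    parser.feed(html)
    parser.close()
    return parser.names


names = grouped_app_names(SAMPLE_HTML)
print(names)
```

With Selenium itself, the same clean-up would amount to calling strip() on each innerHTML string before printing it.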
    

Regarding python - Scraping JavaScript data within a grid in a webpage using Selenium and Python, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52337888/
