selenium - How to fill in a JavaScript form using Python?

Tags: selenium web-scraping beautifulsoup scrapy mechanize

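I want to fill in this form using Python.
I tried Mechanize, but this is a Microsoft form built with JavaScript: it has no form tag and no GET/POST URL. BeautifulSoup or Selenium might be able to handle it, but I have no experience scraping JS forms. Can anyone help and suggest how to approach this?
Here is what I tried; Mechanize does not recognize any forms on the page: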

import mechanize

def main():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.set_handle_refresh(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response  = br.open("https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u")
    for form in br.forms():
        print("Form name:", form.name) #prints nothing
        print(form) #prints nothing

if __name__ == '__main__':
    main()

Best answer

Selenium works fine.
You need to install the components:

  • Install Selenium: pip install selenium
  • Make sure you download the correct chromedriver (or other driver) for your browser and OS version, and add it to your PATH

  • Then run:
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    url = "https://forms.office.com/Pages/ResponsePage.aspx?id=8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u"
    driver.get(url)

    # find_element_by_xpath was removed in Selenium 4.3;
    # find_element(By.XPATH, ...) is the replacement.
    # Text input that follows the question titled "NAME".
    name = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='NAME']]/following-sibling::*//input")
    name.send_keys("hello, World")

    # Radio button under "Section" whose value matches the chosen option.
    section_selection = "F"
    section = driver.find_element(By.XPATH, "//div[@class='question-title-box'][.//span[text()='Section']]/following-sibling::*//input[@value='" + section_selection + "']")
    section.click()

    # Date field, located by its placeholder text.
    date = driver.find_element(By.XPATH, "//input[contains(@placeholder, 'Please input date')]")
    date.send_keys("01/12/2020")

    submit = driver.find_element(By.XPATH, "//div[text()='Submit']")
    submit.click()
    
The XPaths are a bit long, but they are based on the question text, so they should be reasonably stable.
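
Because both question XPaths follow the same pattern, they could be generated from the question title instead of being typed out each time. The following is only a sketch: `question_input_xpath` is a hypothetical helper name, and it assumes titles and values contain no single quotes.

```python
def question_input_xpath(title, value=None):
    """Build the XPath for the <input> that follows a question title.

    Hypothetical helper mirroring the XPath pattern used in the answer.
    Assumes neither title nor value contains a single quote.
    """
    xpath = (
        "//div[@class='question-title-box']"
        f"[.//span[text()='{title}']]"
        "/following-sibling::*//input"
    )
    if value is not None:
        # Narrow the match to one option, e.g. a specific radio button.
        xpath += f"[@value='{value}']"
    return xpath


name_xpath = question_input_xpath("NAME")
section_xpath = question_input_xpath("Section", "F")
print(name_xpath)
print(section_xpath)
```

The resulting strings can be passed to `driver.find_element(By.XPATH, ...)` in the same way as the literal XPaths.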

As an alternative approach: when you say there is no POST URL, have you checked devtools? It exposes the form's destination:
    Request URL: https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses
    Request Method: POST
    
It also exposes the payload. This is the first submission:
    {startDate: "2020-08-17T10:40:18.504Z", submitDate: "2020-08-17T10:40:18.507Z",…}
    answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]"
    startDate: "2020-08-17T10:40:18.504Z"
    submitDate: "2020-08-17T10:40:18.507Z"
    
The UUIDs/GUIDs in the POST URL and the question IDs seem to be tied to this form; they did not change between my runs. Here is the second run:
    {startDate: "2020-08-17T10:43:48.544Z", submitDate: "2020-08-17T10:43:48.546Z",…}
    answers: "[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]"
    startDate: "2020-08-17T10:43:48.544Z"
    submitDate: "2020-08-17T10:43:48.546Z"
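
One way to check that the question IDs really are the same across runs is to decode the two captured `answers` strings and compare them. This is a minimal sketch using the values shown above; `question_ids` is a hypothetical helper name. Note that `answers` is itself a JSON string nested inside the payload, so it has to be decoded separately.

```python
import json

# "answers" values copied from the two captured submissions.
first_run = (
    '[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"Hello, World"},'
    '{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},'
    '{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-28"}]'
)
second_run = (
    '[{"questionId":"r8f09d63e6f6f42feb2f8f4f8ed3f9389","answer1":"test me"},'
    '{"questionId":"r28fe12073dfa47399f8ce95ae679dccf","answer1":"G"},'
    '{"questionId":"r8f9e9fedcc2e410c80bfa1e0e3ef9750","answer1":"2020-08-12"}]'
)


def question_ids(answers_json):
    """Return the questionId of every answer, in order."""
    return [item["questionId"] for item in json.loads(answers_json)]


ids_match = question_ids(first_run) == question_ids(second_run)
print(ids_match)  # True
```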
    
Once you have captured a submission, you can probably do the same thing through the API without a GUI.
Just to make sure, I tried it and it worked:
    import requests
    
    url = "https://forms.office.com/formapi/api/aebbf9f0-23da-49e3-98bf-32171abbc9bc/users/f70e502c-96b2-4239-aca3-764dea371071/forms('8Pm7rtoj40mYvzIXGrvJvCxQDveyljlCrKN2Teo3EHFUQVNaWDlYRkhYR09JRTZWRFpKTTNIQU9HUC4u')/responses"
    myobj = {"startDate":"2020-08-17T10:48:40.118Z","submitDate":"2020-08-17T10:48:40.121Z","answers":"[{\"questionId\":\"r8f09d63e6f6f42feb2f8f4f8ed3f9389\",\"answer1\":\"Hello again, World\"},{\"questionId\":\"r28fe12073dfa47399f8ce95ae679dccf\",\"answer1\":\"F\"},{\"questionId\":\"r8f9e9fedcc2e410c80bfa1e0e3ef9750\",\"answer1\":\"2020-08-26\"}]"}
    
    x = requests.post(url, data = myobj)
    
My answers are simply hard-coded into the data object, but it seems to work.
Remember to pip install requests if you don't already have it.
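
Rather than hard-coding the escaped string, the payload could be assembled from a dictionary of answers. The sketch below only builds the data object and does not send anything; `build_payload` and `iso_utc_millis` are hypothetical helper names, and the timestamp format is an assumption inferred from the captured values. Whether the endpoint validates the timestamps is not known.

```python
import json
from datetime import datetime, timezone


def iso_utc_millis(moment):
    """Format a datetime like the captured values, e.g. 2020-08-17T10:48:40.118Z."""
    moment = moment.astimezone(timezone.utc)
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def build_payload(answers, start, submit):
    """Build the form data; answers maps question IDs to answer strings."""
    answer_list = [
        {"questionId": question_id, "answer1": text}
        for question_id, text in answers.items()
    ]
    return {
        "startDate": iso_utc_millis(start),
        "submitDate": iso_utc_millis(submit),
        # The captured payload sends "answers" as a JSON-encoded string.
        "answers": json.dumps(answer_list, separators=(",", ":")),
    }


now = datetime.now(timezone.utc)
payload = build_payload(
    {
        "r8f09d63e6f6f42feb2f8f4f8ed3f9389": "Hello again, World",
        "r28fe12073dfa47399f8ce95ae679dccf": "F",
        "r8f9e9fedcc2e410c80bfa1e0e3ef9750": "2020-08-26",
    },
    start=now,
    submit=now,
)
print(payload)
```

The resulting `payload` has the same shape as the hard-coded object and could be passed as `data=` to `requests.post` in the same way.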

Source: the question "selenium - How to fill in a JavaScript form using Python?" on Stack Overflow: https://stackoverflow.com/questions/63444954/
