html - Python Mechanize HTML code differs from Firebug HTML code

I'm using Mechanize to extract some HTML code, but I'm having a problem with the HTML it outputs. Essentially, Mechanize seems to replace the content of certain elements with "(n/a)".

Example (structure as shown in Firebug)

<tr>
    <td>
        <img class="bullet" src="images/bulletorange.gif" alt="">
        <span class="detailCaption">Video Format Mode:</span>
        <span class="settingValue" id="vidSdSdiAnlgFormatSelectionMode.1.1">Auto</span>
    </td>
</tr>

Example (structure as output by Mechanize)

<tr>
    <td>
        <img class='bullet' src='images/bulletorange.gif' alt='' />
        <span class='detailCaption'>Video Format Mode:</span>
        <span class='settingValue' id="vidSdSdiAnlgFormatSelectionMode.1.1">(n/a)</span>
    </td>
</tr>

The problem is that "Auto" is replaced by "(n/a)". I really don't know why!

Please help. Why is Mechanize doing this? And how can I fix it?

My code is below...

import ssl

import mechanize


def login_and_return_html(self, url_login, url_after_login, form_username, form_password, username, password):
    """
    Description: Returns the HTML code of a page on a website that requires login.

    Input Arguments: url_login (str)-The url where you enter the login username and password
                     url_after_login (str)-The url where you want to go after you login
                     form_username (str)-The name of the username input field in the login form
                     form_password (str)-The name of the password input field in the login form
                     username (str)-The actual username
                     password (str)- The actual password

    Return or Output: Returns HTML code of the 'url_after_login' page

    Modules and Classes: mechanize
                         ssl
    """
    try:  # Disable SSL certificate verification
        _create_unverified_https_context = ssl._create_unverified_context
    except AttributeError:  # Legacy Python that doesn't verify HTTPS certificates by default
        pass
    else:  # Handle target environment that doesn't support HTTPS verification
        ssl._create_default_https_context = _create_unverified_https_context

    br = mechanize.Browser()  # Browser

    br.set_handle_equiv(True)  # Browser options
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    cj = mechanize.CookieJar()  # Cookie Jar
    br.set_cookiejar(cj)

    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(),
                          max_time=1)  # Follows refresh 0 but not hangs on refresh > 0

    br.open(url_login)  # Open the login page
    br.select_form(nr=0)
    try:
        br.form[form_username] = username  # Fill in the free-text username field
        br.form[form_password] = password  # Fill in the free-text password field
        br.submit()
    except Exception:  # Username field is a dropdown (select) control, not a text box
        control = br.form.find_control(form_username)
        for item in control.items:  # Select the dropdown entry matching the username
            if item.name == username:
                item.selected = True
        br.form[form_password] = password  # Fill in the free-text password field
        br.submit()

    html = br.open(url_after_login).read()
    return html
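A quick way to confirm what Mechanize actually received is to pull the span's text out of the returned HTML. The sketch below uses only the standard library's `html.parser`; the helper `span_text_by_id` is hypothetical (not part of Mechanize), and it assumes the page is UTF-8 when given bytes, which is what `read()` returns.

```python
from html.parser import HTMLParser


class _SpanTextFinder(HTMLParser):
    """Collects the text inside the first <span> whose id matches target_id.

    Assumes the target span contains no nested <span> elements.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.found = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and not self.found and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span" and self.inside:
            self.inside = False
            self.found = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def span_text_by_id(html, target_id, encoding="utf-8"):
    """Return the stripped text of the span with the given id, or None if absent."""
    if isinstance(html, bytes):
        html = html.decode(encoding, errors="replace")  # read() returns bytes
    finder = _SpanTextFinder(target_id)
    finder.feed(html)
    finder.close()
    return "".join(finder.parts).strip() if finder.found else None
```

Calling `span_text_by_id(html, "vidSdSdiAnlgFormatSelectionMode.1.1")` on the Mechanize output above gives "(n/a)", which shows the placeholder is already in the HTML the server sends, not something Mechanize writes in.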

Best answer

Why is mechanize doing this?

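Mechanize probably isn't doing anything; it is the browser that changes the content. My guess is that the site uses JavaScript, which Mechanize does not support, so you get the HTML in its original form, i.e. the content before any JavaScript has run.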
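To track down which JavaScript fills in the value, it can help to list the scripts the raw page loads or embeds. The sketch below is a hypothetical diagnostic helper (`scripts_mentioning` is not part of Mechanize) built on the standard library's `html.parser`. It assumes the inline script that updates the span mentions the element id literally, which may not hold if the page builds ids dynamically.

```python
from html.parser import HTMLParser


class _ScriptCollector(HTMLParser):
    """Gathers external script URLs and the bodies of inline <script> blocks."""

    def __init__(self):
        super().__init__()
        self.external = []  # values of src attributes
        self.inline = []    # text of non-empty inline scripts
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            self._in_script = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            body = "".join(self._buffer).strip()
            if body:
                self.inline.append(body)
            self._in_script = False

    def handle_data(self, data):
        # html.parser delivers raw <script> contents through handle_data
        if self._in_script:
            self._buffer.append(data)


def scripts_mentioning(html, needle):
    """Return (external script URLs, inline scripts whose text contains needle)."""
    collector = _ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.external, [s for s in collector.inline if needle in s]
```

Pass it the decoded HTML from `login_and_return_html` and a fragment such as "vidSdSdiAnlgFormatSelectionMode". If no inline script matches, the external URLs in the first return value are the next place to look; the value may also be fetched by a separate request that those scripts make.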

And how can I fix it?

Not with Mechanize; you need a solution that supports JavaScript. See "Mechanize and Javascript" for more information and possible solutions.
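One JavaScript-capable option is to drive a real browser with Selenium. This is a swapped-in technique, not something the answer prescribes. The sketch assumes Selenium 4 with a local Chrome and ChromeDriver, and it omits the login step (the Mechanize session cookies do not carry over, so the login form would have to be filled in through Selenium as well). The wait condition is a plain function so it can be checked without a browser.

```python
def value_is_populated(element_id, placeholder="(n/a)"):
    """Build a WebDriverWait condition that succeeds once the text differs from placeholder."""
    def condition(driver):
        # "id" is the value of selenium.webdriver.common.by.By.ID
        text = driver.find_element("id", element_id).text.strip()
        return text if text and text != placeholder else False
    return condition


def read_rendered_value(url, element_id, timeout=10):
    """Open url in headless Chrome and return the element's text after JavaScript runs."""
    # Imported here so value_is_populated can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until the page's scripts replace the placeholder; until() returns
        # the condition's truthy result, i.e. the rendered text.
        return WebDriverWait(driver, timeout).until(value_is_populated(element_id))
    finally:
        driver.quit()
```

WebDriverWait ignores NoSuchElementException while polling by default, so the condition can run before the span exists and simply retries until the timeout.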

A similar question, "html - Python Mechanize HTML code differs from Firebug HTML code", can be found on Stack Overflow: https://stackoverflow.com/questions/39213622/
