python - Logging in to an AJAX form with Python

Tags: python ajax mechanize web-scraping

By the way, the site only works in Internet Explorer.

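I am trying to scrape a website for a client so that I can automate a task for them. Basically, the script crawls several reports, looks up the turnaround times, and emails them to the client. The scraping part works fine. The problem is logging in to the site with Mechanize, because the login form uses AJAX. I have looked around for a solution but cannot find exactly what I need.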

Below are the HTML form and (as far as I can tell) the AJAX code that handles it.

<pre><code>function TranLogin()
{
    var url = 'login.aspx?isAjax=true&eventTarget=TranLogin';
    var postData = Form.serialize('Form1');
    ajaxRequest = new Ajax.Request(
    url,
    {
        method : 'post',
        postBody : postData,
        onComplete : TransLoginFinished,
        onFailure : reportError,
        onException : reportException
    });
}

function TransLoginFinished(serverResponse)
{
    if (requestFailed) return;
    xmlNodes = serverResponse.responseXML;
    usrSite = "8000";
    usrCode = decodeXmlChar(xmlNodes.getElementsByTagName('UserCode')[0].text);
    if (xmlNodes.getElementsByTagName('LoginResult')[0].text == '-1'){
        alert(decodeXmlChar(xmlNodes.getElementsByTagName('FailMsg')[0].text));
        Form.enable('Form1');
        return;
    }
    if (xmlNodes.getElementsByTagName('LoginResult')[0].text == '20'){
        window.location.replace('initpasswd.aspx?usersite=' + usrSite + '&usercode=' + usrCode);
        return;
    }
    if (xmlNodes.getElementsByTagName('LoginResult')[0].text == '14'){
        window.location.replace('chgpasswd.aspx?type=chgpwd&usersite=' + usrSite + '&usercode=' + usrCode);
        return;
    }
    if (xmlNodes.getElementsByTagName('LoginResult')[0].text == '16'){
        window.location.replace('chgpasswd.aspx?type=pwdexpire&usersite=' + usrSite + '&usercode=' + usrCode);
        return;
    }
    if (xmlNodes.getElementsByTagName('LoginResult')[0].text == '0'){
        if (xmlNodes.getElementsByTagName('PwdExpireWarning')[0].text == 'true'){
            var changePwdNow = window.confirm(decodeXmlChar(xmlNodes.getElementsByTagName('PwdExpireMsg')[0].text));
            if (changePwdNow == true){
                window.location.replace('chgpasswd.aspx?type=chgpwd&usersite=' + usrSite + '&usercode=' + usrCode);
                return;
            }
//            var arg = { promptMsg :decodeXmlChar(xmlNodes.getElementsByTagName('PwdExpireMsg')[0].text),
//                        buttons : [ { value : "Yes", rtnVal : 1 },
//                                    { value : "No", rtnVal : 0 }
//                        ]
//            };
//            var rtn = window.showModalDialog('../Modules/ModalMessageBox.aspx',arg, "dialogHeight:140px;dialogWidth:500px; center:1;status:no;");
//            if (rtn && rtn == 1){
//              window.location.replace('chgpasswd.aspx?type=chgpwd&usersite=' + usrSite + '&usercode=' + usrCode);
//              return;
//            }
        }
        if (JTrim($('txtHospCode').value) == '')
        {
            hospList = decodeXmlChar(xmlNodes.getElementsByTagName('HospList')[0].text).split('|');
            if (hospList.length < 2)
            {
                selectedHospCode = hospList[0].split('-')[0];
                TranSelectHosp(selectedHospCode);
                return;
            }
            $('divHospList').style.display = 'block';

            for (i = 0; i < hospList.length; i++)
            {
                if (hospList[i] != '')
                {
                    divHospCode = document.createElement("div");
                    divHospCode.className = 'divHospCode';
                    $('divHospListBG').appendChild(divHospCode);
                    lnkHospCode = document.createElement("a");
                    if (hospList[i].length <= 33)
                        lnkHospCode.innerText = hospList[i];
                    else
                        lnkHospCode.innerText = hospList[i].substr(0,30) + '...';
                    lnkHospCode.title = hospList[i];
                    lnkHospCode.className = 'lnkHospCode';
                    divHospCode.appendChild(lnkHospCode);
                    lnkHospCode.onmouseover = function(){this.style.color = '#000000';}
                    lnkHospCode.onmouseout = function(){this.style.color = '#6c6c6c';}
                    lnkHospCode.onclick = function(){TranSelectHosp(this.innerText.split('-')[0]);}
                    if (i > 7 && $('divHospListBG').style.overflow != 'auto')
                    {
                        $('divHospListBG').style.height = '198px';
                        $('divHospListBG').style.overflow = 'auto';
                    }
                }
            }
            return;
        }
        else
        {
            TranSelectHosp(JTrim($('txtHospCode').value));
        }
    }
}
</code></pre>

<form name="Form1" method="post" action="login.aspx" id="Form1">
Input Account Code:
            <div class="divRight">
                <input name="txtHospCode" type="text" id="txtHospCode" class="inputClass" maxlength="4" />
            </div>
            <div class="divLeft">
                <span>Input User Code:</span>
            </div>
            <div class="divRight">
                <input name="txtUserCode" type="text" id="txtUserCode" class="inputClass" maxlength="6" />
            </div>
            <div class="divLeft">
                <span>Input Password:</span></div>
            <div class="divRight">
                <input name="txtPassword" type="password" id="txtPassword" class="inputClass" />
            </div>
            <div class="divLeft">
                <span>Login As:</span>
            </div>
            <div class="divRight">
                &nbsp;<input type="radio" name="rdLoginType" value="D" checked="checked" />Doctor&nbsp;&nbsp;
                <input type="radio" name="rdLoginType" value="T" />Other
            </div>
            <div class="divLeft">
            </div>
            <div class="divRight">
                <input class="buttonClass" id="btnOK" type="button" value="Enter" onclick="LoginIn();" />
                <input class="buttonClass" id="btnReset" type="button" value="Reset" onclick="ResetInput();" />
            </div>

My code so far:


<pre><code>import mechanize
import cookielib
from BeautifulSoup import BeautifulSoup
import html2text
import re

# Instantiate Browser
br = mechanize.Browser()

# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

# Browser options
br.set_handle_equiv(True)
# br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)

# Follows refresh 0 but not hangs on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# User-Agent
br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)')]

def login_to_website(login_url, login_form_name, usr_form_name, pwd_form_name, acct_code_name, usr, pwd, acct_code):
    """ Logs user into website """

    # Open the url of the login page
    br.open(login_url)

    # Select the login form name
    br.select_form(login_form_name)

    # Enter user's credentials into the form
    br.form[acct_code_name] = acct_code
    br.form[usr_form_name] = usr
    br.form[pwd_form_name] = pwd
    br.find_control(name='rdLoginType').value = ['T']

    # Submit the form
    print "Logging in as:", usr
    br.submit()

    # print current url
    print "We are now at:", br.geturl()

    # print error
    if br.geturl() == login_url:
        print "Login Failed"
    else:
        print "Successfully logged in"

login_to_website('https://www.website.com', 'Form1', 'txtUserCode', 'txtPassword', 'txtHospCode', usr, pwd, acctCode)
</code></pre>

Best answer

As far as I know, Mechanize does not handle JavaScript. So your options are, roughly in the order I would try them:

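  • Turn off JavaScript in your browser and see whether you can still log in to the site. If you can, try using Mechanize for that flow.
  • Try to figure out what the AJAX form actually does (on both the server side and the client side) and emulate it with Python. If you have not found it yet, a tool like Firebug is very useful for this.
  • Use one of the various libraries that let Python control a real browser. I have never done this, but I know there are wrappers for at least Firefox and IE.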
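
To illustrate the second option, here is a minimal sketch (not a tested solution for this site) of replaying the request that `TranLogin()` makes. It is written for Python 3 with only the standard library, rather than the Python 2 and Mechanize setup in the question. The URL suffix, field names, and XML tag names are taken from the question. Everything else is an assumption: the helper names `build_login_request`, `parse_login_response`, and `login` are hypothetical, the labels in `LOGIN_RESULTS` are only read off the redirects in `TransLoginFinished()`, and the real page may contain hidden inputs (ASP.NET pages often carry fields such as `__VIEWSTATE`) that `Form.serialize('Form1')` would also send. The sketch does not reproduce `decodeXmlChar` or `TranSelectHosp`, since neither is shown in the question.

```python
import http.cookiejar
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Relative URL used by TranLogin() in the question.
AJAX_LOGIN_PATH = "login.aspx?isAjax=true&eventTarget=TranLogin"

# Labels are assumptions, read off the branches of TransLoginFinished().
LOGIN_RESULTS = {
    "0": "logged in",
    "-1": "failed (see FailMsg)",
    "20": "redirects to initpasswd.aspx",
    "14": "redirects to chgpasswd.aspx?type=chgpwd",
    "16": "redirects to chgpasswd.aspx?type=pwdexpire",
}


def build_login_request(base_url, hosp_code, user_code, password, login_type="T"):
    """Build a POST carrying the visible fields of Form1.

    Assumption: any hidden inputs on the real page are not included here.
    """
    fields = {
        "txtHospCode": hosp_code,
        "txtUserCode": user_code,
        "txtPassword": password,
        "rdLoginType": login_type,  # "D" (Doctor) or "T" (Other)
    }
    url = urllib.parse.urljoin(base_url, AJAX_LOGIN_PATH)
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def parse_login_response(xml_text):
    """Extract the elements that TransLoginFinished() reads from the XML."""
    root = ET.fromstring(xml_text)

    def text_of(tag):
        # First element with this tag anywhere in the document, or None.
        node = next(root.iter(tag), None)
        return (node.text or "") if node is not None else None

    code = text_of("LoginResult")
    return {
        "code": code,
        "outcome": LOGIN_RESULTS.get(code, "unknown"),
        "fail_msg": text_of("FailMsg"),
        "hosp_list": [h for h in (text_of("HospList") or "").split("|") if h],
    }


def login(base_url, hosp_code, user_code, password):
    """Fetch the login page for cookies, then send the AJAX-style POST."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [
        ("User-Agent", "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 6.0)")
    ]
    opener.open(base_url).close()  # pick up any session cookie first
    request = build_login_request(base_url, hosp_code, user_code, password)
    with opener.open(request) as response:
        return opener, parse_login_response(response.read())
```

If `login(...)` reports a code of `"0"`, the returned opener holds the session cookies and could be reused for the report pages. The browser goes on to call `TranSelectHosp`, so a further request is probably needed before the session is fully usable. Watching the traffic with Firebug, as suggested above, should show what that request looks like.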

Source: a similar question about logging in to an AJAX form with Python on Stack Overflow: https://stackoverflow.com/questions/4610502/
