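I am porting a bash script that uses curl to POST a hard-coded payload to a URL, and that script works. The basic problem is that, with RoboBrowser, I am having trouble POSTing with the page's own forms.
Walking through the site step by step:
- Log in at /SubLogin.aspx
- A successful login redirects to /OptionsSummary.aspx
- GET /FindMe.aspx with parameters
- POST to /FindMe.aspx with the "Phone Lists" button (the page should then load the "Phone Lists" table, which contains one item, "Work")
- Selecting the "Work" item performs a POST to /PhoneLists.aspx (which should then load the "Work" table containing the list of users)
I have been able to authenticate with the site and perform GETs using both RoboBrowser and Requests+bs4, but I am stumped on POSTing back to the page itself.
Using RoboBrowser (liboncall.py):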
#!/usr/bin/python
from robobrowser import RoboBrowser
from bs4 import BeautifulSoup as BS
oc_mailbox = '123456'
oc_password_hashed = 'ABCDEFG'
oc_base_uri = 'http://example.com'
auth_uri = oc_base_uri + '/SubLogin.aspx'
find_uri = oc_base_uri + '/FindMe.aspx'
phne_uri = oc_base_uri + '/PhoneLists.aspx'
p_auth_payload = {
    'SubLoginControl:javascriptTest': 'true',
    'SubLoginControl:mailbox': oc_mailbox,
    'SubLoginControl:phoneNumber': '',
    'SubLoginControl:password': oc_password_hashed,
    'SubLoginControl:btnLogOn': 'Logon',
    'SubLoginControl:webLanguage': 'en-US',
    'SubLoginControl:initialLanguage': 'en-US',
    'SubLoginControl:errorCallBackNumber': 'Entered telephone number contains non-dialable characters.',
    'SubLoginControl:cookieMailbox': 'mailbox',
    'SubLoginControl:cookieCallbackNumber': 'callbackNumber',
    'SubLoginControl:serverDomain': ''
}
p_find_payload = {
    'FindMeControl:enableFindMe': 'on',
    'FindMeControl:MasterDataControl:focusElement': '',
    'FindMeControl:MasterDataControl:masterList:_ctl0:enabled': 'on',
    'FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
    'FindMeControl:MasterDataControl:hidSelectedScheduleName': '',
    'FindMeControl:MasterDataControl:hidbtnStatus': '',
    'FindMeControl:MasterDataControl:hidScheduleXML': '',
    'FindMeControl:MasterDataControl:tempScheduleXML': '',
    'FindMeControl:MasterDataControl:hidSelectedScheduleGUID': '',
    'FindMeControl:MasterDataControl:hidChangedScheduleList': '',
    'FindMeControl:btnPhoneLists': 'Phone Lists',
    'FindMeControl:enableFindMeHidden': '',
    'FindMeControl:applySet': 'false'
}
p_phne_payload = {
    '__EVENTARGUMENT': '',
    '__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton',
    'PhoneListsControl:MasterDataControl:focusElement': '',
    'PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid': '',
    'PhoneListsControl:MasterDataControl:hidSelectedScheduleName': '',
    'PhoneListsControl:MasterDataControl:hidbtnStatus': '',
    'PhoneListsControl:MasterDataControl:hidScheduleXML': '',
    'PhoneListsControl:MasterDataControl:tempScheduleXML': '',
    'PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID': '',
    'PhoneListsControl:MasterDataControl:hidChangedScheduleList': '',
    'PhoneListsControl:applySet': 'false'
}
def auth(mailbox, password):
    # Open the login page, fill in the ASP.NET login form and submit it.
    browser = RoboBrowser(history=False)
    browser.open(auth_uri)
    signin = browser.get_form(id='aspnetForm')
    signin['SubLoginControl:mailbox'].value = mailbox
    signin['SubLoginControl:password'].value = password
    signin['SubLoginControl:javascriptTest'].value = 'true'
    signin['SubLoginControl:btnLogOn'].value = 'Logon'
    signin['SubLoginControl:webLanguage'].value = 'en-US'
    signin['SubLoginControl:initialLanguage'].value = 'en-US'
    signin['SubLoginControl:errorCallBackNumber'].value = 'Entered+telephone+number+contains+non-dialable+characters.'
    signin['SubLoginControl:cookieMailbox'].value = 'mailbox'
    signin['SubLoginControl:cookieCallbackNumber'].value = 'callbackNumber'
    signin['SubLoginControl:serverDomain'].value = ''
    browser.submit_form(signin)
    # Return the browser so the authenticated session can be reused.
    return browser
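A side note on the error-message value: p_auth_payload writes it with spaces, while auth() writes it with "+" signs. Standard form encoding turns a space into "+" and a literal "+" into "%2B", so the two are not equivalent once encoded. The sketch below uses only the standard library to show the difference; the key name "msg" and the shortened message are placeholders, and whether this field matters to the site is not known from the post.

```python
from urllib.parse import urlencode, parse_qs

# Value written with plain spaces, as in p_auth_payload.
with_spaces = {"msg": "Entered telephone number"}
# Value already "+"-encoded by hand, as in auth().
with_pluses = {"msg": "Entered+telephone+number"}

# Encode both the way an HTML form body is encoded.
encoded_spaces = urlencode(with_spaces)
encoded_pluses = urlencode(with_pluses)

print(encoded_spaces)  # msg=Entered+telephone+number
print(encoded_pluses)  # msg=Entered%2Btelephone%2Bnumber

# Decode the bodies again, as a server would.
decoded_spaces = parse_qs(encoded_spaces)["msg"][0]
decoded_pluses = parse_qs(encoded_pluses)["msg"][0]
```

If the goal is to mimic what curl sent on the wire, the value with spaces is probably the one to hand to the form, since the library performs the encoding.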
Log in to the site and display the URL to verify that we are in:
In [20]: from liboncall import *
In [21]: m = auth(oc_mailbox, oc_password_hashed)
In [22]: m.url
Out[22]: u'http://example.com/OptionsSummary.aspx'
Open "/FindMe.aspx":
In [24]: m.open(find_uri)
In [25]: m.url
Out[25]: u'http://example.com/FindMe.aspx'
Initially, "/FindMe.aspx" loads a form with a "Phone Lists" button (FindMeControl:btnPhoneLists).
In [26]: m.select('title')
Out[26]: [<title>Find Me</title>]
In [27]: form_find_a = m.get_form(action="FindMe.aspx")
In [28]: for i in form_find_a.keys():
   ....:     print(i)
   ....:
__VIEWSTATE
__EVENTVALIDATION
FindMeControl:enableFindMe
FindMeControl:MasterDataControl:focusElement
FindMeControl:MasterDataControl:masterList:_ctl0:enabled
FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid
FindMeControl:MasterDataControl:btnAdd
FindMeControl:MasterDataControl:btnDelete
FindMeControl:MasterDataControl:btnRename
FindMeControl:MasterDataControl:btnCancel
FindMeControl:MasterDataControl:btnEnter
FindMeControl:MasterDataControl:btnUpdate
FindMeControl:MasterDataControl:hidSelectedScheduleName
FindMeControl:MasterDataControl:hidbtnStatus
FindMeControl:MasterDataControl:hidScheduleXML
FindMeControl:MasterDataControl:tempScheduleXML
FindMeControl:MasterDataControl:hidSelectedScheduleGUID
FindMeControl:MasterDataControl:hidChangedScheduleList
FindMeControl:btnApply
FindMeControl:btnSchedules
FindMeControl:btnPhoneLists
FindMeControl:enableFindMeHidden
FindMeControl:applySet
Remove the unneeded form fields, fill in the form and submit it:
In [29]: find_remove = (
'FindMeControl:MasterDataControl:btnAdd',
'FindMeControl:MasterDataControl:btnDelete',
'FindMeControl:MasterDataControl:btnRename',
'FindMeControl:MasterDataControl:btnCancel',
'FindMeControl:MasterDataControl:btnEnter',
'FindMeControl:MasterDataControl:btnUpdate',
'FindMeControl:btnApply',
'FindMeControl:btnSchedules')
In [30]: for i in find_remove:
   ....:     form_find_a.fields.pop(i)
In [31]: form_find_a['FindMeControl:enableFindMe'].value = 'on'
form_find_a['FindMeControl:MasterDataControl:focusElement'].value = ''
form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:enabled'].value = 'on'
form_find_a['FindMeControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleName'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidbtnStatus'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidScheduleXML'].value = ''
form_find_a['FindMeControl:MasterDataControl:tempScheduleXML'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
form_find_a['FindMeControl:MasterDataControl:hidChangedScheduleList'].value = ''
form_find_a['FindMeControl:btnPhoneLists'].value = 'Phone Lists'
form_find_a['FindMeControl:enableFindMeHidden'].value = ''
form_find_a['FindMeControl:applySet'].value = 'false'
Out [31]: ...
In [32]: m.submit_form(form_find_a)
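The reason for removing the other buttons is that a browser only sends the name and value of the submit button that was actually clicked. A minimal sketch of that pruning step on a plain dict follows; keep_clicked_button is a hypothetical helper, not part of RoboBrowser, and the "Apply" and "Schedules" values are assumed button labels.

```python
def keep_clicked_button(fields, buttons, clicked):
    """Return a copy of `fields` without any submit button except `clicked`.

    `fields` maps field names to values, `buttons` is the set of names that
    are submit buttons, and `clicked` is the one button to keep.
    """
    if clicked not in buttons:
        raise ValueError("clicked must be one of the known buttons")
    return {
        name: value
        for name, value in fields.items()
        if name not in buttons or name == clicked
    }


# A cut-down version of the FindMe.aspx form (button labels are assumed).
fields = {
    "FindMeControl:enableFindMe": "on",
    "FindMeControl:btnApply": "Apply",
    "FindMeControl:btnSchedules": "Schedules",
    "FindMeControl:btnPhoneLists": "Phone Lists",
}
buttons = {
    "FindMeControl:btnApply",
    "FindMeControl:btnSchedules",
    "FindMeControl:btnPhoneLists",
}

pruned = keep_clicked_button(fields, buttons, "FindMeControl:btnPhoneLists")
print(sorted(pruned))
```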
Verify that the page has updated and has the list item "Work":
In [33]: m.parsed.find('title')
Out[33]: <title>Phone Lists</title>
In [34]: m.parsed.find('a', id='PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton')
Out[34]: <a class="linkButtonItem" href="javascript:__doPostBack('PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')" id="PhoneListsControl_MasterDataControl_masterList__ctl0_SelectButton" onclick="javascript:onClick();">Work</a>
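The "Work" link does not point at a URL; its href calls __doPostBack. In ASP.NET Web Forms that function normally copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. The page's own script is not shown here, so treat that as an assumption about this site. A sketch that extracts both values from such an href with the standard library (parse_postback is a hypothetical helper):

```python
import re

# Matches __doPostBack('target','argument') inside an href or onclick value.
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def parse_postback(href):
    """Return the two hidden-field values a __doPostBack call would set,
    or None if the string contains no such call."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


href = ("javascript:__doPostBack("
        "'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton','')")
postback_fields = parse_postback(href)
print(postback_fields)
```

These are the two values that have to end up in the POST body for the server to treat the request as a click on "Work".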
Get the "PhoneLists.aspx" form, remove the unneeded fields, fill it in and submit it:
In [35]: form_find_b = m.get_form(action='PhoneLists.aspx')
In [36]: phne_remove = (
'PhoneListsControl:MasterDataControl:btnAdd',
'PhoneListsControl:MasterDataControl:btnDelete',
'PhoneListsControl:MasterDataControl:btnRename',
'PhoneListsControl:MasterDataControl:btnCancel',
'PhoneListsControl:MasterDataControl:btnEnter',
'PhoneListsControl:MasterDataControl:btnUpdate',
'PhoneListsControl:btnApply',
'PhoneListsControl:btnBack')
In [37]: for i in phne_remove:
   ....:     form_find_b.fields.pop(i)
In [38]: form_find_b['PhoneListsControl:MasterDataControl:focusElement'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidChangedScheduleList'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidScheduleXML'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleGUID'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidSelectedScheduleName'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:hidbtnStatus'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:masterList:_ctl0:itemGuid'].value = ''
form_find_b['PhoneListsControl:MasterDataControl:tempScheduleXML'].value = ''
form_find_b['PhoneListsControl:applySet'].value = 'false'
In [39]: m.submit_form(form_find_b)
Inspect the response to see whether the user list was loaded. In this case, it was not:
In [40]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[40]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText"></div>]
Had it succeeded, the above would have returned:
<div id="PhoneListsControl_phoneListMembersText" class="displayText" style="top: 315px; left: 281px;"> Work </div>
along with the following items in the table (PhoneListsControl_phoneListDetail):
<input name="PhoneListsControl:phoneListDetail:_ctl2:number" type="text" value="95551234567" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...
<input name="PhoneListsControl:phoneListDetail:_ctl3:number" type="text" value="95551236789" maxlength="50" id="PhoneListsControl_phoneListDetail__ctl2_number" onkeyup="enableApplyButton('PhoneListsControl_')" style="width:140px;">
...
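During this adventure I found that RoboBrowser does not include all of the form data required for the POST to "PhoneLists.aspx" to work as expected ('__EVENTTARGET': 'PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton' and __EVENTARGUMENT). Setting those parameters and then calling submit_form(form_find_b) did not produce the expected result either. I wondered whether add_field() in robobrowser.forms.form would work, but I do not understand how to use it properly (if it is even the right tool) to do what I want, e.g. adding the __EVENTTARGET and __EVENTARGUMENT hidden input fields to the form.
Is there anything else I am missing? Or do RoboBrowser/Requests not support this type of POST? Or does the form need JavaScript to be executed, as mentioned here and as with mechanize?
Best answer
Solved
After much Googling, reposting for help on reddit, and then randomly coming across this RoboBrowser issue, which showed me how to use the "fields.add_field()" method properly, the problem is solved.
For example: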
import robobrowser  # makes robobrowser.forms.fields available
b_e_arg = robobrowser.forms.fields.Input('<input name="__EVENTARGUMENT" value="" />')
b_e_target = robobrowser.forms.fields.Input('<input name="__EVENTTARGET" value="PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton" />')
In [30]: form_find_b.add_field(b_e_target)
In [31]: form_find_b.add_field(b_e_arg)
After updating the form with these values, submitting it to "PhoneLists.aspx" works as expected.
In [33]: m.submit_form(form_find_b)
In [34]: m.url
Out[34]: u'http://example.com/PhoneLists.aspx'
In [35]: m.parsed.findAll('div', id='PhoneListsControl_phoneListMembersText')
Out[35]: [<div class="displayText" id="PhoneListsControl_phoneListMembersText"> Work </div>]
In [36]: m.parsed.findAll('input', id='PhoneListsControl_phoneListDetail__ctl2_number')
Out[36]: [<input id="PhoneListsControl_phoneListDetail__ctl2_number" maxlength="50" name="PhoneListsControl:phoneListDetail:_ctl2:number" onkeyup="enableApplyButton('PhoneListsControl_')" type="text" value="95551234567"/>]
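For anyone doing the same with plain Requests rather than RoboBrowser, the idea can be sketched with the standard library alone: collect the hidden inputs the page already carries (such as __VIEWSTATE and __EVENTVALIDATION), then add the two postback fields. This is only a sketch under assumptions: HiddenInputCollector and build_postback are hypothetical names, the sample HTML and its values are invented placeholders, and a real page will have more fields to carry over.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        input_type = (attrs.get("type") or "").lower()
        name = attrs.get("name")
        if input_type == "hidden" and name:
            self.fields[name] = attrs.get("value") or ""


def build_postback(html, target, argument=""):
    """Return a POST payload that emulates __doPostBack(target, argument)."""
    collector = HiddenInputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    payload["__EVENTTARGET"] = target
    payload["__EVENTARGUMENT"] = argument
    return payload


# Placeholder page; the hidden values are made up for illustration.
sample_html = """
<form action="PhoneLists.aspx" method="post">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="abc123" />
  <input type="submit" name="PhoneListsControl:btnApply" value="Apply" />
</form>
"""

payload = build_postback(
    sample_html,
    "PhoneListsControl$MasterDataControl$masterList$_ctl0$SelectButton",
)
body = urlencode(payload)  # form-encoded body, ready to be POSTed
print(sorted(payload))
```

The resulting dict could be passed as the data argument of a POST on an authenticated session; the submit button is deliberately left out because the postback stands in for a link click, not a button press.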
I hope anyone else who has to scrape ASPX sites finds this useful. Happy hacking, everyone!
Regarding "javascript - Python - Requests/RoboBrowser - ASPX POST JavaScript", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27681731/