Python: scraping an ASPX page that requires login

Tags: python asp.net web-scraping

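I am trying to do some basic web scraping with Python 2.7 on this site ( http://210.212.227.210/tkmce/index.aspx ), which involves logging in. The site is built with ASP.NET (ASPX pages). I tried the code below, but the login fails.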

This is the home page ( http://210.212.227.210 ), and this is the page I want to request after logging in, which the login redirects to ( http://210.212.227.210/tkmce/Common/Home/Home.aspx ).

Please help me fix this code. I can't log in!

These are the headers and POST data captured while logging in through the browser.

Form data:

__LASTFOCUS:
__EVENTTARGET:
__EVENTARGUMENT:
__VIEWSTATE:/wEPDwUKMTU4MDU0N... (it's long)
__VIEWSTATEGENERATOR:2611E4BA
__EVENTVALIDATION:/wEdAAb+Owa/...
txtUserName:(login username)
txtPassword:(my login password)
hdnstatus:0
btnLogin:Login
hdnstatus0:0

Request headers:

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.9
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:460
Content-Type:application/x-www-form-urlencoded
Cookie:ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig
Host:210.212.227.210
Origin:http://210.212.227.210
Referer:http://210.212.227.210/tkmce/index.aspx
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
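For reference, the POST body in this capture is the form data above encoded as `application/x-www-form-urlencoded`, and `Content-Length` is the size of that encoded body in bytes. A minimal Python 3 sketch with the standard library, assuming placeholder values for the elided `__VIEWSTATE` / `__EVENTVALIDATION` and for the credentials (so the printed length will not be 460). When a dict is passed as `data=`, requests builds the same kind of body and sets both headers itself.

```python
from urllib.parse import urlencode

# Field names and order copied from the captured form data.
# The values marked PLACEHOLDER are assumptions: the real ones are long and elided.
form = [
    ("__LASTFOCUS", ""),
    ("__EVENTTARGET", ""),
    ("__EVENTARGUMENT", ""),
    ("__VIEWSTATE", "VIEWSTATE_PLACEHOLDER"),
    ("__VIEWSTATEGENERATOR", "2611E4BA"),
    ("__EVENTVALIDATION", "EVENTVALIDATION_PLACEHOLDER"),
    ("txtUserName", "myloginid"),
    ("txtPassword", "myloginpassword"),
    ("hdnstatus", "0"),
    ("btnLogin", "Login"),
    ("hdnstatus0", "0"),
]

# Encode as application/x-www-form-urlencoded (empty values become "name=")
body = urlencode(form)

# Content-Length is the byte length of the encoded body
content_length = len(body.encode("ascii"))

print(body)
print(content_length)
```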

Request headers after login:

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.9
Cache-Control:max-age=0
Connection:keep-alive
Cookie:ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig
Host:210.212.227.210
Referer:http://210.212.227.210/tkmce/index.aspx
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
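Both captured requests carry the same `ASP.NET_SessionId` cookie, so the logged-in state appears to be tied to that one session cookie, and a scraper has to send it again on the request after the POST (a `requests.Session` keeps cookies between requests automatically). A small standard-library check of the two captured `Cookie` headers, for illustration only; `session_id` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def session_id(cookie_header):
    """Return the ASP.NET_SessionId value from a raw Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("ASP.NET_SessionId")
    return morsel.value if morsel is not None else None


# Cookie header values copied from the two captured requests
before_login = "ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig"
after_login = "ASP.NET_SessionId=r3ubp0z1x5fhygqj2eqmnqig"

# True: the request after login reuses the session cookie from the login POST
print(session_id(before_login) == session_id(after_login))
```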

Python 2.7 code using BeautifulSoup and requests:

import requests
from bs4 import BeautifulSoup

URL="http://210.212.227.210/tkmce/index.aspx"
headers={"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36"}

username="myloginid"
password="myloginpassword"

s=requests.Session()
s.headers.update(headers)
r=s.get(URL)
soup=BeautifulSoup(r.content, "html.parser")

VIEWSTATE=soup.find(id="__VIEWSTATE")['value']
VIEWSTATEGENERATOR=soup.find(id="__VIEWSTATEGENERATOR")['value']
EVENTVALIDATION=soup.find(id="__EVENTVALIDATION")['value']
EVENTTARGET=soup.find(id="__EVENTTARGET")['value']
EVENTARGUEMENT=soup.find(id="__EVENTARGUMENT")['value']

login_data={
"__VIEWSTATE":VIEWSTATE,
"txtUserName":username,
"txtPassword":password,
"__VIEWSTATEGENERATOR" : VIEWSTATEGENERATOR,
"__EVENTVALIDATION":EVENTVALIDATION,
"__EVENTTARGET":EVENTTARGET,
"__EVENTARGUEMENT":EVENTARGUEMENT}

r = s.post(URL, data=login_data)
r = s.get("http://210.212.227.210/tkmce/Common/Home/Home.aspx")
print (r.url)
print (r.text)
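Comparing `login_data` with the captured form data shows two differences: the key `"__EVENTARGUEMENT"` is misspelled (the page uses `__EVENTARGUMENT`), and `btnLogin:Login`, `__LASTFOCUS`, `hdnstatus` and `hdnstatus0` are not sent. In ASP.NET Web Forms the clicked submit button's name/value pair is how the server knows which button's click handler to run, so the missing `btnLogin` field is a likely (unconfirmed) cause of the failed login. A minimal Python 3 sketch under that assumption: it copies every hidden input from the login page instead of picking fields one by one, then adds the visible fields seen in the capture. `HiddenInputCollector`, `build_login_payload` and `login` are hypothetical names, and it assumes the hidden fields are plain `<input type="hidden">` elements.

```python
from html.parser import HTMLParser

LOGIN_URL = "http://210.212.227.210/tkmce/index.aspx"
HOME_URL = "http://210.212.227.210/tkmce/Common/Home/Home.aspx"


class HiddenInputCollector(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def build_login_payload(html, username, password):
    """Hidden fields from the page plus the visible fields seen in the capture."""
    parser = HiddenInputCollector()
    parser.feed(html)
    # __VIEWSTATE, __EVENTVALIDATION, __EVENTTARGET, hdnstatus, ... as served
    payload = dict(parser.fields)
    payload.update({
        "txtUserName": username,
        "txtPassword": password,
        # Name/value of the clicked submit button, as in the browser's POST
        "btnLogin": "Login",
    })
    return payload


def login(username, password):
    # Imported here so the helpers above only need the standard library
    import requests

    s = requests.Session()  # keeps the ASP.NET_SessionId cookie between requests
    s.headers["User-Agent"] = "Mozilla/5.0"
    page = s.get(LOGIN_URL)
    payload = build_login_payload(page.text, username, password)
    s.post(LOGIN_URL, data=payload, headers={"Referer": LOGIN_URL})
    return s.get(HOME_URL)
```

Usage would be `r = login("myloginid", "myloginpassword")` followed by `print(r.url)`; if the URL printed is still the login page rather than `Home.aspx`, the login most likely did not go through.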

Best answer

This will help you log in.

import platform
import time
from selenium import webdriver

if platform.system() == 'Windows':
    PHANTOMJS_PATH = './phantomjs.exe'
else:
    PHANTOMJS_PATH = './phantomjs'

browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.set_window_size(1366, 768)
browser.get("http://210.212.227.210/tkmce/index.aspx")
browser.find_element_by_id("txtUserName").send_keys('myloginid')
browser.find_element_by_id("txtPassword").send_keys('myloginpassword')
browser.find_element_by_id("btnLogin").click()
time.sleep(5)
html = browser.page_source
if 'Welcome' in html:
    print("You're logged in!")
else:
    print("Logging in failed. Perhaps, it was attempted with invalid credentials")
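The answer's code targets an older Selenium release: PhantomJS support and the `find_element_by_*` helpers were removed in Selenium 4. A roughly equivalent sketch for Selenium 4, assuming headless Chrome with a matching chromedriver is available, and replacing the fixed `time.sleep(5)` with an explicit wait for the same `'Welcome'` check the answer uses. `selenium_login` is a hypothetical name; it returns the page source on success and `None` if "Welcome" never appears within the timeout.

```python
LOGIN_URL = "http://210.212.227.210/tkmce/index.aspx"


def selenium_login(username, password, timeout=10):
    # Imported inside the function so this module loads without Selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.set_window_size(1366, 768)
        driver.get(LOGIN_URL)
        driver.find_element(By.ID, "txtUserName").send_keys(username)
        driver.find_element(By.ID, "txtPassword").send_keys(password)
        driver.find_element(By.ID, "btnLogin").click()
        try:
            # Poll until the logged-in page text appears instead of sleeping
            WebDriverWait(driver, timeout).until(
                lambda d: "Welcome" in d.page_source
            )
        except TimeoutException:
            return None  # login probably failed (e.g. invalid credentials)
        return driver.page_source
    finally:
        driver.quit()  # always close the browser
```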

The original question on scraping an ASPX page with login in Python is on Stack Overflow: https://stackoverflow.com/questions/48812532/
