python - Scraping socket.io data from a website with Python

Tags: python websocket socket.io scrape

I am trying to scrape the latest bid and ask prices from the website "http://btc-exchange.com/". I can see that the prices are delivered through this socket.io endpoint:

wss://pusher.mistertango.com/socket.io/?EIO=3&transport=websocket&sid=XXX

The session ID (sid) is generated by this call:

https://pusher.mistertango.com/socket.io/?EIO=3&transport=polling&t=1517079662330-10

This is the code I am currently using:

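import requests
from websocket import create_connection
import json

# Polling handshake: the response body carries the session ID (sid).
SID_url = "https://pusher.mistertango.com/socket.io/?EIO=3&transport=polling"
SID_req = requests.get(SID_url, headers={'User-Agent': 'Mozilla/5.0'}).text
# Slice the sid value out of the raw text: skip past 'sid":"' and stop before the closing quote.
SID = SID_req[SID_req.index("sid")+6:SID_req.index(",")-1]
print(SID_req)
print(SID)

# Open the websocket transport using the same session ID.
ws = create_connection("wss://pusher.mistertango.com/socket.io/?EIO=3&transport=websocket&sid="+SID)
ws.send('2probe')  # ping with "probe" payload, as seen in Chrome DevTools
print(ws.recv())
ws.send('5')       # upgrade packet, as seen in Chrome DevTools
print(ws.recv())
# Emit a "subscribe" event for the market channel.
ws.send('42["subscribe",{"chan":"market-e559906eda4362f58bcaab40a4bfb5b4"}]')
while True:
    result = ws.recv()
    print(result)
ws.close()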

This is the output of the code:

    ÿ0{"sid":"mURV8OnaNqax_AmvAAF2","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}
mURV8OnaNqax_AmvAAF2
3probe
40
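
The first line of the output above is a JSON handshake object preceded by a few framing characters. As an alternative to slicing the string by index, the sid can be read with the json module. This is only a minimal sketch: extract_sid is a hypothetical helper name, and it assumes the handshake object is the first JSON object in the response body.

```python
import json


def extract_sid(handshake_text):
    """Return the sid from an Engine.IO polling handshake response body."""
    # The JSON object is preceded by framing characters (a length prefix
    # and the packet type "0"), so start decoding at the first brace.
    start = handshake_text.index("{")
    # raw_decode parses one JSON value and ignores any trailing data.
    handshake, _end = json.JSONDecoder().raw_decode(handshake_text, start)
    return handshake["sid"]


# Sample taken from the output shown above.
sample = 'ÿ0{"sid":"mURV8OnaNqax_AmvAAF2","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}'
print(extract_sid(sample))  # prints mURV8OnaNqax_AmvAAF2
```

The same parsed object also exposes pingInterval and pingTimeout, which a hand-written client would need for its keepalive timing.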

The messages I send are based on the ones I saw in Chrome DevTools.

My connection stops receiving any messages after "40". What am I doing wrong?

Best answer

Here is the code you want. Tested and working:

#!/usr/bin/env python
from socketIO_client import SocketIO

def on_connect():
    print('connect')

def on_disconnect():
    print('disconnect')

def on_reconnect():
    print('reconnect')

def on_aaa_response(*args):
    print('on_aaa_response', args)

def on_bbb_response(*args):
    print('on_bbb_response', args)

with SocketIO('https://pusher.mistertango.com') as socketIO:
    socketIO.on('connect', on_connect)
    socketIO.on('disconnect', on_disconnect)
    socketIO.on('reconnect', on_reconnect)
    socketIO.on('market-orderbook', on_aaa_response)
    socketIO.emit('subscribe', {'chan': 'market-orderbook'}, on_bbb_response)
    socketIO.wait_for_callbacks(seconds=10)
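
For background on the raw frames in the question ("2probe", "3probe", "5", "40", "42[...]"): in Engine.IO protocol version 3 (EIO=3) the first character of a frame is the Engine.IO packet type, and for message packets the second character is the Socket.IO packet type. Read this way, "40" is a Socket.IO connect packet, which suggests the handshake itself went through. The decoder below is a rough sketch for inspecting such frames, not part of socketIO_client; describe_frame is a hypothetical helper, and it ignores namespaces and acknowledgement IDs for simplicity.

```python
import json

# Engine.IO v3 packet types (first character of a frame).
ENGINE_IO_TYPES = {
    "0": "open", "1": "close", "2": "ping", "3": "pong",
    "4": "message", "5": "upgrade", "6": "noop",
}
# Socket.IO packet types (second character of a "message" frame).
SOCKET_IO_TYPES = {
    "0": "connect", "1": "disconnect", "2": "event", "3": "ack",
    "4": "error", "5": "binary_event", "6": "binary_ack",
}


def describe_frame(frame):
    """Return (engine_type, socket_type, payload) for a text frame."""
    engine_type = ENGINE_IO_TYPES.get(frame[:1], "unknown")
    if engine_type != "message":
        # Non-message packets carry at most a plain string, e.g. "probe".
        return (engine_type, None, frame[1:] or None)
    socket_type = SOCKET_IO_TYPES.get(frame[1:2], "unknown")
    body = frame[2:]
    # Simplification: only bodies that start directly with JSON are parsed.
    if body[:1] in ("[", "{"):
        payload = json.loads(body)
    else:
        payload = body or None
    return (engine_type, socket_type, payload)


for raw in ("3probe", "40", '42["subscribe",{"chan":"market-orderbook"}]'):
    print(describe_frame(raw))
```

Note also that with EIO=3 the client is expected to send a ping packet ("2") roughly every pingInterval milliseconds, so a hand-written websocket loop has to handle that itself, whereas a client library such as socketIO_client is designed to take care of the handshake and heartbeat.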

This Q&A on scraping socket.io data from a website with Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/48481246/
