python - How do I scrape content from a website when the tags do not appear in the page source?

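I am parsing a website with lxml and Python. When I inspect the element with the Firebug extension in Mozilla Firefox, I can see it, but it does not exist in the page source that I am reading. The code is: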

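from urllib.request import urlopen  # Python 3; the original used Python 2's urllib.urlopen
from lxml import etree

# url holds the address of the page being scraped (not shown in the question)
page = urlopen(url)
response = page.read()
x = etree.HTML(response)
# Comes back empty: these divs are not in the HTML the server sends
company = x.xpath('//div[@class="name"]')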

All the div tags with class="name" are clearly visible when I inspect the page with Firebug, but they do not exist in the HTML page source.

Thanks in advance.

Best answer

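The div elements with class="name" are loaded through a set of XHR calls made after the initial page load, which is why they appear in Firebug but not in the HTML that lxml receives. Rather than working out by hand which requests are needed to get the data, use the AngelList API.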
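Whether the data comes from a documented API or from an XHR request, it arrives as a separate response (typically JSON) rather than inside the HTML page. The following is only a minimal sketch of handling such a response: the `X-Requested-With` header, the `fetch_json` and `company_names` helpers, and the payload shape are all assumptions made for illustration, not taken from the AngelList API documentation.

```python
import json
from urllib.request import Request, urlopen


def fetch_json(endpoint):
    """Download and decode a JSON document from `endpoint` (hypothetical URL)."""
    # Assumption: some servers only answer XHR-style requests with this header.
    request = Request(endpoint, headers={"X-Requested-With": "XMLHttpRequest"})
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def company_names(payload):
    """Return names from a payload shaped like {"companies": [{"name": ...}]}.

    The shape is assumed for this sketch; a real API documents its own.
    """
    return [item["name"] for item in payload.get("companies", [])]


# Hypothetical payload standing in for a real response, so no network is needed.
sample = json.loads('{"companies": [{"name": "Acme"}, {"name": "Globex"}]}')
print(company_names(sample))
```

With a real endpoint, `company_names(fetch_json(endpoint))` would replace the XPath query entirely, since there is no HTML left to parse.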

Also, according to the Terms of Use, scraping the site without prior consent is prohibited, although crawling is allowed:

Crawling the Service is permissible in accordance with this agreement, but scraping the Service without the prior consent of AngelList except as permitted by this agreement is expressly prohibited

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/23960447/
