python - How to save a web page via Selenium RC

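I use Selenium RC to open a URL. How do I then save that web page, the way urllib.urlretrieve does? urllib can't handle the JavaScript in the page, though. A second question: will it save the entire page that I see open in Selenium RC?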

Best answer

It sounds like you're confusing two very different libraries.

urllib:

This module provides a high-level interface for fetching data across the World Wide Web. In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames.

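You can use Python's urllib library to retrieve the raw markup from a valid URL. It won't run any JavaScript embedded in the page, because the library never tries to parse or render the content.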
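A minimal sketch of that approach in Python 3, where the old urllib functions live in `urllib.request`. The function name `save_raw_html` is hypothetical. It writes the server's bytes exactly as received, so any `<script>` tags remain unexecuted text in the saved file.

```python
import urllib.request


def save_raw_html(url, path):
    """Download the raw bytes at `url` and write them unchanged to `path`.

    No JavaScript is executed: the file holds exactly what the server sent,
    before any DOM changes a browser would make.
    """
    with urllib.request.urlopen(url) as response:
        body = response.read()
    with open(path, "wb") as f:
        f.write(body)
    return body


# Usage (hypothetical URL and output path):
#   save_raw_html("https://example.com/", "page.html")
```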

Selenium RC:

Selenium Remote Control (RC) is a test tool that allows you to write automated web application UI tests in any programming language against any HTTP website using any mainstream JavaScript-enabled browser.

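Selenium RC is for automated testing. The tests run inside a web browser, driven through JavaScript, but it is a test suite: what you get back is information about the status of your tests. Selenium RC doesn't provide any function for saving an image of the rendered page.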


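Unless I've misunderstood your question, it seems you're looking for a library that lets you capture an image of a rendered HTML page (including any JavaScript DOM manipulation). If so, I'd suggest looking at PyWebShot, which appears to provide exactly that. You can see screenshots of it in action here (along with some additional information about it).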
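As an alternative that is not part of the original answer, Selenium's newer WebDriver API (the successor to Selenium RC) can save both the rendered markup and a screenshot. A minimal sketch, assuming `driver` is a `selenium.webdriver` instance that has already loaded the page with `driver.get(url)`. The helper name `save_rendered_page` is hypothetical; `page_source` and `save_screenshot` are WebDriver members.

```python
def save_rendered_page(driver, html_path, png_path):
    """Save what a WebDriver-controlled browser currently shows.

    Assumes `driver` is a selenium.webdriver instance that has already
    loaded the target page.
    """
    # page_source serializes the browser's current document, so it
    # normally reflects changes JavaScript made after the page loaded.
    html = driver.page_source
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(html)
    # save_screenshot writes a PNG and returns False on an I/O error.
    # Most drivers capture only the visible viewport, not the whole page.
    ok = driver.save_screenshot(png_path)
    return html, ok


# Usage (requires the selenium package and a browser driver):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get("https://example.com/")
#   save_rendered_page(driver, "page.html", "page.png")
#   driver.quit()
```

On the "entire page" question: the saved HTML is the browser's current DOM, but the screenshot is usually limited to what fits in the window.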

If it doesn't have to be a Python library, there are a number of web services that provide screenshots:

A similar question, "python - How to save a web page via Selenium RC", can be found on Stack Overflow: https://stackoverflow.com/questions/3456852/
