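Python Selenium fails when some threads create a WebDriver

I have a thread that takes a URL, requests it with Selenium, and parses the data.

Most of the time the thread works fine. But sometimes it seems to get stuck creating the webdriver, and I can't seem to catch it with exception handling.

This is the start of the thread: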

def GetLink(eachlink):

    trry = 0 #10 Attempts at getting the data

    while trry < 10:

        print "Scraping:  ", eachlink
        try:

            Numbergrab = []
            Namegrab = []
            Positiongrab = []

            nextproxy = (random.choice(ProxyList))
            nextuseragent = (random.choice(UseragentsList))
            proxywrite = '--proxy=' + nextproxy  # one string; the comma form built a tuple
            service_args = [
            proxywrite,
            '--proxy-type=http',
            '--ignore-ssl-errors=true',
            ]

            dcap = dict(DesiredCapabilities.PHANTOMJS)
            dcap["phantomjs.page.settings.userAgent"] = (nextuseragent)
            pDriver = webdriver.PhantomJS(r'C:\phantomjs.exe', desired_capabilities=dcap, service_args=service_args)
            pDriver.set_window_size(1024, 768) # optional
            pDriver.set_page_load_timeout(20)

            print "Requesting link: ", eachlink
            pDriver.get(eachlink)
            try:
                WebDriverWait(pDriver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='seat-setting']")))
            except Exception:  # e.g. TimeoutException when the element never appears
                time.sleep(10)

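This is only a snippet, but it is the important part, because once it gets past this point it carries on working fine.

But when something stalls, one of the threads prints "Scraping: link" to the console and never prints "Requesting link: link".

That means the thread is stalling while actually setting up the webdriver. As far as I know this is thread-safe. I tried using lock.acquire() and also giving each thread a random .exe from a batch of 20 copies, with the same result.

Sometimes a thread works fine and then suddenly stops without being able to make a request.

Update:

Sometimes, when I close the console, it tells me there was a socket error. You can see the start of the try in the snippet above; at the end I have this: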

        except:
            trry += 1
            e = sys.exc_info()[0]
            print "Problem scraping link: ", e

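But it will happily sit there for hours saying nothing until I physically close the console. Then it throws socket.error and prints the "Scraping: link" message for the thread that died.

This actually suggests it is failing before it even starts the while loop, but trry is set to 0 at the start of that thread and is not referenced anywhere else. Also, there would be no socket.error without a Selenium webdriver, so it must be blocking the earlier messages as well.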
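One way to confirm where a stalled thread is actually blocked is to have the interpreter dump every thread's stack once a deadline passes. The following is a minimal Python 3 sketch, not part of the original post, built on the standard-library faulthandler module. The helper name hang_watchdog and the deadline values are assumptions chosen for illustration.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s, log_file=sys.stderr):
    """Dump the stacks of all threads to log_file if the block runs longer than timeout_s.

    hang_watchdog is a hypothetical helper name. faulthandler keeps a single
    timer per process, so arming it again replaces the previous deadline.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    try:
        yield
    finally:
        # Disarm the timer once the block finishes (or raises).
        faulthandler.cancel_dump_traceback_later()
```

Because the timer is process-wide, it probably fits best around the main thread's wait on the workers (for example `with hang_watchdog(300): worker.join()`) rather than inside each worker. If the dump shows a thread sitting inside the webdriver constructor, that supports the diagnosis above.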

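Update #2:

When running a single thread with exactly the same code, it seems happy to run for hours.

But the thread lock makes no difference.

A bit stumped. Going to try subprocesses instead of threads to see what that does.

Update #3:

Threading is not stable, but subprocessing is. OK, Python.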
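The post does not show how the subprocess version was written. As a rough Python 3 sketch of the idea, each link could be scraped by a separate child process that is killed and retried if it runs past a deadline. The function name run_isolated, the default timeout, and the attempt count are all assumptions.

```python
import subprocess
import sys


def run_isolated(cmd, timeout_s=60, attempts=3):
    """Run cmd as a child process; return its stdout, or None if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            done = subprocess.run(
                cmd,
                stdout=subprocess.PIPE,
                universal_newlines=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run() has already killed the child when this is raised.
            print("attempt %d timed out after %s s" % (attempt, timeout_s), file=sys.stderr)
            continue
        if done.returncode == 0:
            return done.stdout
        print("attempt %d exited with code %d" % (attempt, done.returncode), file=sys.stderr)
    return None
```

A call might look like `run_isolated([sys.executable, "scrape_one.py", eachlink])`, where scrape_one.py is a hypothetical script holding the body of GetLink. Only the direct child is killed on timeout, so a browser process it spawned may still need separate cleanup.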

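Best answer

I have run into this with both multithreading and multiprocessing, and with Firefox, Chrome, and PhantomJS. For whatever reason, the call that instantiates the browser (e.g. driver = webdriver.Chrome()) never returns.

Most of my scripts are relatively short-lived with few threads/processes, so the problem isn't seen often. However, I have a few scripts that run for several hours and create and destroy several hundred browser objects, and I'm guaranteed to hit the hang a few times per run.

My solution is to put the browser instantiation into its own function/method, and then decorate that function/method with one of the many timeout and retry decorators available on PyPI:

(untested)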

from retrying import retry
from selenium import webdriver
from timeoutcontext import timeout, TimeoutException


def retry_if_timeoutexception(exception):
    return isinstance(exception, TimeoutException)


@retry(retry_on_exception=retry_if_timeoutexception, stop_max_attempt_number=3)
@timeout(30)  # Allow the function 30 seconds to create and return the object
def get_browser():
    return webdriver.Chrome()

https://pypi.python.org/pypi/retrying

https://pypi.python.org/pypi/timeoutcontext
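One caveat, stated as an assumption because it is not checked here: timeoutcontext appears to be signal-based. Python only runs signal handlers in the main thread, and SIGALRM is not available on Windows, so the decorator may not help when the browser is created inside worker threads as in the question. A thread-based variant of the same timeout-and-retry idea might look like the Python 3 sketch below. The names create_with_timeout and _run_factory are hypothetical, and a timed-out attempt is abandoned rather than cancelled.

```python
import queue
import threading


def _run_factory(factory, outbox):
    """Worker body: call factory() and report either its result or its exception."""
    try:
        outbox.put((True, factory()))
    except Exception as exc:  # report failures instead of dying silently
        outbox.put((False, exc))


def create_with_timeout(factory, timeout_s=30, attempts=3):
    """Call factory() in a daemon thread and retry if it does not return in time.

    A timed-out call is not cancelled: its thread, and any browser process it
    eventually starts, is leaked.
    """
    last_error = None
    for _ in range(attempts):
        outbox = queue.Queue(maxsize=1)  # fresh queue so a late result cannot leak in
        worker = threading.Thread(target=_run_factory, args=(factory, outbox))
        worker.daemon = True  # a hung worker must not block interpreter exit
        worker.start()
        try:
            ok, value = outbox.get(timeout=timeout_s)
        except queue.Empty:
            last_error = TimeoutError("factory() did not return in %s s" % timeout_s)
            continue
        if ok:
            return value
        last_error = value  # factory() raised; remember it and try again
    raise last_error
```

With the answer's example this would be called as `driver = create_with_timeout(webdriver.Chrome)`. Running each attempt in a daemon thread is a trade-off: it works from any thread and on Windows, but a hung attempt stays alive in the background until the process exits.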

A similar question about Python Selenium failing when some threads create a WebDriver can be found on Stack Overflow: https://stackoverflow.com/questions/37587942/
