python-3.x - WebDriverException: Message: unknown error: bad inspector message error while printing HTML content using ChromeDriver Chrome through Selenium Python

Tags: python-3.x selenium google-chrome web-scraping selenium-chromedriver

I am scraping some HTML content:

for i, c in enumerate(cards[75:77]):
    print(i)
    a = c.find_element_by_class_name("influencer-stagename")
    print(a.get_attribute('innerHTML'))

This works for every record except the 76th. The output before the error:

0
b'<a class="influencer-analytics-link" href="/influencers/sophiewilling"><h5><span>SOPHIE WILLING</span></h5></a>'
1
b'<a class="influencer-analytics-link" href="/influencers/ferntaylorr"><h5><span>Fern Taylor.</span></h5></a>'
2
b'<a class="influencer-analytics-link" href="/influencers/officialshaniceslatter"><h5><span>Shanice Slatter</span></h5></a>'
3

Stack trace:

> -------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last) <ipython-input-484-0a80d1af1568> in <module>
          3     #print(c.find_element_by_class_name("influencer-stagename").text)
          4     a = c.find_element_by_class_name("influencer-stagename")
    ----> 5     print(a.get_attribute('innerHTML').encode('ascii', 'ignore'))

    ~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in get_attribute(self, name)
        141                 self, name)
        142         else:
    --> 143             resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
        144             attributeValue = resp.get('value')
        145             if attributeValue is not None:

    ~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webelement.py in _execute(self, command, params)
        631             params = {}
        632         params['id'] = self._id
    --> 633         return self._parent.execute(command, params)
        634 
        635     def find_element(self, by=By.ID, value=None):

    ~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
        319         response = self.command_executor.execute(driver_command, params)
        320         if response:
    --> 321             self.error_handler.check_response(response)
        322             response['value'] = self._unwrap_value(
        323                 response.get('value', None))

    ~/anaconda3/envs/py3-env/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
        240                 alert_text = value['alert'].get('text')
        241             raise exception_class(message, screen, stacktrace, alert_text)
    --> 242         raise exception_class(message, screen, stacktrace)
        243 
        244     def _value_or_default(self, obj, key, default):

    WebDriverException: Message: unknown error: bad inspector message: {"id":110297,"result":{"result":{"type":"object","value":{"status":0,"value":"<a class=\"influencer-analytics-link\" href=\"/influencers/bookishemily\"><h5><span>Emily | 18 | GB | Student\uD83C...</span></h5></a>"}}}}   (Session info: chrome=75.0.3770.100)   (Driver info: chromedriver=2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363),platform=Mac OS X 10.14.0 x86_64)

I suspect the cause is an invalid character in

value":"Emily | 18 | GB | Student\uD83C..."

Specifically, I suspect "\uD83C".

Adding

.encode("utf-8")  OR   .encode('ascii', 'ignore')

to the second print statement makes no difference.

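Any ideas on how to solve this?

Update: the problem is caused by emoji characters. So far I have found 3 examples, each containing one emoji (a pink flower 🌸, the Russian flag 🇷🇺, and leaves fluttering in the wind 🍃). If I edit them out with the Chrome inspector, my code runs fine, but that is not a solution that works at scale.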

Best answer

This error message...

WebDriverException: Message: unknown error: bad inspector message: {"id":110297,"result":{"result":{"type":"object","value":{"status":0,"value":"<a class=\"influencer-analytics-link\" href=\"/influencers/bookishemily\"><h5><span>Emily | 18 | GB | Student\uD83C...</span></h5></a>"}}}}   (Session info: chrome=75.0.3770.100)   (Driver info: chromedriver=2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363),platform=Mac OS X 10.14.0 x86_64)

...implies that ChromeDriver was unable to parse some non-UTF-8 characters because of a JSON encoding/decoding issue.
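For context, `\uD83C` is a UTF-16 high surrogate, that is, the first half of a surrogate pair. Emoji such as 🌸 lie outside the Basic Multilingual Plane and need two such code units. The minimal Python sketch below is independent of Selenium. It shows the pair for the flower emoji and shows that a high surrogate on its own is not encodable text, which is presumably what the truncated `Student\uD83C...` string on the page contains:

```python
# Illustration only: how an emoji maps to UTF-16 code units.
flower = "\U0001F338"  # cherry blossom, one of the emoji named in the question

# Encode as big-endian UTF-16 and split the bytes into 16-bit code units.
raw = flower.encode("utf-16-be")
units = [raw[i:i + 2].hex() for i in range(0, len(raw), 2)]
print(units)  # ['d83c', 'df38']

# A high surrogate without its low partner is not a valid character,
# so a strict encoder (or decoder) refuses it.
lone = "\ud83c"
try:
    lone.encode("utf-8")
    lone_is_valid = True
except UnicodeEncodeError:
    lone_is_valid = False
print(lone_is_valid)  # False
```

This also suggests why `.encode(...)` in the print statement cannot help: the exception is raised inside `get_attribute`, before any Python string exists to encode.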


Deep dive

In the discussion of Issue 723592: 'Bad inspector message' errors when running URL web-platform-tests via webdriver, John Chen (Owner - WebDriver for Google Chrome) mentioned in his comment:

A JSON encoding/decoding issue caused the "Bad inspector message" error reported at https://travis-ci.org/w3c/web-platform-tests/jobs/232845351. Part of the error message from part 1 contains an invalid Unicode character \uFDD0 (from https://github.com/w3c/web-platform-tests/blob/34435a4/url/urltestdata.json#L3596). The JSON encoder inside Chrome didn't detect such error, and passed it through in the JSON blob sent to ChromeDriver. ChromeDriver uses base/json/json_parser.cc to parse the JSON string. This parser does a more thorough error detection, notices that \uFDD0 is an invalid character, and reports an error. I think our JSON encoder and decoder should have exactly the same amount of error checking. It's problematic that the encoder can create a blob that is rejected by the decoder.


Analysis

John Chen (Owner - WebDriver for Google Chrome) further added:

The JSON encoding happens in protocol layout of DevTools, just before the result is sent back to ChromeDriver. The relevant code is in https://cs.chromium.org/chromium/src/out/Debug/gen/v8/src/inspector/protocol/Protocol.cpp. In particular, escapeStringForJSON function is responsible for encoding strings. It's actually quite conservative. Anything above 126 is encoded in \uXXXX format. (Note that Protocol.cpp is a generated file. The real source is https://cs.chromium.org/chromium/src/v8/third_party/inspector_protocol/lib/Values_cpp.template.)

The error occurs in the JSON parser used by ChromeDriver. The decoding of \uXXXX sequence happens at https://cs.chromium.org/chromium/src/base/json/json_parser.cc?l=564 and https://cs.chromium.org/chromium/src/base/json/json_parser.cc?l=670. After decoding an escape sequence, the decoder rejects anything that's not a valid Unicode character.

I noticed that there was a recent change to prevent a JSON encoder from emitting invalid Unicode code point: https://crrev.com/478900. Unfortunately it's not the JSON encoder used by the code involved in this bug, so it doesn't help us directly, but it's an indication that we're not the only ones affected by this type of issue.
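The encoder/decoder mismatch described in these comments can be imitated in a few lines of Python. This is only a sketch of the idea, not Chromium's code: `is_valid_character` and `strict_loads` are hypothetical names, and the exact set of rejected code points is an assumption based on the comments (surrogates and noncharacters such as `\uFDD0`). Python's own `json.loads` plays the lenient side here, since it accepts a lone surrogate escape.

```python
import json


def is_valid_character(code_point: int) -> bool:
    """Hypothetical strict check: reject surrogates and noncharacters."""
    if 0xD800 <= code_point <= 0xDFFF:        # UTF-16 surrogate halves
        return False
    if 0xFDD0 <= code_point <= 0xFDEF:        # noncharacter block
        return False
    if (code_point & 0xFFFE) == 0xFFFE:       # U+xFFFE and U+xFFFF in any plane
        return False
    return True


def strict_loads(blob: str):
    """Parse a JSON string value, then reject invalid characters in it."""
    value = json.loads(blob)  # lenient: lone surrogate escapes pass through
    if isinstance(value, str):
        for ch in value:
            if not is_valid_character(ord(ch)):
                raise ValueError(f"invalid character U+{ord(ch):04X}")
    return value


# The lenient parser accepts what the strict one refuses.
blob = '"Student\\ud83c..."'
print(len(json.loads(blob)))  # parses without error
try:
    strict_loads(blob)
except ValueError as exc:
    print(exc)
```

A blob that one side emits and the other side refuses is exactly the situation the comments call problematic.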


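Solution

This issue was addressed by replacing invalid UTF-16 escape sequences when decoding invalid UTF strings in chromedriver, because web platform tests may use ECMAScript strings that are not necessarily valid UTF-16, through this revision/commit.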

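So a quick solution is to make sure your ChromeDriver release matches your Chrome version (the error message above shows chromedriver=2.40 driving chrome=75.0) and then re-run your tests.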
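One way to spot such a mismatch is to compare the versions that the session reports. The sketch below works on a plain dictionary so that it runs without a browser. The keys `browserVersion`, `version`, and `chrome.chromedriverVersion` are what ChromeDriver typically puts into `driver.capabilities`, but treat them as assumptions and check your own session. The helper names are hypothetical, and the comparison assumes that newer ChromeDriver releases share the browser's major version number.

```python
def major_version(version_string: str) -> int:
    """Return the leading integer of a dotted version such as '75.0.3770.100'."""
    return int(version_string.split(".")[0])


def versions_match(capabilities: dict) -> bool:
    """Compare browser and driver major versions (hypothetical helper)."""
    # Newer sessions report 'browserVersion', older ones 'version' (assumed keys).
    browser = capabilities.get("browserVersion") or capabilities.get("version")
    # The driver version is followed by a build hash in parentheses; drop it.
    driver_version = capabilities["chrome"]["chromedriverVersion"].split(" ")[0]
    return major_version(browser) == major_version(driver_version)


# Values copied from the error message in the question.
caps = {
    "browserVersion": "75.0.3770.100",
    "chrome": {
        "chromedriverVersion": "2.40.565386 (45a059dc425e08165f9a10324bd1380cc13ca363)"
    },
}
print(versions_match(caps))  # False
```

With a live session, the same function could be called as `versions_match(driver.capabilities)`.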


Alternative

As an alternative, you can use the GeckoDriver/Firefox combination. You can find a relevant discussion in Chromedriver only supports characters in the BMP error while sending Emoji with ChromeDriver Chrome using Selenium Python to Tkinter's label() textbox.
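If neither upgrading nor switching browsers is possible, another idea is to strip the surrogate code units inside the browser, before the value is sent back to ChromeDriver. This is an untested sketch rather than a confirmed fix: the helper name is hypothetical, and it assumes the failure is triggered only by the returned string. Note that it removes every surrogate code unit, so all emoji are dropped from the result, not just the broken halves.

```python
# JavaScript run in the page: return innerHTML with every UTF-16 surrogate
# code unit removed, so no \uD8xx escape reaches ChromeDriver.
STRIP_SURROGATES_JS = (
    r"return arguments[0].innerHTML.replace(/[\uD800-\uDFFF]/g, '');"
)


def inner_html_without_surrogates(driver, element):
    """Fetch an element's innerHTML via JavaScript (hypothetical helper).

    `driver` is a Selenium WebDriver and `element` a WebElement found on it.
    """
    return driver.execute_script(STRIP_SURROGATES_JS, element)
```

In the loop from the question, `print(inner_html_without_surrogates(driver, a))` would take the place of `print(a.get_attribute('innerHTML'))`, assuming `driver` is the WebDriver instance that produced `cards`.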

Regarding python-3.x - WebDriverException: Message: unknown error: bad inspector message error while printing HTML content using ChromeDriver Chrome through Selenium Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56880164/
