Python Playwright memory overload

Tags: python web-scraping playwright playwright-python

I wrote code that scrapes a website continuously, and after a few runs I get this message:

<--- Last few GCs --->

    [17744:00000270608DE2C0] 16122001 ms: Scavenge 2023.5 (2082.0) -> 2017.3 (2082.5) MB, 3.6 / 0.1 ms  (average mu = 0.908, current mu = 0.941) task
    [17744:00000270608DE2C0] 16122645 ms: Scavenge 2023.9 (2082.5) -> 2017.7 (2083.0) MB, 3.5 / 0.0 ms  (average mu = 0.908, current mu = 0.941) task
    [17744:00000270608DE2C0] 16128334 ms: Scavenge 2024.1 (2083.0) -> 2017.7 (2099.0) MB, 4.7 / 0.0 ms  (average mu = 0.908, current mu = 0.941) task


<--- JS stacktrace --->

    FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
     1: 00007FF66A07013F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+112495
     2: 00007FF669FFF396 DSA_meth_get_flags+65526
     3: 00007FF66A00024D node::OnFatalError+301
     4: 00007FF66A9319EE v8::Isolate::ReportExternalAllocationLimitReached+94
     5: 00007FF66A91BECD v8::SharedArrayBuffer::Externalize+781
     6: 00007FF66A7BF61C v8::internal::Heap::EphemeronKeyWriteBarrierFromCode+1468
     7: 00007FF66A7BC754 v8::internal::Heap::CollectGarbage+4244
     8: 00007FF66A76C3B5 v8::internal::IndexGenerator::~IndexGenerator+22165
     9: 00007FF669F90E9F v8::CFunctionInfo::HasOptions+22111
    10: 00007FF669F8F6B6 v8::CFunctionInfo::HasOptions+15990
    11: 00007FF66A0CF25B uv_async_send+331
    12: 00007FF66A0CE9EC uv_loop_init+1292
    13: 00007FF66A0CEB8A uv_run+202
    14: 00007FF66A09DC95 node::SpinEventLoop+309
    15: 00007FF669FB7AC3 cppgc::internal::NormalPageSpace::linear_allocation_buffer+53827
    16: 00007FF66A034FBD node::Start+221
    17: 00007FF669E588CC RC4_options+348108
    18: 00007FF66AEB08F8 v8::internal::compiler::RepresentationChanger::Uint32OverflowOperatorFor+14472
    19: 00007FFEB62C7034 BaseThreadInitThunk+20
    20: 00007FFEB78A2651 RtlUserThreadStart+33

After that, my code stops working. Has anyone run into this problem and knows how to solve it? I am using Python 3.8.8 and Playwright 1.22.0.

I import this library to do the web scraping:

    from playwright.sync_api import sync_playwright

Thanks, everyone!

Best answer

As of Q1 2023, this is probably the best response: https://github.com/microsoft/playwright/issues/6319#issuecomment-1227405461

Save the browser's state to a local file (session, local storage, etc) after creating the browser/context and performing the actions required to meet your needs:

    context.StorageState("state.json")

Close the browser and context, and kill all node.exe processes every 30 minutes (this is where the memory leak exists for me). If you don't kill them, a separate node.exe process is created every time, and the previous process remains in memory taking up space.

Create a new browser/context and load in the saved state, then navigate back to where you need to be:

    context, err := browser.NewContext(playwright.BrowserNewContextOptions{
        StorageStatePath: playwright.String("state.json"),
    })
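The snippets quoted above come from the Go binding, while the question uses Python. A minimal Python sketch of the same save, restart, and restore cycle might look like the following. The names `chunked`, `scrape_with_restarts`, `handle_page`, and `batch_size` are hypothetical, and restarting after a fixed number of URLs is an assumption standing in for the 30-minute interval mentioned in the quoted comment. As far as I know, leaving the `sync_playwright()` block stops the Node.js driver process, so no manual process kill should be needed here, but that is worth verifying on your own machine.

```python
import os
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scrape_with_restarts(
    urls: List[str],
    handle_page: Callable,  # hypothetical callback that extracts data from a page
    batch_size: int = 50,   # assumed value; tune it to your memory budget
    state_path: str = "state.json",
) -> None:
    # Imported here so that `chunked` stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    for batch in chunked(urls, batch_size):
        # A fresh driver and browser are started for every batch.
        with sync_playwright() as p:
            browser = p.chromium.launch()
            # Restore cookies and local storage if an earlier batch saved them.
            saved = state_path if os.path.exists(state_path) else None
            context = browser.new_context(storage_state=saved)
            page = context.new_page()
            for url in batch:
                page.goto(url)
                handle_page(page)
            # Persist the session before tearing everything down.
            context.storage_state(path=state_path)
            context.close()
            browser.close()
```

Calling `scrape_with_restarts(urls, handle_page)` then processes the URLs in batches, so memory held by one driver process is released before the next batch begins.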

If you have memory problems with Playwright, read the whole issue; you may find some inspiration there: https://github.com/microsoft/playwright/issues/6319

Regarding Python Playwright memory overload, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/72954376/
