How can a single database connection be shared between processes in Python 3.x?

Tags: python mysql python-3.x multiprocessing

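I'm running multiple processes using multiprocessing.Pool.

Each process has to query my MySQL database.

Currently I connect to the database once and then share that connection between the processes.

It works, but occasionally I get strange errors. I've confirmed that the errors are raised while querying the database.

I think the problem is that all the processes use the same connection.

  • Is this correct?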

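While looking for an answer I stumbled upon this Q&A: How to share a single MySQL database connection between multiple processes in Python

So I looked up Class pooling.MySQLConnectionPool

If I understand it correctly, I would set up a pool with a number of connections and share that pool between the processes. Each process would then look into the pool and use a connection if one is available, or otherwise wait until one is released.

  • Is this correct?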

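But then I found this Q&A: Accessing a MySQL connection pool from Python multiprocessing

At first "mata" seems to confirm my suspicion, but at the same time he dismisses the idea of setting up a pool to be shared between processes: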

sharing a database connection (or connection pool) between different processes would be a bad idea (and i highly doubt it would even work correctly),

Instead, he suggests

so each process using it's own connections is actually what you should aim for.

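What does that mean?

  • Should I create one connection per worker? Then what is the MySQL pool for?

The example mata gives in his answer seems reasonable enough, but I don't understand why the entire pool is passed as the init argument: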

p = Pool(initializer=init)
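  • Why? (As ph_singer points out in the comments, this is not a good solution.)

Would it be enough to change the blocking Pool.map() method to Pool.map_async() and send a connection from the pool to map_async(q, ConnObj)?

  • Is this correct?

The comments mention: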

The only way of utilizing one single pool with many processes is having one dedicated process which does all the db access communicate with it using a queue

Update: I found this, which seems to agree: https://stackoverflow.com/a/26072257/1267259

If you need large numbers of concurrent workers, but they're not using the DB all the time, you should have a group of database worker processes that handle all database access and exchange data with your other worker processes. Each database worker process has a DB connection. The other processes only talk to the database via your database workers.

Python's multiprocessing queues, fifos, etc offer appropriate messaging features for that.
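
The quoted idea of a dedicated database worker can be sketched roughly as follows. This is only an illustration under stated assumptions: the standard-library sqlite3 module stands in for a MySQL driver, and the names db_worker, run_queries and STOP are made up for this example. Only the worker process ever holds a connection; other code sends it (request_id, sql, params) messages and reads back (request_id, rows).

```python
import multiprocessing as mp
import sqlite3

STOP = None  # sentinel message telling the database worker to exit


def db_worker(db_path, requests, responses):
    """Runs in its own process; the only code that ever touches the connection."""
    conn = sqlite3.connect(db_path)  # assumption: sqlite3 stands in for MySQL
    try:
        while True:
            msg = requests.get()  # blocks until a request arrives
            if msg is STOP:
                break
            request_id, sql, params = msg
            rows = conn.execute(sql, params).fetchall()
            responses.put((request_id, rows))  # tag the reply with its request id
    finally:
        conn.close()


def run_queries(db_path, queries):
    """Hypothetical driver: start one DB worker, send read queries, collect replies."""
    requests, responses = mp.Queue(), mp.Queue()
    worker = mp.Process(target=db_worker, args=(db_path, requests, responses))
    worker.start()
    for request_id, (sql, params) in enumerate(queries):
        requests.put((request_id, sql, params))
    # Replies are matched to requests by id rather than by arrival order.
    results = dict(responses.get() for _ in queries)
    requests.put(STOP)
    worker.join()
    return [results[i] for i in range(len(queries))]
```

run_queries should be called from under an `if __name__ == "__main__":` guard so that it also works with the spawn start method. The sketch handles reads only; writes would also need a commit in the worker, and with several client processes each client would need its own reply queue or some other way to route responses.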

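  • Is this really correct?

Isn't the purpose of the MySQL pool to handle requests from the processes and relay them to an available connection?

Now I'm just confused...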

Best answer

Found Share connection to postgres db across processes in Python
The answer to my first question seems to be

You can't sanely share a DB connection across processes like that. You can sort-of share a connection between threads, but only if you make sure the connection is only used by one thread at a time. That won't work between processes because there's client-side state for the connection stored in the client's address space.

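The answers to my remaining questions basically come down to which of the following statements you agree with (from the discussion in the comments of this Q&A)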

Basically, the idea is to create a connection pool in the main process, and then in each spawned thread/process, you request connections from that pool. Threads should not share the same identical connection, because then threads can block each other from one of the major activities that threading is supposed to help with: IO. – Mr. F

Or, pass neither the pool nor a connection from the pool to the child processes:

Each child process creates its own db connections if it needs them (either individually or as a pool) – J.F. Sebastian.
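
A minimal sketch of "each child process creates its own connection", assuming multiprocessing.Pool with an initializer. The sqlite3 module stands in for a MySQL driver here, and init_worker, fetch_name, run_demo and the users table are hypothetical names for this example. Only the database path (plain data) crosses the process boundary, never a connection or a pool.

```python
import multiprocessing as mp
import os
import sqlite3

# Module-level name; every worker process has its own independent copy.
_conn = None


def init_worker(db_path):
    """Pool initializer: runs once in each worker and opens that worker's connection."""
    global _conn
    _conn = sqlite3.connect(db_path)  # assumption: stand-in for a MySQL connect call


def fetch_name(user_id):
    """Task function: queries through the connection owned by the current process."""
    row = _conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    return os.getpid(), name  # the pid shows which worker served the request


def run_demo(db_path, user_ids, processes=2):
    """Hypothetical driver: the workers receive the path, not a connection object."""
    with mp.Pool(processes, initializer=init_worker, initargs=(db_path,)) as pool:
        return pool.map(fetch_name, user_ids)
```

run_demo should be called from under an `if __name__ == "__main__":` guard. The number of open connections then equals the number of worker processes, which is the trade-off the comments above are weighing.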

"why use [db connections] pool" -- if there are multiple threads in your worker process then the pool might be useful (several threads can read/write data in parallel (CPython can release GIL during I/O)). If there is only one thread per worker process then there is no point to use the db pool. – J.F. Sebastian

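The multi-threaded case from the last comment might look something like the sketch below: several threads inside one worker process borrow connections from a small pool that lives in that same process. TinyPool is an illustrative class written for this example, not the API of any real driver, and sqlite3 again stands in for MySQL; lookup, run_threads and the users table are likewise assumptions.

```python
import queue
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class TinyPool:
    """Illustrative fixed-size connection pool that lives inside ONE process."""

    def __init__(self, db_path, size):
        self._idle = queue.Queue()
        for _ in range(size):
            # check_same_thread=False lets a connection move between threads;
            # the pool ensures only one thread holds it at any given time.
            self._idle.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next thread

    def close(self):
        while not self._idle.empty():
            self._idle.get().close()


def lookup(pool, user_id):
    """Borrow a connection, run one query, and return it to the pool."""
    with pool.connection() as conn:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
    return row[0] if row else None


def run_threads(db_path, user_ids, threads=4, pool_size=2):
    """Hypothetical driver: more threads than connections, so some threads wait."""
    pool = TinyPool(db_path, pool_size)
    try:
        with ThreadPoolExecutor(max_workers=threads) as executor:
            return list(executor.map(lambda uid: lookup(pool, uid), user_ids))
    finally:
        pool.close()
```

With a single thread per worker process the pool would only ever hand out one connection, which matches the comment's point that a pool adds nothing in that case.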

As a side note

This doesn't fully answer my third question, but it does suggest that in some cases creating one connection per process is feasible (Share connection to postgres db across processes in Python)

It's unclear what you're looking for here. 5 connections certainly isn't an issue. Are you saying you may eventually need to spawn 100s or 1000s of processes, each with their own connection? If so, even if you could share them, they'd be bound to the connection pool, since only one process could use a given connection at any given time. – khampson Sep 27 '14 at 5:19

Source: this Q&A is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/28638939/
