python - Sharing a variable (data from a file) among multiple Python scripts without loading duplicates

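I want to load a large matrix stored in matrix_file.mtx. The load must happen only once. Once the variable matrix is in memory, I would like many Python scripts to share it without duplicating it, so that I get a memory-efficient multi-script program driven from bash (or from Python itself). I imagine something like this pseudocode: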

# Loading and sharing script:
import share
matrix = open("matrix_file.mtx","r")
share.send_to_shared_ram(matrix, as_variable('matrix'))

# Shared matrix variable processing script_1
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>

# Shared matrix variable processing script_2
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
...

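The idea is that pointer_to_matrix points to matrix in RAM, which is loaded once for all n scripts (not n times). The scripts are launched separately from a bash script (or, if possible, from a Python main):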

$ python Load_and_share.py
$ python script_1.py -args string &
$ python script_2.py -args string &
$ ...
$ python script_n.py -args string &

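I am also interested in solutions that go through the hard disk, i.e. the matrix is stored on disk and the shared object accesses it as needed. Even so, the object in RAM (a kind of pointer) should be usable as if it were the whole matrix.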

Thank you for your help.

Best answer

Between the mmap module and numpy.frombuffer, this is fairly easy:

import mmap
import numpy as np

with open("matrix_file.mtx","rb") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_READ)
    # Optionally, on UNIX-like systems in Py3.3+, add:
    # os.posix_fadvise(matfile.fileno(), 0, len(mm), os.POSIX_FADV_WILLNEED)
    # to trigger background read in of the file to the system cache,
    # minimizing page faults when you use it

matrix = np.frombuffer(mm, np.uint8)

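Each process performs this work separately and gets a read-only view of the same memory. You would change the dtype to something other than uint8 as needed. Switching to ACCESS_WRITE would allow modifications to the shared data, though that requires synchronization and possibly explicit calls to mm.flush to make sure the data is actually reflected in other processes.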
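To make the dtype point concrete, here is a minimal sketch of the same mmap plus numpy.frombuffer approach with a non-uint8 dtype and a 2-D shape. It assumes the file holds raw binary values in row-major order; if matrix_file.mtx is a text format (such as Matrix Market), it would first need a one-time conversion to raw binary. The helper names save_raw and open_shared, the file name matrix.bin, and the 3x4 float64 layout are all illustrative assumptions, not part of the original answer.

```python
import mmap
import os
import tempfile

import numpy as np


def save_raw(path, array):
    """Hypothetical helper: dump the array's raw bytes to disk (done once)."""
    np.ascontiguousarray(array).tofile(path)


def open_shared(path, dtype, shape):
    """Hypothetical helper: map the file read-only and view it as a matrix.

    The mapping outlives the file object; the returned array keeps it alive.
    """
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return np.frombuffer(mm, dtype=dtype).reshape(shape)


# One-time setup: write an example matrix (assumed shape and dtype).
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
original = np.arange(12, dtype=np.float64).reshape(3, 4)
save_raw(path, original)

# What each script would do: map the file instead of loading a private copy.
view = open_shared(path, np.float64, (3, 4))
print(view.shape, view.flags.writeable)
```

Every script that calls open_shared on the same path maps the same pages from the operating system's file cache, so the data should occupy physical memory only once. The shape and dtype are not stored in a raw file, so each script has to know them in advance.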

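A more complex solution that follows your initial design more closely would be to use multiprocessing.SyncManager to create a connectable shared data "server", which lets a single common data store be registered with the manager and handed out to as many users as desired. Creating an Array (based on ctypes types) of the correct type on the manager and then register-ing a function that returns that same shared Array to every caller would work too (each caller would convert the returned Array via numpy.frombuffer as before). This is considerably more involved (it would be easier to have a single Python process initialize an Array and then launch Process es, which would share it automatically thanks to fork semantics), but it is the closest to the concept you describe.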
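The simpler alternative mentioned in parentheses, one parent process that initializes a shared array and then forks workers, might look like the sketch below. It is not the SyncManager design. It assumes a Unix-like system where the "fork" start method is available, and it uses RawArray (the lock-free variant of Array) because the workers only read the matrix. The names ROWS, COLS, as_matrix, worker, and demo, as well as the example data, are hypothetical.

```python
import multiprocessing as mp

import numpy as np

ROWS, COLS = 3, 4  # illustrative shape, not taken from the question


def as_matrix(shared):
    """View the shared ctypes buffer as a NumPy matrix without copying."""
    return np.frombuffer(shared, dtype=np.float64).reshape(ROWS, COLS)


def worker(shared, row, results):
    """Sum one row of the shared matrix and store it in the shared results."""
    results[row] = float(as_matrix(shared)[row].sum())


def demo():
    ctx = mp.get_context("fork")  # assumption: fork is available (Unix-like)
    shared = ctx.RawArray("d", ROWS * COLS)  # "d" is a C double (float64)
    # Fill the shared buffer once, in the parent process.
    as_matrix(shared)[:] = np.arange(ROWS * COLS, dtype=np.float64).reshape(ROWS, COLS)
    results = ctx.RawArray("d", ROWS)
    procs = [ctx.Process(target=worker, args=(shared, r, results)) for r in range(ROWS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(results)


if __name__ == "__main__":
    print(demo())
```

The trade-off compared with the mmap approach is that the workers must be started by the process that owns the array; independently launched scripts cannot attach to it.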

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/34819892/
