python - How can I make Python code with two for loops run faster (is there a Python way of doing Mathematica's Parallelize)?

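I am completely new to Python, or to any programming language of this kind. I have some experience with Mathematica. I have a mathematical problem that Mathematica solves with her own "Parallelize" methods, but using all the cores leaves the system exhausted: I can barely use the machine during the run. So I went looking for a coding alternative and found Python fairly easy to learn and implement. Without further ado, here is the mathematical problem and the issue with my Python code. Since the complete code is too long, I will give an outline.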

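1. Numerically solve a differential equation of the form y''(t) + f(t) y(t) = 0 to obtain y(t) over some range, say C <= t <= D.

2. Next, interpolate the numerical result over some desired range to obtain a function w(t), say for A <= t <= B.

3. Use w(t) to solve another differential equation of the form z''(t) + [a + b w(t)] z(t) = 0 for some range of a and b, for which I am using loops.

4. Define F = 1 + sol1[157] and build a list like {a, b, F}.

Let me give a prototype loop, since this is what takes up most of the computation time.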

for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        print('Solving for q = {}, a = {}'.format(q,a))
        sol1 = odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        print(t[157])
        F = 1 + sol1[157]
        f1.write("{}  {} {} \n".format(q, a, F))
# close the file only after both loops have finished
f1.close()
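
For readers who want to reproduce the setup, steps 1 to 4 might look roughly like the sketch below. The coefficient f(t), the time ranges, the initial conditions, and the helper names rhs_y, rhs_z and w are placeholders chosen for illustration, since the question does not give them; linear interpolation with np.interp is likewise an assumption.

```python
import numpy as np
from scipy.integrate import odeint


def f(t):
    # placeholder coefficient; the question does not specify f(t)
    return 1.0 + 0.5 * np.cos(t)


def rhs_y(state, t):
    # y'' + f(t) y = 0 written as a first-order system in (y, y')
    y, dy = state
    return [dy, -f(t) * y]


# step 1: solve for y(t) on a placeholder range C <= t <= D
t_y = np.linspace(0.0, 20.0, 2001)
y = odeint(rhs_y, [1.0, 0.0], t_y)[:, 0]


def w(t):
    # step 2: linear interpolation of the numerical solution
    return np.interp(t, t_y, y)


def rhs_z(state, t, a, b):
    # step 3: z'' + [a + b w(t)] z = 0 as a first-order system in (z, z')
    z, dz = state
    return [dz, -(a + b * w(t)) * z]


# one (a, b) pair; the question loops over a grid of such pairs
t = np.linspace(0.0, 10.0, 201)
sol1 = odeint(rhs_z, [1.0, 0.0], t, args=(1.0, 0.5))[:, 0]

# step 4
F = 1 + sol1[157]
print(F)
```

Because w(t) is called from Python on every right-hand-side evaluation, an interpolated w is likely to be much slower than a built-in functional form, which is consistent with the timings reported in the question.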

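Now, the real loop takes about 4 hours 30 minutes to finish (with a built-in functional form for w(t), it takes about 2 minutes). When I applied numba/autojit before the definition of fun in the code (without properly understanding what it does or how!), the run time improved significantly, to about 2 hours 30 minutes. Writing the two loops as itertools/product reduced the run time by a further 2 minutes or so. However, when I let Mathematica use all 4 cores, she finishes the task within 30 minutes.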

So, is there a way to improve the run time in Python?

Best answer

To speed up Python, you have three options:

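Deal with specific bottlenecks in your program (as suggested in the comments by @LutzL).
Try to speed up the code by compiling it to C with Cython (or by including C code using weave or similar techniques). Since the time-consuming computations in your case happen not in Python code proper but in SciPy modules (at least I believe they do), this will not help you much here.
Implement multiprocessing, as you suggested in your original question. If you have X cores, this will speed up your code by a factor of up to X (slightly less in practice). Unfortunately, this is rather complicated in Python.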
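
Before choosing among these, it can help to confirm where the time actually goes. The following is a minimal sketch using the standard library's cProfile and pstats modules; slow_part and run_loops are hypothetical stand-ins for the expensive computation and the nested loops, not code from the question.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # hypothetical stand-in for the expensive inner computation
    return sum(i * i for i in range(n))


def run_loops():
    # hypothetical stand-in for the nested q/a loops
    return [slow_part(2000) for _ in range(200)]


# collect timing data while the loops run
profiler = cProfile.Profile()
profiler.enable()
run_loops()
profiler.disable()

# format the ten most expensive entries, sorted by cumulative time
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

If most of the cumulative time shows up inside SciPy's integrator rather than in your own functions, compiling the Python loop code is unlikely to help much, and multiprocessing becomes the more promising route.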

Implementing multiprocessing - an example using the prototype loop from the original question

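I assume that the computations inside the nested loops of your prototype code are actually independent of one another. Since your prototype code is incomplete, however, I cannot be sure that this is the case. If they are not independent, this approach will of course not work. My example does not use your differential equation problem for the fun function, but a prototype with the same signature (input and output variables).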

import numpy as np
import scipy.integrate
import multiprocessing as mp

def fun(y, t, b, c):
    # replace this function with whatever function you want to work with
    #    (this one is the example function from the scipy docs for odeint)
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

#definitions of work thread and write thread functions

def run_thread(input_queue, output_queue):
    # run threads will pull tasks from the input_queue, push results into output_queue
    while True:
        try:
            queueitem = input_queue.get(block = False)
            if len(queueitem) == 3:
                a, q, t = queueitem
                sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
                F = 1 + sol1[157]
                output_queue.put((q, a, F))
        except Exception as e:
            print(str(e))
            print("Queue exhausted, terminating")
            break

def write_thread(queue):
    # write thread will pull results from output_queue, write them to outputfile.txt
    with open("outputfile.txt", "w") as f1:
        while True:
            # a blocking get() waits for the next result instead of busy-polling an empty queue
            queueitem = queue.get()
            if queueitem[0] == "TERMINATE":
                break
            q, a, F = queueitem
            print("{}  {} {}".format(q, a, F))
            f1.write("{}  {} {} \n".format(q, a, F))

# define time point sequence            
t = np.linspace(0, 10, 201)

# prepare input and output Queues
mpM = mp.Manager()
input_queue = mpM.Queue()
output_queue = mpM.Queue()

# prepare tasks, collect them in input_queue
for q in np.linspace(0.0, 4.0, 100):
    for a in np.linspace(-2.0, 7.0, 100):
        # Your computations as commented here will now happen in run_threads as defined above and created below
        # print('Solving for q = {}, a = {}'.format(q,a))
        # sol1 = scipy.integrate.odeint(fun, [1, 0], t, args=( a, q))[..., 0]
        # print(t[157])
        # F = 1 + sol1[157]    
        input_tupel = (a, q, t)
        input_queue.put(input_tupel)

# create threads
thread_number = mp.cpu_count()
procs_list = [mp.Process(target = run_thread , args = (input_queue, output_queue)) for i in range(thread_number)]         
write_proc = mp.Process(target = write_thread, args = (output_queue,))

# start threads
for proc in procs_list:
    proc.start()
write_proc.start()

# wait for run_threads to finish
for proc in procs_list:
    proc.join()

# terminate write_thread
output_queue.put(("TERMINATE",))
write_proc.join()

Explanation

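We define the individual problems (or rather their parameters) before starting the computation; we collect them in an input queue.
We define a function (run_thread) to be run in the threads (strictly speaking these are separate processes, since they are created with mp.Process). This function computes individual problems until none are left in the input queue; it pushes the results into an output queue.
We start as many of these threads as we have CPUs.
We start one additional thread (write_thread) that collects the results from the output queue and writes them to a file.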

Caveats

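For smaller problems, you can run multiprocessing without queues. However, if the number of individual computations is large, you will exceed the maximum number of threads allowed by the kernel, after which the kernel kills your program.
There are differences between operating systems in how multiprocessing works. The example above will work on Linux (and perhaps also on other Unix-like systems such as Mac and BSD), but not on Windows. The reason is that Windows does not have a fork() system call. (I do not have access to Windows, so I cannot try to implement it there.)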
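
On platforms that start worker processes with spawn instead of fork (Windows, and macOS by default since Python 3.8), the multiprocessing documentation asks that the main module be importable without side effects, which is usually done with an `if __name__ == "__main__":` guard. The following is a minimal sketch along those lines that also swaps the hand-written queues for multiprocessing.Pool. The helper names solve_one and solve_grid, the 20 x 20 demo grid, and the chunksize are assumptions made for illustration, and this is not a verified Windows port of the example above.

```python
import itertools
import multiprocessing as mp

import numpy as np
from scipy.integrate import odeint

# same time grid as in the example above
T = np.linspace(0, 10, 201)


def fun(y, t, b, c):
    # placeholder right-hand side (the pendulum example from the SciPy docs)
    theta, omega = y
    return [omega, -b * omega - c * np.sin(theta)]


def solve_one(params):
    # one independent task: integrate for a single (q, a) pair
    q, a = params
    sol1 = odeint(fun, [1, 0], T, args=(a, q))[..., 0]
    return q, a, 1 + sol1[157]


def solve_grid(q_values, a_values, processes=None):
    # distribute all (q, a) pairs over a pool of worker processes;
    # processes=None lets Pool use one worker per available CPU
    tasks = list(itertools.product(q_values, a_values))
    with mp.Pool(processes) as pool:
        return pool.map(solve_one, tasks, chunksize=50)


if __name__ == "__main__":
    # the guard keeps spawned workers from re-running this block on import;
    # the question used a 100 x 100 grid, a smaller one is used here
    results = solve_grid(np.linspace(0.0, 4.0, 20), np.linspace(-2.0, 7.0, 20))
    with open("outputfile.txt", "w") as f1:
        for q, a, F in results:
            f1.write("{}  {} {}\n".format(q, a, F))
```

pool.map returns the results in the same order as the tasks, so the output file has the same row order as the original nested loops, and no separate writer process is needed.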

Regarding "python - How can I make Python code with two for loops run faster (is there a Python way of doing Mathematica's Parallelize)?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43945124/
