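I'm trying to migrate from Matlab to Julia for the first time, and I've found that my code improved by ~3x, but I still think there's more to gain. I'm not using any global variables in the function, and I've preallocated all the arrays I use (I think?). Any ideas on how to speed it up further would be greatly appreciated, although even with the current improvement I think I'll be switching over completely!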
function word_sim(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1), at each point creates a new word with prob mu
    # otherwise randomly chooses a previously used. Runs the program until time omega
    words = zeros(Int32, 1, omega) # to store the words
    tests = rand(1,omega) # will compare mu to these
    words[1] = 1; # initialize the words
    next_word = 2 # will be the next word used
    words[tau+1] = omega + 1; # max possible word so insert that at time tau
    innovates = mu .> tests; # when we'll make a new word
    for i = 2:tau # simulate the process
        if innovates[i] == 1 # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if innovates[i] == 1 # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = sum(words .== (omega + 1)); # count how many times our word occurred
    return result
end
When I run it with these values, it takes ~0.26 seconds on my PC:
using Statistics
@time begin
    nsim = 10^3;
    omega = 100;
    seed = [0:1:(omega-1);];
    mu = 0.01;
    results = zeros(Float64, 1, length(seed));
    pops = zeros(Int64, 1, nsim);
    for tau in seed
        for jj = 1:nsim
            pops[jj] = word_sim(tau, omega, mu);
        end
        results[tau+1] = mean(pops);
    end
end
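Or am I better off writing the code in C++? Julia was my first reaction, as I'd heard rave reviews about its syntax, and honestly it's fantastic!
Any comments are greatly appreciated.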
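Best answer
A 3x speedup is a nice start, but it turns out there are a few more things you can do to improve performance significantly!
As a starting point, running the example you posted above in Julia 1.6.1, I get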
0.301665 seconds (798.10 k allocations: 164.778 MiB, 12.70% gc time)
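That's a lot of allocations and a fair amount of garbage collector ("gc") time, so it looks like we're producing quite a bit of garbage here. Some of the culprits are lines like
tests = rand(1,omega) # will compare mu to these
or
innovates = mu .> tests; # when we'll make a new word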
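In languages like Matlab or Python, precomputing these things as whole vectors in one go can be good for performance, but in Julia it's generally not necessary and can even hurt, because each of these lines causes a brand-new array to be allocated. If we delete them and just generate our tests on the fly (mu > rand() inside the loop), we avoid those allocations. One other line that allocates here is
result = sum(words .== (omega + 1))
which first builds an entirely new array (a BitMatrix) before summing it. You could avoid that by writing the sum as a for loop (even though that may feel wrong coming from Matlab, loops are quite fast in Julia). Or, to keep it as a one-liner, use count or sum with a function that does the comparison as the first argument, in this case just the anonymous function x->(x == omega+1):
result = count(x->(x == omega+1), words)
Adding these changes so far: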
function word_sim(tau::Int, omega::Int, mu::Float64)
    # inserts a word in position (tau+1), at each point creates a new word with prob mu
    # otherwise randomly chooses a previously used. Runs the program until time omega
    words = zeros(Int32, 1, omega) # to store the words
    words[1] = 1; # initialize the words
    next_word = 2 # will be the next word used
    words[tau+1] = omega + 1; # max possible word so insert that at time tau
    for i = 2:tau # simulate the process
        if mu > rand() # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    for i = (tau+2):omega
        if mu > rand() # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = count(x->(x == omega+1), words) # count how many times our word occurred
    return result
end
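The counting options described earlier can be checked against each other on some made-up data. This is a small sketch with a hypothetical words array and a hypothetical helper count_equal; every form except the broadcast sum(words .== target) avoids building a temporary array.

```julia
# Hypothetical example data shaped like `words` in word_sim: a 1×omega Int32 array.
omega = 100
words = rand(Int32(1):Int32(omega + 1), 1, omega)
target = omega + 1

n_broadcast = sum(words .== target)       # builds a temporary BitMatrix first
n_count = count(x -> x == target, words)  # predicate as first argument, no temporary
n_sum = sum(x -> x == target, words)      # sums the Bool results directly
n_fix2 = count(==(target), words)         # ==(target) is a ready-made predicate

# Hypothetical helper: the same count written as an explicit loop.
function count_equal(words, target)
    n = 0
    for w in words
        n += (w == target)                # the Bool is added as 0 or 1
    end
    return n
end
n_loop = count_equal(words, target)

println((n_broadcast, n_count, n_sum, n_fix2, n_loop))  # all five agree
```

The ==(target) form is just shorthand for the anonymous function x -> x == target, so it is a matter of taste which one-liner you use.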
Using the same timing code, the revised word_sim now brings us down to 0.177766 seconds (298.10 k allocations: 51.863 MiB, 13.01% gc time)
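So that's about half the time and half the allocations. But there's more! First, let's move the allocation of the words array out of the word_sim function and instead create an in-place version of that function. We can also speed things up by adding @inbounds to the tight for loops:
function word_sim!(words::AbstractArray, tau::Int, omega::Int, mu::Float64)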
    # inserts a word in position (tau+1), at each point creates a new word with prob mu
    # otherwise randomly chooses a previously used. Runs the program until time omega
    fill!(words, 0) # Probably not necessary actually, but I haven't spent enough time looking at the code to be sure
    words[1] = 1; # initialize the words
    next_word = 2 # will be the next word used
    words[tau+1] = omega + 1; # max possible word so insert that at time tau
    @inbounds for i = 2:tau # simulate the process
        if mu > rand() # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    # force the word we're interested in
    @inbounds for i = (tau+2):omega
        if mu > rand() # innovate
            words[i] = next_word
            next_word = next_word + 1
        else # copy
            words[i] = words[rand(1:(i-1))]
        end
    end
    result = count(x->(x == omega+1), words) # count how many times our word occurred
    return result
end
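Because @inbounds turns off bounds checking, an in-place function like word_sim! relies on the caller passing a words array with at least omega elements; an out-of-bounds access under @inbounds is undefined behavior rather than a BoundsError. One way to keep the speed while restoring safety is a single checkbounds call before the loop. A minimal sketch, using a hypothetical innovate_or_copy! that keeps only the innovate/copy loop (no forced word):

```julia
# Hypothetical helper: the innovate/copy loop from word_sim!, minus the forced word.
function innovate_or_copy!(words::AbstractArray{<:Integer}, omega::Int, mu::Float64)
    checkbounds(words, 1:omega)    # throws BoundsError once if words is too short
    words[1] = 1
    next_word = 2
    @inbounds for i in 2:omega     # safe: every index here was checked above
        if mu > rand()             # innovate with probability mu
            words[i] = next_word
            next_word += 1
        else                       # copy a uniformly chosen earlier word
            words[i] = words[rand(1:(i-1))]
        end
    end
    return words
end

buf = zeros(Int32, 1, 100)         # allocated once by the caller
innovate_or_copy!(buf, 100, 0.01)  # refilled in place on each call
```

The check costs one comparison per call instead of one per array access, which is why it fits well with @inbounds in a function that is called many times.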
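By Julia convention, functions that modify one of their input arguments in place are usually marked with a ! at the end of their name, hence the new function name. Since we now have to modify the timing code slightly to preallocate words anyway, let's also take the opportunity to put that timing code into a function, to avoid any global variables in the timing:
function run_word_sim()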
    nsim = 10^3
    omega = 100
    seed = [0:1:(omega-1);]
    mu = 0.01
    results = zeros(Float64, 1, length(seed))
    pops = zeros(Int64, 1, nsim)
    words = zeros(Int32, 1, omega) # to store the words
    for tau in seed
        for jj = 1:nsim
            pops[jj] = word_sim!(words, tau, omega, mu)
        end
        results[tau+1] = mean(pops) # mean comes from Statistics (using Statistics, as in the question)
    end
    return results
end
We can then use the BenchmarkTools package and its @btime or @benchmark macros to get the most accurate timing results (along with some useful graphs and statistics):
julia> using BenchmarkTools
julia> @btime run_word_sim()
124.178 ms (4 allocations: 10.17 KiB)
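So that's roughly a 2.4x improvement over the original version (0.30 s down to 0.124 s), and allocations and memory use have dropped by four to five orders of magnitude, down to just the four arrays used in the timing code (seed, results, pops and words).

For the absolute maximum performance, you could go further with LoopVectorization.jl and its @turbo macro, though that would probably require changing the algorithm, since these loops depend on previous state and so don't appear to be compatible with loop reordering. You could, however, turn the count into a for loop and @turbo that, for a slight additional speedup.

There are also other, possibly faster, options for random number generation, such as VectorizedRNG.jl, as discussed in the Discourse thread linked in the comments. While allocating a new vector of random numbers on every call to word_sim is probably not optimal, RNGs are often faster when they can generate many random numbers at once, so passing a preallocated buffer of random numbers to word_sim! and filling it in place with rand!, as provided by either the Random stdlib or VectorizedRNG, could give a significant additional speedup.

Some of the tricks and rules of thumb used in this answer are discussed more broadly at https://github.com/brenhinkeller/JuliaAdviceForMatlabProgrammers, along with some other general Matlab -> Julia tips.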
Source: Stack Overflow question "performance - Julia performance improvement suggestions", https://stackoverflow.com/questions/61349348/