performance - Goroutine overhead and performance analysis when subsetting DataFrames (Gota)

Tags: performance go goroutine

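Since early 2016 I have been working on an implementation of Pandas/R-style DataFrames for Go: https://github.com/kniren/gota

Recently I have been focusing on improving the library's performance to try to match Pandas/dplyr. You can follow the current progress here: https://github.com/kniren/gota/issues/16

Since DataFrame subsetting is one of the most frequently used operations, I thought introducing concurrency might be a good way to improve performance.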

Before:

columns := make([]series.Series, df.ncols)
for i, column := range df.columns {
    s := column.Subset(indexes)
    columns[i] = s
}

After:

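columns := make([]series.Series, df.ncols)
var wg sync.WaitGroup
wg.Add(df.ncols)
for i := range df.columns {
    // One goroutine per column; each writes only to its own slot in columns.
    go func(i int) {
        defer wg.Done()
        columns[i] = df.columns[i].Subset(indexes)
    }(i)
}
// Block until every column has been subset.
wg.Wait()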

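As far as I know, creating one goroutine per DataFrame column should not introduce much overhead, so I expected at least a 2x speedup over the serial version, at least for large datasets. However, when I benchmarked this change with datasets and index sets of different sizes, the results were very disappointing (benchmark labels are NROWSxNCOLS_INDEXSIZE-CPUCORES):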

benchmark                                          old ns/op      new ns/op      delta
BenchmarkDataFrame_Subset/1000000x20_100           55230          109349         +97.99%
BenchmarkDataFrame_Subset/1000000x20_100-2         51457          67714          +31.59%
BenchmarkDataFrame_Subset/1000000x20_100-4         49845          70141          +40.72%
BenchmarkDataFrame_Subset/1000000x20_1000          518506         518085         -0.08%
BenchmarkDataFrame_Subset/1000000x20_1000-2        476661         311379         -34.67%
BenchmarkDataFrame_Subset/1000000x20_1000-4        505023         316583         -37.31%
BenchmarkDataFrame_Subset/1000000x20_10000         6621116        6314112        -4.64%
BenchmarkDataFrame_Subset/1000000x20_10000-2       7316062        4509601        -38.36%
BenchmarkDataFrame_Subset/1000000x20_10000-4       6483812        8394113        +29.46%
BenchmarkDataFrame_Subset/1000000x20_100000        105341711      106427967      +1.03%
BenchmarkDataFrame_Subset/1000000x20_100000-2      94567729       56778647       -39.96%
BenchmarkDataFrame_Subset/1000000x20_100000-4      91896690       60971444       -33.65%
BenchmarkDataFrame_Subset/1000000x20_1000000       1538680081     1632044752     +6.07%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     1292113119     1100075806     -14.86%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     1282367864     949615298      -25.95%
BenchmarkDataFrame_Subset/100000x20_100            50286          106850         +112.48%
BenchmarkDataFrame_Subset/100000x20_100-2          54537          70492          +29.26%
BenchmarkDataFrame_Subset/100000x20_100-4          58024          76617          +32.04%
BenchmarkDataFrame_Subset/100000x20_1000           541600         625967         +15.58%
BenchmarkDataFrame_Subset/100000x20_1000-2         493894         362894         -26.52%
BenchmarkDataFrame_Subset/100000x20_1000-4         535373         349211         -34.77%
BenchmarkDataFrame_Subset/100000x20_10000          6298063        7678499        +21.92%
BenchmarkDataFrame_Subset/100000x20_10000-2        5827185        4832560        -17.07%
BenchmarkDataFrame_Subset/100000x20_10000-4        8195048        3660077        -55.34%
BenchmarkDataFrame_Subset/100000x20_100000         105108807      82976477       -21.06%
BenchmarkDataFrame_Subset/100000x20_100000-2       92112736       58317114       -36.69%
BenchmarkDataFrame_Subset/100000x20_100000-4       92044966       63469935       -31.04%
BenchmarkDataFrame_Subset/1000x20_10               9741           53365          +447.84%
BenchmarkDataFrame_Subset/1000x20_10-2             9366           36457          +289.25%
BenchmarkDataFrame_Subset/1000x20_10-4             9463           46682          +393.31%
BenchmarkDataFrame_Subset/1000x20_100              50841          103523         +103.62%
BenchmarkDataFrame_Subset/1000x20_100-2            49972          62344          +24.76%
BenchmarkDataFrame_Subset/1000x20_100-4            72014          81808          +13.60%
BenchmarkDataFrame_Subset/1000x20_1000             457799         571292         +24.79%
BenchmarkDataFrame_Subset/1000x20_1000-2           460551         405116         -12.04%
BenchmarkDataFrame_Subset/1000x20_1000-4           462928         416522         -10.02%
BenchmarkDataFrame_Subset/1000x200_10              90125          688443         +663.88%
BenchmarkDataFrame_Subset/1000x200_10-2            85259          392705         +360.60%
BenchmarkDataFrame_Subset/1000x200_10-4            87412          387509         +343.31%
BenchmarkDataFrame_Subset/1000x200_100             486600         1082901        +122.54%
BenchmarkDataFrame_Subset/1000x200_100-2           471154         732304         +55.43%
BenchmarkDataFrame_Subset/1000x200_100-4           542846         659571         +21.50%
BenchmarkDataFrame_Subset/1000x200_1000            5926086        6686480        +12.83%
BenchmarkDataFrame_Subset/1000x200_1000-2          5364091        3986970        -25.67%
BenchmarkDataFrame_Subset/1000x200_1000-4          5904977        4504084        -23.72%
BenchmarkDataFrame_Subset/1000x2000_10             1187297        7800052        +556.96%
BenchmarkDataFrame_Subset/1000x2000_10-2           1217022        3930742        +222.98%
BenchmarkDataFrame_Subset/1000x2000_10-4           1301666        3617871        +177.94%
BenchmarkDataFrame_Subset/1000x2000_100            6942015        10790196       +55.43%
BenchmarkDataFrame_Subset/1000x2000_100-2          6588351        7592847        +15.25%
BenchmarkDataFrame_Subset/1000x2000_100-4          7067226        14391327       +103.63%
BenchmarkDataFrame_Subset/1000x2000_1000           62392457       69560711       +11.49%
BenchmarkDataFrame_Subset/1000x2000_1000-2         57793006       37416703       -35.26%
BenchmarkDataFrame_Subset/1000x2000_1000-4         59572261       58398203       -1.97%

benchmark                                          old allocs     new allocs     delta
BenchmarkDataFrame_Subset/1000000x20_100           41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-2         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-4         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000          41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000         41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000        41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-2      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-4      41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000       41             42             +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     41             43             +4.88%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     41             46             +12.20%
BenchmarkDataFrame_Subset/100000x20_100            41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-2          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100-4          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000           41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-2         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-4         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000          41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-2        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-4        41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000         41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-2       41             42             +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-4       41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10               41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-2             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_10-4             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100              41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-2            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_100-4            41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000             41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-2           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-4           41             42             +2.44%
BenchmarkDataFrame_Subset/1000x200_10              401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-2            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_10-4            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100             401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-2           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_100-4           401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000            401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-2          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-4          401            402            +0.25%
BenchmarkDataFrame_Subset/1000x2000_10             4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-2           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-4           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100            4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-2          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-4          4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000           4001           4002           +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000-2         4001           4010           +0.22%
BenchmarkDataFrame_Subset/1000x2000_1000-4         4001           4003           +0.05%

benchmark                                          old bytes     new bytes     delta
BenchmarkDataFrame_Subset/1000000x20_100           32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-2         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-4         32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000000x20_1000          298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-2        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-4        298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000000x20_10000         2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-2       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-4       2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000        29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-2      29083520      29083547      +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-4      29083542      29083563      +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000       290121600     290121616     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-2     290121600     290121696     +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-4     290121600     290121840     +0.00%
BenchmarkDataFrame_Subset/100000x20_100            32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-2          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_100-4          32400         32416         +0.05%
BenchmarkDataFrame_Subset/100000x20_1000           298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-2         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-4         298880        298896        +0.01%
BenchmarkDataFrame_Subset/100000x20_10000          2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-2        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-4        2971520       2971536       +0.00%
BenchmarkDataFrame_Subset/100000x20_100000         29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-2       29083520      29083536      +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-4       29083542      29083553      +0.00%
BenchmarkDataFrame_Subset/1000x20_10               4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-2             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_10-4             4880          4896          +0.33%
BenchmarkDataFrame_Subset/1000x20_100              32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-2            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_100-4            32400         32416         +0.05%
BenchmarkDataFrame_Subset/1000x20_1000             298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-2           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-4           298880        298896        +0.01%
BenchmarkDataFrame_Subset/1000x200_10              49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-2            49568         49584         +0.03%
BenchmarkDataFrame_Subset/1000x200_10-4            49568         49585         +0.03%
BenchmarkDataFrame_Subset/1000x200_100             324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-2           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_100-4           324768        324784        +0.00%
BenchmarkDataFrame_Subset/1000x200_1000            2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-2          2989568       2989584       +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-4          2989569       2989588       +0.00%
BenchmarkDataFrame_Subset/1000x2000_10             491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_10-2           491072        491133        +0.01%
BenchmarkDataFrame_Subset/1000x2000_10-4           491072        491088        +0.00%
BenchmarkDataFrame_Subset/1000x2000_100            3243072       3243088       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-2          3243074       3243102       +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-4          3243076       3243100       +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000           29891072      29891088      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-2         29891086      29891797      +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-4         29891115      29891167      +0.00%
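
For readers unfamiliar with the label format in the tables above: names of this shape are what Go's testing package produces when sub-benchmarks are registered with b.Run and the run uses something like `go test -bench . -benchmem -cpu 1,2,4`. The `-2` and `-4` suffixes are the GOMAXPROCS value, and no suffix means GOMAXPROCS=1. The following is only a sketch of such a benchmark, not the one from the Gota repository; `benchName`, `subsetSerial`, and the toy `[][]int` columns are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"testing"
)

// benchName builds a sub-benchmark label in the NROWSxNCOLS_INDEXSIZE form.
// (Hypothetical helper, not part of Gota.)
func benchName(nrows, ncols, nidx int) string {
	return fmt.Sprintf("%dx%d_%d", nrows, ncols, nidx)
}

// subsetSerial picks the given row positions out of every column.
// It stands in for the serial DataFrame subsetting loop.
func subsetSerial(cols [][]int, indexes []int) [][]int {
	out := make([][]int, len(cols))
	for i, col := range cols {
		s := make([]int, len(indexes))
		for j, idx := range indexes {
			s[j] = col[idx]
		}
		out[i] = s
	}
	return out
}

// BenchmarkDataFrameSubset would normally live in a _test.go file so that
// `go test -bench` can discover it. Each b.Run call registers one labelled
// sub-benchmark.
func BenchmarkDataFrameSubset(b *testing.B) {
	sizes := []struct{ nrows, ncols, nidx int }{
		{1000, 20, 10},
		{1000, 20, 100},
		{1000, 200, 10},
	}
	for _, s := range sizes {
		// Build toy data: ncols columns of nrows values each.
		cols := make([][]int, s.ncols)
		for c := range cols {
			cols[c] = make([]int, s.nrows)
		}
		indexes := make([]int, s.nidx)
		for i := range indexes {
			indexes[i] = i % s.nrows
		}
		b.Run(benchName(s.nrows, s.ncols, s.nidx), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				subsetSerial(cols, indexes)
			}
		})
	}
}

func main() {
	// Print the labels the sub-benchmarks above would be registered under.
	fmt.Println(benchName(1000, 20, 10))  // 1000x20_10
	fmt.Println(benchName(1000, 200, 10)) // 1000x200_10
}
```

Comparison tables like the ones above are typically produced by saving the output of two such runs and diffing them with a tool such as benchcmp or benchstat.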

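Running the profiler (CPU/memory) on these benchmarks did not reveal anything significant. The concurrent version seems to spend some time in runtime.mach_semaphore_signal, but I guess that is to be expected while waiting for the goroutines to finish.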

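I also tried limiting the number of goroutines launched to the maximum number of cores reported by runtime.GOMAXPROCS(0), but the results were even worse. Am I doing something wrong here, or is the overhead of goroutines really large enough to have such a significant impact on performance?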
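
The question does not show the goroutine-limited variant. One way such a cap might be implemented is to split the columns into at most runtime.GOMAXPROCS(0) contiguous chunks and give each chunk to one goroutine. The sketch below assumes plain `[]float64` columns; `subsetRows` and `subsetChunked` are hypothetical names standing in for Series.Subset and the DataFrame method.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// subsetRows is a hypothetical stand-in for Series.Subset on a plain slice.
func subsetRows(col []float64, indexes []int) []float64 {
	out := make([]float64, len(indexes))
	for i, idx := range indexes {
		out[i] = col[idx]
	}
	return out
}

// subsetChunked splits the columns into at most `workers` contiguous chunks
// and processes each chunk in its own goroutine.
func subsetChunked(cols [][]float64, indexes []int, workers int) [][]float64 {
	result := make([][]float64, len(cols))
	if len(cols) == 0 {
		return result
	}
	if workers < 1 {
		workers = 1
	}
	if workers > len(cols) {
		workers = len(cols)
	}
	chunk := (len(cols) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(cols); start += chunk {
		end := start + chunk
		if end > len(cols) {
			end = len(cols)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			// Chunks do not overlap, so no two goroutines touch the same slot.
			for i := start; i < end; i++ {
				result[i] = subsetRows(cols[i], indexes)
			}
		}(start, end)
	}
	wg.Wait()
	return result
}

func main() {
	cols := [][]float64{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
	fmt.Println(subsetChunked(cols, []int{2, 0}, runtime.GOMAXPROCS(0)))
	// Prints: [[3 1] [6 4] [9 7]]
}
```

Static chunking keeps the goroutine count fixed, but it can leave cores idle when columns differ in cost, for example when string columns take longer to subset than numeric ones.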

Best answer

Goroutines are cheap, but not free.

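I have not read your code, but if you spawn NCOLS_INDEXSIZE goroutines for every row processed, that is very bad practice.

You can see this in your benchmarks: where you have 2k columns and only 1k rows, you get a very big improvement. In all the other cases, where the number of columns << the number of rows, spawning goroutines becomes the bottleneck.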

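Instead, you should spawn a pool of goroutines (roughly as many as you have CPUs) and distribute the work among them through a channel; this is the canonical way. You may want to read https://blog.golang.org/pipelines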
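
A minimal sketch of the pool-plus-channel pattern described above, assuming columns can be modelled as plain slices. The `column` type, its `subset` method, and `subsetPool` are hypothetical stand-ins and not part of Gota's API.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// column is a hypothetical stand-in for gota's series.Series.
type column []int

// subset returns the elements of c at the given positions.
func (c column) subset(indexes []int) column {
	out := make(column, len(indexes))
	for i, idx := range indexes {
		out[i] = c[idx]
	}
	return out
}

// subsetPool subsets every column using a fixed number of worker goroutines
// that receive column positions over a channel.
func subsetPool(cols []column, indexes []int, workers int) []column {
	if workers < 1 {
		workers = 1
	}
	result := make([]column, len(cols))
	jobs := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for w := 0; w < workers; w++ {
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each job writes to a distinct slot, so no lock is needed.
				result[i] = cols[i].subset(indexes)
			}
		}()
	}
	// Feed one job per column, then close the channel so workers exit.
	for i := range cols {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return result
}

func main() {
	cols := []column{{10, 11, 12, 13}, {20, 21, 22, 23}, {30, 31, 32, 33}}
	out := subsetPool(cols, []int{3, 0}, runtime.GOMAXPROCS(0))
	fmt.Println(out) // [[13 10] [23 20] [33 30]]
}
```

The number of goroutines stays at `workers` no matter how many columns there are. Every channel send still has a cost, though, so for very small index sets the serial loop may remain faster; benchmarking both is the only way to know.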

A similar question about performance - Goroutine overhead and performance analysis when subsetting DataFrames (Gota) can be found on Stack Overflow: https://stackoverflow.com/questions/41209838/
