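Since early 2016 I have been working on an implementation of Pandas/R-style DataFrames for Go: https://github.com/kniren/gota.
Lately I have been focusing on improving the library's performance to try to match Pandas/dplyr. You can follow the progress so far here: https://github.com/kniren/gota/issues/16
Since subsetting a DataFrame is one of the most frequently used operations, I thought it might be a good idea to introduce concurrency to try to improve performance.
Before: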
columns := make([]series.Series, df.ncols)
for i, column := range df.columns {
	s := column.Subset(indexes)
	columns[i] = s
}
After:
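columns := make([]series.Series, df.ncols)
var wg sync.WaitGroup
wg.Add(df.ncols)
for i := range df.columns {
	// One goroutine per column; each writes only to its own slot in columns,
	// so no additional locking is needed.
	go func(i int) {
		columns[i] = df.columns[i].Subset(indexes)
		wg.Done()
	}(i)
}
// Block until every column has been subset.
wg.Wait()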
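As far as I know, creating one goroutine per DataFrame column should not introduce much overhead, so I expected at least a 2x speedup over the serial version (at least for large datasets). However, when I benchmarked the change with datasets and index sets of different sizes, the results were very disappointing (benchmark names follow the pattern NROWSxNCOLS_INDEXSIZE-CPUCORES):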
benchmark old ns/op new ns/op delta
BenchmarkDataFrame_Subset/1000000x20_100 55230 109349 +97.99%
BenchmarkDataFrame_Subset/1000000x20_100-2 51457 67714 +31.59%
BenchmarkDataFrame_Subset/1000000x20_100-4 49845 70141 +40.72%
BenchmarkDataFrame_Subset/1000000x20_1000 518506 518085 -0.08%
BenchmarkDataFrame_Subset/1000000x20_1000-2 476661 311379 -34.67%
BenchmarkDataFrame_Subset/1000000x20_1000-4 505023 316583 -37.31%
BenchmarkDataFrame_Subset/1000000x20_10000 6621116 6314112 -4.64%
BenchmarkDataFrame_Subset/1000000x20_10000-2 7316062 4509601 -38.36%
BenchmarkDataFrame_Subset/1000000x20_10000-4 6483812 8394113 +29.46%
BenchmarkDataFrame_Subset/1000000x20_100000 105341711 106427967 +1.03%
BenchmarkDataFrame_Subset/1000000x20_100000-2 94567729 56778647 -39.96%
BenchmarkDataFrame_Subset/1000000x20_100000-4 91896690 60971444 -33.65%
BenchmarkDataFrame_Subset/1000000x20_1000000 1538680081 1632044752 +6.07%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 1292113119 1100075806 -14.86%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 1282367864 949615298 -25.95%
BenchmarkDataFrame_Subset/100000x20_100 50286 106850 +112.48%
BenchmarkDataFrame_Subset/100000x20_100-2 54537 70492 +29.26%
BenchmarkDataFrame_Subset/100000x20_100-4 58024 76617 +32.04%
BenchmarkDataFrame_Subset/100000x20_1000 541600 625967 +15.58%
BenchmarkDataFrame_Subset/100000x20_1000-2 493894 362894 -26.52%
BenchmarkDataFrame_Subset/100000x20_1000-4 535373 349211 -34.77%
BenchmarkDataFrame_Subset/100000x20_10000 6298063 7678499 +21.92%
BenchmarkDataFrame_Subset/100000x20_10000-2 5827185 4832560 -17.07%
BenchmarkDataFrame_Subset/100000x20_10000-4 8195048 3660077 -55.34%
BenchmarkDataFrame_Subset/100000x20_100000 105108807 82976477 -21.06%
BenchmarkDataFrame_Subset/100000x20_100000-2 92112736 58317114 -36.69%
BenchmarkDataFrame_Subset/100000x20_100000-4 92044966 63469935 -31.04%
BenchmarkDataFrame_Subset/1000x20_10 9741 53365 +447.84%
BenchmarkDataFrame_Subset/1000x20_10-2 9366 36457 +289.25%
BenchmarkDataFrame_Subset/1000x20_10-4 9463 46682 +393.31%
BenchmarkDataFrame_Subset/1000x20_100 50841 103523 +103.62%
BenchmarkDataFrame_Subset/1000x20_100-2 49972 62344 +24.76%
BenchmarkDataFrame_Subset/1000x20_100-4 72014 81808 +13.60%
BenchmarkDataFrame_Subset/1000x20_1000 457799 571292 +24.79%
BenchmarkDataFrame_Subset/1000x20_1000-2 460551 405116 -12.04%
BenchmarkDataFrame_Subset/1000x20_1000-4 462928 416522 -10.02%
BenchmarkDataFrame_Subset/1000x200_10 90125 688443 +663.88%
BenchmarkDataFrame_Subset/1000x200_10-2 85259 392705 +360.60%
BenchmarkDataFrame_Subset/1000x200_10-4 87412 387509 +343.31%
BenchmarkDataFrame_Subset/1000x200_100 486600 1082901 +122.54%
BenchmarkDataFrame_Subset/1000x200_100-2 471154 732304 +55.43%
BenchmarkDataFrame_Subset/1000x200_100-4 542846 659571 +21.50%
BenchmarkDataFrame_Subset/1000x200_1000 5926086 6686480 +12.83%
BenchmarkDataFrame_Subset/1000x200_1000-2 5364091 3986970 -25.67%
BenchmarkDataFrame_Subset/1000x200_1000-4 5904977 4504084 -23.72%
BenchmarkDataFrame_Subset/1000x2000_10 1187297 7800052 +556.96%
BenchmarkDataFrame_Subset/1000x2000_10-2 1217022 3930742 +222.98%
BenchmarkDataFrame_Subset/1000x2000_10-4 1301666 3617871 +177.94%
BenchmarkDataFrame_Subset/1000x2000_100 6942015 10790196 +55.43%
BenchmarkDataFrame_Subset/1000x2000_100-2 6588351 7592847 +15.25%
BenchmarkDataFrame_Subset/1000x2000_100-4 7067226 14391327 +103.63%
BenchmarkDataFrame_Subset/1000x2000_1000 62392457 69560711 +11.49%
BenchmarkDataFrame_Subset/1000x2000_1000-2 57793006 37416703 -35.26%
BenchmarkDataFrame_Subset/1000x2000_1000-4 59572261 58398203 -1.97%
benchmark old allocs new allocs delta
BenchmarkDataFrame_Subset/1000000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_10000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_100000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 41 43 +4.88%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 41 46 +12.20%
BenchmarkDataFrame_Subset/100000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_10000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/100000x20_100000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_10-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_100-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-2 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x20_1000-4 41 42 +2.44%
BenchmarkDataFrame_Subset/1000x200_10 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_10-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_10-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_100-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-2 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x200_1000-4 401 402 +0.25%
BenchmarkDataFrame_Subset/1000x2000_10 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-2 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_10-4 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-2 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_100-4 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000 4001 4002 +0.02%
BenchmarkDataFrame_Subset/1000x2000_1000-2 4001 4010 +0.22%
BenchmarkDataFrame_Subset/1000x2000_1000-4 4001 4003 +0.05%
benchmark old bytes new bytes delta
BenchmarkDataFrame_Subset/1000000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000000x20_10000 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-2 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_10000-4 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-2 29083520 29083547 +0.00%
BenchmarkDataFrame_Subset/1000000x20_100000-4 29083542 29083563 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000 290121600 290121616 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-2 290121600 290121696 +0.00%
BenchmarkDataFrame_Subset/1000000x20_1000000-4 290121600 290121840 +0.00%
BenchmarkDataFrame_Subset/100000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/100000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/100000x20_10000 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-2 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_10000-4 2971520 2971536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-2 29083520 29083536 +0.00%
BenchmarkDataFrame_Subset/100000x20_100000-4 29083542 29083553 +0.00%
BenchmarkDataFrame_Subset/1000x20_10 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_10-2 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_10-4 4880 4896 +0.33%
BenchmarkDataFrame_Subset/1000x20_100 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_100-2 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_100-4 32400 32416 +0.05%
BenchmarkDataFrame_Subset/1000x20_1000 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-2 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x20_1000-4 298880 298896 +0.01%
BenchmarkDataFrame_Subset/1000x200_10 49568 49584 +0.03%
BenchmarkDataFrame_Subset/1000x200_10-2 49568 49584 +0.03%
BenchmarkDataFrame_Subset/1000x200_10-4 49568 49585 +0.03%
BenchmarkDataFrame_Subset/1000x200_100 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_100-2 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_100-4 324768 324784 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000 2989568 2989584 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-2 2989568 2989584 +0.00%
BenchmarkDataFrame_Subset/1000x200_1000-4 2989569 2989588 +0.00%
BenchmarkDataFrame_Subset/1000x2000_10 491072 491088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_10-2 491072 491133 +0.01%
BenchmarkDataFrame_Subset/1000x2000_10-4 491072 491088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100 3243072 3243088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-2 3243074 3243102 +0.00%
BenchmarkDataFrame_Subset/1000x2000_100-4 3243076 3243100 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000 29891072 29891088 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-2 29891086 29891797 +0.00%
BenchmarkDataFrame_Subset/1000x2000_1000-4 29891115 29891167 +0.00%
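Running the profiler (CPU/memory) on this benchmark did not reveal anything significant. The concurrent version seems to spend some time in runtime.mach_semaphore_signal, but I suppose that is expected while waiting for the goroutines to finish.
I tried limiting the number of goroutines launched to the maximum number of cores reported by runtime.GOMAXPROCS(0), but the results were even worse. Am I doing something wrong here, or is the overhead of goroutines large enough to have such a significant impact on performance?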
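The question does not show how the goroutine limit was applied. One possible way, sketched here under assumptions, is to split the columns into at most runtime.GOMAXPROCS(0) contiguous chunks and start one goroutine per chunk. The `column` type, its `subset` method, and `subsetChunked` are hypothetical stand-ins for illustration and are not part of gota.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// column is a hypothetical stand-in for gota's series.Series.
type column []int

// subset returns the elements of c at the given positions.
func (c column) subset(indexes []int) column {
	out := make(column, len(indexes))
	for i, idx := range indexes {
		out[i] = c[idx]
	}
	return out
}

// subsetChunked splits cols into at most `workers` contiguous chunks and
// subsets each chunk in its own goroutine.
func subsetChunked(cols []column, indexes []int, workers int) []column {
	out := make([]column, len(cols))
	if len(cols) == 0 {
		return out
	}
	if workers < 1 {
		workers = 1
	}
	if workers > len(cols) {
		workers = len(cols)
	}
	chunk := (len(cols) + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < len(cols); start += chunk {
		end := start + chunk
		if end > len(cols) {
			end = len(cols)
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			// Each goroutine writes only to its own range of out.
			for i := start; i < end; i++ {
				out[i] = cols[i].subset(indexes)
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	cols := []column{{10, 11, 12, 13}, {20, 21, 22, 23}, {30, 31, 32, 33}}
	got := subsetChunked(cols, []int{3, 0}, runtime.GOMAXPROCS(0))
	fmt.Println(got) // [[13 10] [23 20] [33 30]]
}
```

With this layout the number of goroutines started per call is bounded by the worker count rather than by the number of columns. Whether that helps for a given dataset is something only a benchmark can tell.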
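Best answer
Goroutines are cheap, but they are not free.
I have not read your code, but if you spawn NCOLS_INDEXSIZE goroutines for every row you process, that is very bad practice.
This shows in your benchmarks: where you have 2k columns and only 1k rows, you got a very large improvement. But in all the other cases, where the number of columns << the number of rows, spawning goroutines becomes the bottleneck.
Instead, you should spawn a pool of goroutines (close to the number of CPUs you have) and distribute the work among them through channels; this is the canonical way. You may want to read https://blog.golang.org/pipelines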
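A minimal sketch of the pool-plus-channel pattern described above, assuming a fixed number of workers that receive column indexes over an unbuffered channel. The `column` type, `subset`, and `subsetWithPool` are hypothetical names used only for illustration; gota's real types are not used here.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// column is a hypothetical stand-in for gota's series.Series.
type column []int

// subset returns the elements of c at the given positions.
func subset(c column, indexes []int) column {
	out := make(column, len(indexes))
	for i, idx := range indexes {
		out[i] = c[idx]
	}
	return out
}

// subsetWithPool starts `workers` goroutines that pull column indexes from
// a channel until it is closed, so the goroutine count does not depend on
// the number of columns.
func subsetWithPool(cols []column, indexes []int, workers int) []column {
	out := make([]column, len(cols))
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is delivered to exactly one worker,
				// so writes to out never overlap.
				out[i] = subset(cols[i], indexes)
			}
		}()
	}
	for i := range cols {
		jobs <- i
	}
	close(jobs) // lets the workers' range loops terminate
	wg.Wait()
	return out
}

func main() {
	cols := []column{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
	got := subsetWithPool(cols, []int{2, 1}, runtime.GOMAXPROCS(0))
	fmt.Println(got) // [[3 2] [6 5] [9 8]]
}
```

Each channel send is itself a synchronization point, so this sketch still pays a per-column cost. For cheap per-column work it may be worth sending batches of indexes instead, which would need to be confirmed by benchmarking.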
Source: the Stack Overflow question "performance - Goroutines overhead and performance analysis when subsetting DataFrames (Gota)": https://stackoverflow.com/questions/41209838/