dataframe - In Julia, group a DataFrame by binning a Float64 column

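Suppose I have a DataFrame with a Float64 column, and I want to group the DataFrame by binning that column. I have heard that the cut function might help, but it is not defined on DataFrames. Some work has been done ( https://gist.github.com/tautologico/3925372 ), but I would rather use a library function than copy and paste code from the internet. Any pointers?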

EDIT: Bonus karma for a way to do this by month from UNIX timestamps :)

Best answer

You can bin a DataFrame on a Float64 column like this. Here the bins run from 0.0 to 1.0 in steps of 0.1, and the DataFrame is binned on a column of 100 random numbers between 0.0 and 1.0.

using DataFrames  # load DataFrames
df = DataFrame(index = rand(Float64, 100))  # a DataFrame with 100 random Float64 values in [0, 1)
# zip produces the (lower, upper) edge pairs; the anonymous function keeps the rows with lower <= index < upper.
df_array = map(x -> df[(df.index .>= x[1]) .& (df.index .< x[2]), :], zip(0.0:0.1:0.9, 0.1:0.1:1.0))

This produces an array of 10 DataFrames, each holding the rows of a different bin of width 0.1.
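
Since the question asked for library functions, here is one possible sketch of the same binning built from `searchsortedlast` (Base) and `groupby`/`combine` (DataFrames). It assumes a recent DataFrames.jl release (1.x); the names `edges`, `bin`, and `counts` and the fixed sample values are illustrative and not part of the original answer.

```julia
using DataFrames

# Fixed values instead of random ones so the result is easy to check by hand.
df = DataFrame(index = [0.05, 0.12, 0.18, 0.55, 0.97])

# Bin edges: bin i covers the half-open interval [edges[i], edges[i+1]).
edges = 0.0:0.1:1.0

# searchsortedlast returns the index of the last edge <= x,
# which serves as the bin number for x.
df.bin = searchsortedlast.(Ref(edges), df.index)

# groupby splits the table on the new column without copying the rows.
gdf = groupby(df, :bin)

# One row per non-empty bin, with the number of rows that fell into it.
counts = combine(gdf, nrow => :n)
```

Unlike the array of DataFrames above, the grouped result only contains bins that actually received rows, so empty bins have to be handled separately if they matter.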

As for the UNIX timestamp question, I am not very familiar with that area, but after some experimenting, something like this might work:

using DataFrames, Dates

df = DataFrame(unixtime = rand(1E9:1:1.1E9, 100))  # a DataFrame of floats standing in for UNIX timestamps
df.date = Dates.unix2datetime.(df.unixtime)  # convert the timestamps to DateTime values
df.year_month = map(date -> string(Dates.year(date)) * " " * string(Dates.month(date)), df.date)  # build a "year month" string for every row
df_array = map(ym -> df[df.year_month .== ym, :], unique(df.year_month))  # one DataFrame per unique year_month string
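
The month grouping can also be sketched with `groupby` rather than string comparison. This assumes DataFrames.jl 1.x and uses `Dates.yearmonth` from the standard library, which returns a `(year, month)` tuple; the column name `ym` and the three hand-picked timestamps are illustrative only.

```julia
using DataFrames, Dates

# Three pretend UNIX timestamps: 1.0e9 seconds falls in September 2001,
# one day later is still September, and forty days later is in October.
day = 86_400.0
df = DataFrame(unixtime = [1.0e9, 1.0e9 + day, 1.0e9 + 40 * day])

# Convert to DateTime, then reduce each value to a (year, month) tuple.
df.date = Dates.unix2datetime.(df.unixtime)
df.ym = Dates.yearmonth.(df.date)

# Group on the tuple column and count the rows in each month.
gdf = groupby(df, :ym)
counts = combine(gdf, nrow => :n)
```

Grouping on a tuple avoids building and comparing strings, and the tuples sort chronologically if an ordered result is needed.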

This question and answer, "dataframe - In Julia, group a DataFrame by binning a Float64 column", come from a similar question on Stack Overflow: https://stackoverflow.com/questions/47391489/
