julia - How to use nrow inside combine when creating a new column

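I have a data frame containing keywords and a metric.
So far I split the keywords, sum the metric for each topic, and count how often each topic occurs.
I would like to use nrow a second time to create a new metric, e.g. "metric_per_frequency".
Is there a way to do this inside the same combine call?
(I know this is possible with DataFramesMeta's @by, but when testing on this small sample it seemed much slower than using combine.)
Also, I would prefer not to do this in a separate line of code after combine, since I seem to be missing the proper usage of combine().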

Here is a minimal sample:

using DataFrames

topics = ["Money,Income,Lifestyle","Knowledge,Windows,Weather","Vaccation,Holiday,Money,Weather"]
metric = [2347820, 1275610, 1255383]

table = DataFrame(topics=topics, metric=metric)

agg_table = combine(
    groupby(
        flatten(transform(table, :topics => ByRow(x -> ismissing(x) ? "" : string.(split(x, ","))) => :topics), :topics),
        [:topics]),
        nrow => :frequency_topics,
        :metric => sum => :sum_metric
        #:metric => sum/nrow .=> :metric_per_frequency
    )

The result should look like this:

Row │ topics     sum_metric  freq_topic  metric_per_frequency
     │ String     Int64       Int64       Float64             
─────┼──────────────────────────────────────────────────────────
   1 │ Money         3603203           2              1.8016e6
   2 │ Income        2347820           1              2.34782e
   3 │ Lifestyle     2347820           1              2.34782e
   4 │ Knowledge     1275610           1              1.27561e
   5 │ Windows       1275610           1              1.27561e
   6 │ Weather       2530993           2              1.2655e6
   7 │ Vaccation     1255383           1              1.25538e
   8 │ Holiday       1255383           1              1.25538e

Best answer

What you want is just the mean (available after `using Statistics`):

julia> agg_table = combine(
           groupby(
               flatten(transform(table, :topics => ByRow(x -> ismissing(x) ? "" : string.(split(x, ","))) => :topics), :topics),
               [:topics]),
               nrow => :frequency_topics,
               :metric => sum => :sum_metric,
               :metric => mean => :metric_per_frequency
           )
8×4 DataFrame
 Row │ topics     frequency_topics  sum_metric  metric_per_frequency
     │ String     Int64             Int64       Float64
─────┼───────────────────────────────────────────────────────────────
   1 │ Money                     2     3603203             1.8016e6
   2 │ Income                    1     2347820             2.34782e6
   3 │ Lifestyle                 1     2347820             2.34782e6
   4 │ Knowledge                 1     1275610             1.27561e6
   5 │ Windows                   1     1275610             1.27561e6
   6 │ Weather                   2     2530993             1.2655e6
   7 │ Vaccation                 1     1255383             1.25538e6
   8 │ Holiday                   1     1255383             1.25538e6
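
If a ratio other than the plain mean is needed, one possible approach is to compute it with an anonymous function inside the same combine call. The commented-out attempt in the question, `sum/nrow`, fails because it tries to divide one function by another. Inside a `:col => fun => :newcol` transformation the function receives the group's column as a vector, so `length(x)` gives the same number that `nrow` reports for that group. The sketch below assumes there are no missing values in `topics` (the `ismissing` guard is dropped for brevity); the names `long` and `agg` are only illustrative.

```julia
using DataFrames

table = DataFrame(
    topics = ["Money,Income,Lifestyle", "Knowledge,Windows,Weather", "Vaccation,Holiday,Money,Weather"],
    metric = [2347820, 1275610, 1255383],
)

# Split each comma-separated string into a vector of topics,
# then emit one row per topic (assumes no missing values).
long = flatten(
    transform(table, :topics => ByRow(x -> String.(split(x, ","))) => :topics),
    :topics,
)

agg = combine(
    groupby(long, :topics),
    nrow => :frequency_topics,
    :metric => sum => :sum_metric,
    # length(x) is the number of rows in the group, so this is sum / nrow
    :metric => (x -> sum(x) / length(x)) => :metric_per_frequency,
)
```

For this particular ratio the result is identical to `:metric => mean`; the anonymous-function form is mainly useful when the numerator or denominator needs to differ from a plain sum and row count.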

This question and answer on how to use nrow inside combine when creating a new column come from a similar question on Stack Overflow: https://stackoverflow.com/questions/76028489/
