dataframe - Dummy variables in Julia

Tags: dataframe julia glm

R has a nice feature for running a regression with a dummy variable for each level of a categorical variable. For example, see "Automatically expanding an R factor into a collection of 1/0 indicator variables for every factor level".

Is there an equivalent way to do this in Julia? Currently I create the indicator columns by hand:

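using DataFrames, GLM

x = randn(1000)
group = repeat(1:25, 40)          # 25 groups with 40 observations each
groupMeans = randn(25)
y = 3*x + groupMeans[group]

data = DataFrame(x=x, y=y, g=group)
# Add one 0/1 indicator column I1 ... I25 per group level
for i in sort(unique(group))
    data[!, Symbol("I$i")] = data.g .== i
end
# I25 is left out so that group 25 acts as the baseline next to the intercept
lm(@formula(y ~ x + I1 + I2 + I3 + I4 + I5 + I6 + I7 + I8 + I9 + I10 +
    I11 + I12 + I13 + I14 + I15 + I16 + I17 + I18 + I19 + I20 +
    I21 + I22 + I23 + I24), data)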
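As a Base-Julia sketch of what the loop above does, the indicator columns can also be built in a single broadcast comparison, and the same regression can then be solved directly by least squares. `indicator_matrix` is a hypothetical helper written for illustration, not a function from any package; like the `lm` call above, the design matrix drops the last level so that it serves as the baseline.

```julia
using LinearAlgebra

# Hypothetical helper (not part of any package): expand a vector of group labels
# into a 0/1 indicator matrix with one column per distinct level.
function indicator_matrix(g::AbstractVector)
    lv = sort(unique(g))              # distinct levels, sorted
    D = g .== permutedims(lv)         # n × k BitMatrix; D[r, j] is true when g[r] == lv[j]
    return lv, D
end

x = randn(1000)
group = repeat(1:25, 40)
groupMeans = randn(25)
y = 3 .* x .+ groupMeans[group]

lv, D = indicator_matrix(group)
# Intercept, x, and every indicator column except the last one (group 25 is the baseline)
X = hcat(ones(length(x)), x, D[:, 1:end-1])
beta = X \ y                          # ordinary least-squares coefficients
```

Because `y` has no noise here, `beta[2]` recovers the slope 3, `beta[1]` recovers the mean of group 25, and the remaining entries are the other group means measured relative to group 25.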

Best answer

If you are using the DataFrames package, then once you pool the data, the package takes care of the rest:

Pooling columns is important for working with the GLM package. When fitting regression models, PooledDataArray columns in the input are translated into 0/1 indicator columns in the ModelMatrix, with one column for each of the levels of the PooledDataArray.

You can find the rest of the documentation on pooling data here.
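PooledDataArray belonged to older versions of the Julia data ecosystem and has since been superseded by CategoricalArrays. The following is a minimal sketch assuming current releases of DataFrames, CategoricalArrays and GLM, in which marking the column as categorical has the same effect as pooling did: the formula then needs only the single term `g`. The small noise term added to `y` is an assumption of this sketch, not part of the question.

```julia
using DataFrames, CategoricalArrays, GLM

x = randn(1000)
group = repeat(1:25, 40)
groupMeans = randn(25)
# Small noise added for this sketch so the fit is not exact
y = 3 .* x .+ groupMeans[group] .+ 0.1 .* randn(1000)

# categorical() marks g as a factor, playing the role that pooling played before
data = DataFrame(x = x, y = y, g = categorical(group))

# The single term `g` is expanded into indicator columns; with the default
# dummy coding, the first level (group 1) is the baseline absorbed by the intercept.
model = lm(@formula(y ~ x + g), data)
coef(model)   # intercept, x, then one coefficient per non-baseline level of g
```

Unlike the manual version in the question, the baseline here is the first level rather than the 25th, so each group coefficient is a difference from group 1.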

A similar question about dummy variables in Julia DataFrames can be found on Stack Overflow: https://stackoverflow.com/questions/29158626/
