r - Imputation in large data

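I need to impute missing values. My data set has about 800,000 rows and 92 variables. I tried kNNImpute from the imputation package in R, but it looks like the data set is too big. Are there any other packages/methods in R? I would prefer not to replace missing values with the mean. Thanks.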

Best answer

1) You could try

library(sos)
findFn("impute")

This shows 400 matches in 113 packages; you can narrow the list down according to what you need from the imputation function.

2) Have you seen/tried Hmisc?

Description: The Hmisc library contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, importing datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of S objects to LaTeX code, and recoding variables.
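
As a rough illustration of what imputation with Hmisc can look like, here is a minimal sketch using aregImpute followed by impute.transcan. The data frame toy and its columns x, y, z are made up for the example, and the settings (n.impute = 3, a 300-row table) are arbitrary assumptions; whether this scales to 800,000 rows and 92 variables is untested here.

```r
# Sketch only: toy data, not the asker's data set.
library(Hmisc)

set.seed(42)
n <- 300
toy <- data.frame(x = rnorm(n))
toy$y <- toy$x + rnorm(n, sd = 0.5)
toy$z <- toy$y + rnorm(n, sd = 0.5)

# Remove 10% of y at random so there is something to impute.
toy$y[sample(n, 30)] <- NA

# Fit the imputation models; pr = FALSE silences the iteration log.
fit <- aregImpute(~ x + y + z, data = toy, n.impute = 3, pr = FALSE)

# Pull out the first of the 3 imputations and write it into a copy of the data.
filled <- impute.transcan(fit, imputation = 1, data = toy,
                          list.out = TRUE, pr = FALSE, check = FALSE)
completed <- toy
completed[names(filled)] <- filled
```

Each value of imputation (1 to n.impute) gives a different completed data set, so an analysis would normally be repeated over all of them rather than relying on a single fill.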

3) Possibly mice

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
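
A minimal sketch of how mice might be called, using predictive mean matching. The data frame toy, its columns, and the settings (m = 3, maxit = 5, mincor = 0.1) are assumptions chosen for illustration. quickpred is included because limiting the predictors per variable is one way to keep the models small when there are many columns; it is not something the answer itself prescribes.

```r
# Sketch only: toy data standing in for the real table.
library(mice)

set.seed(42)
n <- 500
toy <- data.frame(x = rnorm(n))
toy$y <- toy$x + rnorm(n, sd = 0.5)
toy$z <- toy$y + rnorm(n, sd = 0.5)

# Remove 10% of y and z at random.
toy$y[sample(n, 50)] <- NA
toy$z[sample(n, 50)] <- NA

# Keep only predictors whose correlation with the target
# (or with its missingness indicator) is at least mincor.
pred <- quickpred(toy, mincor = 0.1)

# Run the chained-equations algorithm with predictive mean matching.
imp <- mice(toy, m = 3, maxit = 5, method = "pmm",
            predictorMatrix = pred, seed = 42, printFlag = FALSE)

# Extract the first completed data set.
completed <- complete(imp, 1)
```

On a table with 800,000 rows, lowering m and maxit and raising mincor are the obvious knobs for reducing run time, at the cost of cruder imputations.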

This question comes from Stack Overflow: https://stackoverflow.com/questions/17214560/
