Replace numbers in ranges with a factor

Tags: r data-processing r-factor

This question already has answers here:

Convert continuous numeric values to discrete categories defined by intervals

(2 answers)

Closed last year.

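Given a data frame column that holds a series of integers (ages), I want to convert ranges of those integers into an ordinal variable.

My current code does not work. How should I do this?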

df <- read.table("http://dl.dropbox.com/u/822467/df.csv", header = TRUE, sep = ",")

df[(df >= 0)  & (df <= 14)] <- "Age1"
df[(df >= 15) & (df <= 44)] <- "Age2"
df[(df >= 45) & (df <= 64)] <- "Age3"
df[(df > 64)] <- "Age4"

table(df)

Best answer

Use cut to do this in one step:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf))
str(dfc)
 Factor w/ 4 levels "(0,15]","(15,45]",..: 3 4 3 2 2 4 2 2 4 4 ...
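
By default cut builds right-closed intervals such as (0,15], so an age of 15 lands in the first level and an age of 0 in none. The third break above is also 56, while the ranges in the question end at 64. The following is a minimal sketch of one way to line the bins up with the question's ranges (0-14, 15-44, 45-64, 65 and over), assuming that is what is wanted. The `ages` vector is made-up illustration data, not the data from the question.

```r
# Synthetic ages for illustration only (not the data from the question)
ages <- c(0, 14, 15, 44, 45, 64, 65, 90)

# right = FALSE closes each interval on the left instead of the right:
# [0,15), [15,45), [45,65), [65,Inf)
bands <- cut(ages, breaks = c(0, 15, 45, 65, Inf), right = FALSE)

levels(bands)   # the four interval labels
table(bands)    # two synthetic ages fall in each band
```

Checking table() on boundary values like these is a quick way to confirm the breaks before relabelling the levels.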

Once you are satisfied that the breaks are specified correctly, you can also use the labels argument to relabel the levels:

dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf), labels=paste("Age", 1:4, sep=""))
str(dfc)
 Factor w/ 4 levels "Age1","Age2",..: 3 4 3 2 2 4 2 2 4 4 ...
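
The question asks for an ordinal variable, while cut returns an unordered factor unless told otherwise. As a hedged sketch, the ordered_result argument of cut can be set to TRUE to get an ordered factor. The `ages` vector below is invented for illustration, and right = FALSE with a break at 65 are assumptions chosen to match the ranges listed in the question.

```r
# Synthetic ages for illustration only (not the data from the question)
ages <- c(3, 15, 30, 44, 45, 64, 65, 80)

# ordered_result = TRUE returns an ordered factor: Age1 < Age2 < Age3 < Age4
age_group <- cut(ages,
                 breaks = c(0, 15, 45, 65, Inf),
                 right = FALSE,
                 labels = paste0("Age", 1:4),
                 ordered_result = TRUE)

is.ordered(age_group)       # TRUE
as.character(age_group)     # group label for each synthetic age
age_group >= "Age3"         # ordered factors support comparisons by level
```

With an ordered factor, comparisons such as `age_group >= "Age3"` follow the level order rather than alphabetical order.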

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/10222525/
