r - Updating factor levels when filtering R data.frames

Tags: r dataframe r-factor

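I have a data frame similar to the one below. I preprocess it by removing the rows I'm not interested in. Most of my columns are factors, and their levels are not updated when I filter the data frame.

I can see that what I'm doing below isn't ideal. How do I get the factor levels to update when the data.frame is modified? Below is a demonstration of the problem.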

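# generate data
library(plyr)  # provides count(), used below
set.seed(2013)
# stringsAsFactors = TRUE is needed on R >= 4.0.0, where strings are no longer converted to factors by default
df <- data.frame(site = sample(c("A", "B", "C"), 50, replace = TRUE),
                 currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 50, replace = TRUE, prob = c(10, 6, 5, 6, 0.5)),
                 value = ceiling(rnorm(50) * 10),
                 stringsAsFactors = TRUE)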

# check counts to see there is one entry where currency =  CHF
count(df, vars="currency")

>currency freq
>1      CHF    1
>2      CNY   13
>3      EUR   16
>4      GBP    6
>5      USD   14


# filter out all entries where site = A, i.e. take a subset of df
df <- df[df$site != "A", ]

# check counts again to see how this affected the currency frequencies
count(df, vars="currency")

>currency freq
>1      CNY   10
>2      EUR    8
>3      GBP    4
>4      USD   10

# But, the filtered data.frame's levels have not been updated:
levels(df$currency)

>[1] "CHF" "CNY" "EUR" "GBP" "USD"

levels(df$site)

>[1] "A" "B" "C"

Desired output:

# levels(df$currency) = "CNY" "EUR" "GBP" "USD"
# levels(df$site) = "B" "C"

Best answer

Use droplevels:

> df <- droplevels(df)
> levels(df$currency)
[1] "CNY" "EUR" "GBP" "USD"
> levels(df$site)
[1] "B" "C"
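
For reference, here is a minimal base R sketch of a few ways to drop unused levels. The small data frame and the variable names (sub, all_dropped, one_col, refactored, keep_site) are made up for illustration and are not the data from the question.

```r
# Toy data (hypothetical); stringsAsFactors = TRUE makes the string columns factors
df <- data.frame(site = c("A", "B", "C", "B"),
                 currency = c("CHF", "USD", "EUR", "USD"),
                 stringsAsFactors = TRUE)

# Subsetting rows keeps every original level, used or not
sub <- df[df$site != "A", ]

# Option 1: drop unused levels from every factor column
all_dropped <- droplevels(sub)

# Option 2: drop unused levels from a single column
one_col <- sub
one_col$currency <- droplevels(one_col$currency)

# Option 3: rebuild a factor from its current values
refactored <- sub
refactored$site <- factor(refactored$site)

# Option 4: drop everywhere except the listed column indices (here column 1, site)
keep_site <- droplevels(sub, except = 1)
```

Calling droplevels on the whole data frame is the shortest route when every factor column should be cleaned. The per-column forms are useful when some columns need to keep their full set of levels, for example so that tables or plots still show empty categories.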

A similar question to "r - Updating factor levels when filtering R data.frames" can be found on Stack Overflow: https://stackoverflow.com/questions/20499613/
