r - Subsetting a table column from a data frame in R

One column (new) in the data frame below holds a table in each row.

#dput(head(df1))
structure(list(a = c(1, 2, 3, 4, 5, 7), b = c(2, 3, 3, 5, 5, 
7), c = c(1, 3, 2, 4, 5, 7), new = list(structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("1", "2")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("4", "5")), .Names = ""), class = "table"), structure(c(`5` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "5"), .Names = ""), class = "table"), structure(c(`7` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "7"), .Names = ""), class = "table"))), row.names = c(NA, 
6L), class = "data.frame")

The new column is the result of apply(df1, 1, table). Subsetting the new column with, for example, df1[4, "new"][[1]] produces the following output.

df1[4, "new"][[1]]

#4 5 --> Vals
#2 1 --> Freq

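I want to formulate a condition such as "give me all Vals in the new column whose Freq is greater than or equal to some threshold" and use it to subset the new column.

Here is an example, along with what I have done so far.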

df1[4, "new"][[1]][]>=2
#    4     5 
# TRUE FALSE 

# Subsetting using the above logical
as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2]))
#[1] 4

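The result is what I expected. However, it is verbose, and I would be glad to see a shorter version (this is not the pressing issue right now, but I would appreciate it and am happy to learn how to write clear, concise lines).

My pressing question is how to modify the expression as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2])) and apply it to the whole column. For example, for the condition new == 3, the expected output is 5 and 7.

I have looked at a couple of similar posts, but they did not help me figure out how to apply a subsetting condition to a table column.

Thanks.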

Best answer

Inspecting the class of the object (i.e. the column) yields "list".

class(df1$new)
# [1] "list"

We usually apply a function to the elements of a list with, for example, lapply(). To get a vector or matrix back instead of a list, we can try sapply().
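
To illustrate the difference in return types, here is a minimal sketch on a small made-up list of tables. The name tabs and its contents are hypothetical and not part of the question's data. lapply() always returns a list, whereas sapply() simplifies to a vector when every element yields a result of the same length.

```r
# Hypothetical toy input: a list of one-dimensional frequency tables
tabs <- list(table(c(1, 1, 2)), table(c(3, 3, 3)))

# lapply() always returns a list, one element per input element
as_list <- lapply(tabs, length)
is.list(as_list)

# sapply() simplifies to an atomic vector here,
# because every call to length() returns a single value
as_vec <- sapply(tabs, length)
is.list(as_vec)
```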

So, define your condition,

COND <- 2

and use your function inside sapply():

sapply(df1$new, function(x) as.numeric(names(x[x >= COND])))
# [1] 1 3 3 4 5 7
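
One caveat may be worth sketching: sapply() only simplifies to a vector when every row yields a result of the same length. With a threshold of 3 (the question's second example), most rows yield nothing, so sapply() falls back to returning a list. A possible workaround, offered here as an assumption rather than as part of the original answer, is to collect the per-row results with lapply() and flatten them with unlist(). The data frame is rebuilt from the question's columns a, b and c, and the helper name vals_at_least is hypothetical.

```r
# Rebuild the question's data: `new` holds one frequency table per row
df1 <- data.frame(a = c(1, 2, 3, 4, 5, 7),
                  b = c(2, 3, 3, 5, 5, 7),
                  c = c(1, 3, 2, 4, 5, 7))
df1$new <- apply(df1, 1, table)

# Hypothetical helper: values in one table whose frequency is >= cond
vals_at_least <- function(x, cond) as.numeric(names(x)[x >= cond])

# Results have unequal lengths, so sapply() cannot simplify and returns a list
is.list(sapply(df1$new, vals_at_least, cond = 3))

# Keep the per-row results in a list; rows without a match give numeric(0)
per_row <- lapply(df1$new, vals_at_least, cond = 3)

# Flattening drops the empty rows and leaves only the matching values
res <- unlist(per_row, use.names = FALSE)
res
# [1] 5 7
```

The same helper also shortens the single-row expression from the question: vals_at_least(df1$new[[4]], 2) returns 4.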

Regarding "r - Subsetting a table column from a data frame in R", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57997346/
