r - Convert a list-type column to long format by separating its items

I have a table with two columns of interest, as shown below:

Status_id | hashtag
947306525726527488 | NEWYEARSEVEPARTY919
947306316959281153 | MakeItALifestyle
947306315952611330 | c("Ejuice", "vape", "vaping")
947306265520328704 | c("vapefam", "vapenation", "vapefamily")
947305941522771968 | nowplaying

str(juice) #df name
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   5 obs. of  2 variables:
$ status_id: chr  "947306525726527488" "947306316959281153" 
"947306315952611330" "947306265520328704" 
$ hashtags :List of 5
..$ : chr "NEWYEARSEVEPARTY919"
..$ : chr "MakeItALifestyle"
..$ : chr  "Ejuice" "vape" "vaping" "eliquid"
..$ : chr  "vapefam" "vapenation" "vapefamily"
..$ : chr "nowplaying"

Data

structure(list(status_id = c("947306525726527488", "947306316959281153", 
"947306315952611330", "947306265520328704", "947305941522771968"
), hashtags = list("NEWYEARSEVEPARTY919", "MakeItALifestyle", 
    c("Ejuice", "vape", "vaping", "eliquid", "ecigjuice", "ecig", 
    "vapejuice"), c("vapefam", "vapenation", "vapefamily", "vapelife", 
    "vapelyfe", "vapeon", "positivity"), "nowplaying")), .Names = c("status_id", 
"hashtags"), row.names = c(NA, -5L), class = c("tbl_df", "tbl", 
"data.frame"))

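Expected result

I would like the following two tables (the real df has more columns, which I have removed here because they are not relevant to the question):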

df1
Status_id
947306525726527488
947306316959281153
947306315952611330
947306265520328704
947305941522771968

and

df2
status_id | hashtag
947306525726527488 | NEWYEARSEVEPARTY919
947306316959281153 | MakeItALifestyle
947306315952611330 | Ejuice
947306315952611330 | vape
947306315952611330 | vaping
947306265520328704 | vapefam
947306265520328704 | vapenation
947306265520328704 | vapefamily
947305941522771968 | nowplaying

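The original data has one row per status_id. Whenever a status has more than one hashtag, they are stored together as c(...), and the column is of type "list". df2 splits the individual hashtags into separate rows.

I have never come across a list-type column before, and googling it mostly turned up material about converting a list into columns rather than about columns of type "list".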

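Best answer

Here is one possible solution. I call your data mydf. You have lists in hashtags. You can collapse each row of hashtags into a single comma-separated string using unlist() and paste(). If you prefer, you can use toString() instead of paste(). Once each row of hashtags holds one string, you want to split it. Specifically, rows 3 and 4 contain multiple hashtags, and you want to separate them. You can use cSplit() from the splitstackshape package. The result is the df2 you want. Once you have that, you want to create df1: select status_id and keep the unique values.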

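library(dplyr)
library(splitstackshape)

df2 <- mydf %>%
       rowwise() %>%
       # collapse each row's hashtags into one comma-separated string
       mutate(hashtags = paste(unlist(hashtags), collapse = ",")) %>%
       # split on the comma and stack the pieces as separate rows
       cSplit(splitCols = "hashtags", sep = ",", direction = "long")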

             status_id            hashtags
 1: 947306525726527488 NEWYEARSEVEPARTY919
 2: 947306316959281153    MakeItALifestyle
 3: 947306315952611330              Ejuice
 4: 947306315952611330                vape
 5: 947306315952611330              vaping
 6: 947306315952611330             eliquid
 7: 947306315952611330           ecigjuice
 8: 947306315952611330                ecig
 9: 947306315952611330           vapejuice
10: 947306265520328704             vapefam
11: 947306265520328704          vapenation
12: 947306265520328704          vapefamily
13: 947306265520328704            vapelife
14: 947306265520328704            vapelyfe
15: 947306265520328704              vapeon
16: 947306265520328704          positivity
17: 947305941522771968          nowplaying

df1 <- unique(df2[, 1, with = FALSE])

            status_id
1: 947306525726527488
2: 947306316959281153
3: 947306315952611330
4: 947306265520328704
5: 947305941522771968

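Revision

Thanks to a comment from the author of the splitstackshape package, we found a better way to handle this task. listCol_l() is a function that unlists a column stored as a list into long format, so the whole process can be done in one line.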

df2 <- listCol_l(mydf, "hashtags")
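For comparison, the same reshaping can be sketched in base R without any extra packages. This is not the method from the answer above, just a minimal alternative: it rebuilds the sample data from the question under the name mydf (as a plain data.frame rather than a tibble, which is an assumption made to keep the example dependency-free), repeats each status_id once per hashtag with rep() and lengths(), and flattens the list column with unlist(). The output column is named hashtag here to match the expected df2 in the question.

```r
# Rebuild the sample data from the question as a plain data.frame
mydf <- data.frame(
  status_id = c("947306525726527488", "947306316959281153",
                "947306315952611330", "947306265520328704",
                "947305941522771968"),
  stringsAsFactors = FALSE
)
# Attach the list column: one list element per row
mydf$hashtags <- list(
  "NEWYEARSEVEPARTY919",
  "MakeItALifestyle",
  c("Ejuice", "vape", "vaping", "eliquid", "ecigjuice", "ecig", "vapejuice"),
  c("vapefam", "vapenation", "vapefamily", "vapelife", "vapelyfe", "vapeon",
    "positivity"),
  "nowplaying"
)

# Repeat each status_id once per hashtag, then flatten the list column
df2 <- data.frame(
  status_id = rep(mydf$status_id, times = lengths(mydf$hashtags)),
  hashtag   = unlist(mydf$hashtags),
  stringsAsFactors = FALSE
)

# One row per distinct status_id
df1 <- unique(df2["status_id"])
```

If tidyr is installed, tidyr::unnest(mydf, cols = hashtags) should give an equivalent long table in a single call.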

Regarding "r - Convert a list-type column to long format by separating its items", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48039799/
