r - 为什么在使用 pivot_wider 时会产生 NA 值?

标签 r dataframe tidyverse tidyr na

我正在尝试使用 pivot wider创建包含值的多个列/变量,但我不应该在列中创建 NA。

以下是数据的代表性样本:

df <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1", 
"Retraction2"), class = "factor"), First = structure(c(2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Second = structure(c(3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", 
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46", 
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57", 
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68", 
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79", 
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90", 
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100", 
"101"), class = "factor"), Scenario = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 1L, 2L, 3L, 4L), .Label = c("J", "P", "R", 
"S"), class = "factor"), Estimate = structure(c(4L, 8L, 7L, 11L, 
9L, 12L, 10L, 2L, 5L, 6L, 4L, 7L, 11L, 9L, 12L, 10L, 2L, 3L, 
5L, 6L, 4L, 8L, 7L, 11L, 9L, 12L, 10L, 2L, 5L, 6L, 4L, 8L, 7L, 
11L, 9L, 12L, 10L, 2L, 5L, 6L, 1L, 1L, 1L, 1L), .Label = c("CompMean", 
"P.H.Reps.", "P.H.Reps..1", "P.Rel.", "P.Rel1.Reps.", "P.Rel2.Reps.", 
"P.Rep1.nH.nRel.", "P.Rep1.nH.Rel.", "P.Rep2.nH.nRel.nRep1.", 
"P.Rep2.nH.nRel.Rep1.", "P.Rep2.nH.Rel.nRep1.", "P.Rep2.nH.Rel.Rep1."
), class = "factor"), value = c(90L, 8L, 82L, 11L, 82L, 11L, 
82L, 100L, 99L, NA, 62L, 11L, 91L, 12L, 91L, 5L, 82L, 91L, 80L, 
NA, 92L, 12L, 61L, 18L, 90L, 21L, 81L, 96L, 92L, NA, 91L, 10L, 
72L, 22L, 62L, 21L, 73L, 99L, 98L, NA, 7L, 7L, 7L, 7L)), row.names = c(NA, 
-44L), class = c("tbl_df", "tbl", "data.frame"))

head(df)

这是来自一个主题的数据。 P.Rel2.Reps. 中应该只有 NA没有其他。

但是,当我像这样使用更宽的枢轴时,其他一些列中有 NAs:
pivot_wider(df, names_from = Estimate, values_from = value)
这是一个示例,说明数据在旋转更宽后的样子。
df2 <- structure(list(Condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L), .Label = c("Control", "Retraction1", "Retraction2"
), class = "factor"), First = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Journalist", "Police", "Reviewer", 
"Spokesperson"), class = "factor"), Second = structure(c(3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Third = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), Fourth = structure(c(4L, 
4L, 4L, 4L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Journalist", 
"Police", "Reviewer", "Spokesperson"), class = "factor"), ID = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
"27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
"38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55", "56", "57", "58", "59", 
"60", "61", "62", "63", "64", "65", "66", "67", "68", "69", "70", 
"71", "72", "73", "74", "75", "76", "77", "78", "79", "80", "81", 
"82", "83", "84", "85", "86", "87", "88", "89", "90", "91", "92", 
"93", "94", "95", "96", "97", "98", "99", "100", "101"), class = "factor"), 
    Scenario = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
    2L), .Label = c("J", "P", "R", "S"), class = "factor"), P.Rel. = c(90L, 
    62L, 92L, 91L, 57L, 81L, 71L, 80L, 40L, 75L), P.Rep1.nH.Rel. = c(8L, 
    NA, 12L, 10L, 31L, NA, 19L, 17L, 25L, NA), P.Rep1.nH.nRel. = c(82L, 
    11L, 61L, 72L, 89L, 15L, 79L, 84L, 76L, 25L), P.Rep2.nH.Rel.nRep1. = c(11L, 
    91L, 18L, 22L, 35L, 64L, 30L, 22L, 25L, 50L), P.Rep2.nH.nRel.nRep1. = c(82L, 
    12L, 90L, 62L, 62L, 13L, 45L, 53L, 25L, 50L), P.Rep2.nH.Rel.Rep1. = c(11L, 
    91L, 21L, 21L, 15L, 52L, 9L, 10L, 100L, 50L), P.Rep2.nH.nRel.Rep1. = c(82L, 
    5L, 81L, 73L, 67L, 22L, 60L, 61L, 100L, 25L), P.H.Reps. = c(100L, 
    82L, 96L, 99L, 81L, 40L, 71L, 76L, 75L, 90L), P.Rel1.Reps. = c(99L, 
    80L, 92L, 98L, 81L, 80L, 89L, 79L, 75L, 76L), P.Rel2.Reps. = c(NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, 
    NA_integer_, NA_integer_, NA_integer_, NA_integer_), P.H.Reps..1 = c(NA, 
    91L, NA, NA, NA, 80L, NA, NA, NA, 100L), CompMean = c(7L, 
    7L, 7L, 7L, 7L, 7L, 7L, 6L, 4L, 7L)), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))

head(df2)


我看到有一个关于这个主题的类似帖子,但它没有回答为什么在我的情况下会产生 NA。

我需要添加一些其他参数吗?

最佳答案

查看数据,您似乎在某个地方有一些损坏的数据。你可以通过

df$Estimate <- replace(df$Estimate, df$Estimate == "P.H.Reps..1", "P.Rep1.nH.Rel.") 

然后使用 pivot_wider这会给你 NA仅在列中,即 P.Rel2.Reps.
tidyr::pivot_wider(df, names_from = Estimate, values_from = value) 

关于r - 为什么在使用 pivot_wider 时会产生 NA 值?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/59623400/

相关文章:

r - 选择 R 中特定列中包含全部或少于 n 个 NA 的行

r - 按周计算每日观察次数

r - "gam"插入符号模型不返回fitting.values

c++ - 将 dnorm 与 RcppArmadillo 结合使用

r - 在 R 的网格图形中保留纵横比

python - 如何在Python中根据条件对多列进行分组并创建新列?

r - Advanced R 中修改列表的示例

python - 使用基于 2 个键列和 1 个值列的嵌套字典将 pandas 数据框转换为字典

r - r 中的嵌套 dplyr 循环

mysql - 如何将带有 like 子句的 SQL 内连接转换为 dplyr 工作流程?