I am working with data in the following format:
Year Site Zone Species
2010 A a cat
2010 A a dog
2010 A b cat
2010 B a rabbit
2010 B a cat
2010 B b cat
2011 A a dog
2011 A a cat
2011 A b rabbit
2011 B a cat
2011 B b dog
2011 B b cat
I would like to obtain:
Year Site Zone cat dog rabbit
2010 A a 1 1 0
2010 A b 1 0 0
2010 B a 1 0 1
2010 B b 1 0 0
2011 A a 1 1 0
2011 A b 0 0 1
2011 B a 1 0 0
2011 B b 1 1 0
What is the best way to do this in R, using dplyr or any other approach?
Here is the dput of the long-format data:
df <- structure(list(Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2010L,
2011L, 2011L, 2011L, 2011L, 2011L, 2011L), Site = c("A", "A",
"A", "B", "B", "B", "A", "A", "A", "B", "B", "B"), Zone = c("a",
"a", "b", "a", "a", "b", "a", "a", "b", "a", "b", "b"), Species = c("cat",
"dog", "cat", "rabbit", "cat", "cat", "dog", "cat", "rabbit",
"cat", "dog", "cat")), class = "data.frame", row.names = c(NA,
-12L))
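Best answer
Use pivot_wider, with values_fn = length to count the rows in each Year/Site/Zone/Species combination and values_fill = 0 for combinations that do not occur: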
tidyr::pivot_wider(df, names_from = Species, values_from = Species,
values_fn = length, values_fill = 0)
# Year Site Zone cat dog rabbit
# <int> <chr> <chr> <int> <int> <int>
#1 2010 A a 1 1 0
#2 2010 A b 1 0 0
#3 2010 B a 1 0 1
#4 2010 B b 1 0 0
#5 2011 A a 1 1 0
#6 2011 A b 0 0 1
#7 2011 B a 1 0 0
#8 2011 B b 1 1 0
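Since the question mentions dplyr, the same table can also be built by counting first and then pivoting. This is a minimal sketch, not part of the original answer. It assumes dplyr and tidyr (1.1.0 or later, for the scalar values_fill) are installed. The names df and wide are illustrative, and the data frame is rebuilt here only so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Same long-format data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  Year    = rep(c(2010L, 2011L), each = 6),
  Site    = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
  Zone    = c("a", "a", "b", "a", "a", "b", "a", "a", "b", "a", "b", "b"),
  Species = c("cat", "dog", "cat", "rabbit", "cat", "cat",
              "dog", "cat", "rabbit", "cat", "dog", "cat")
)

wide <- df %>%
  # one row per Year/Site/Zone/Species combination, with its row count in column n
  count(Year, Site, Zone, Species) %>%
  # spread the species into columns; combinations that never occur become 0
  pivot_wider(names_from = Species, values_from = n, values_fill = 0)

print(wide)
```

This should give the same eight rows as the pivot_wider call above. Keeping the counting step separate can be convenient when the intermediate long table of counts is also needed.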
Or with data.table:
library(data.table)
dcast(setDT(df), Year + Site + Zone ~ Species, fun.aggregate = length)
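Note that setDT() converts df to a data.table by reference, so df itself is changed. If that is not wanted, a variant along the following lines works on a copy instead. This is a sketch under the assumption that data.table is installed. The names dt and wide_dt are illustrative, and value.var is spelled out only to avoid the message dcast prints when it has to guess the value column:

```r
library(data.table)

# Same long-format data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  Year    = rep(c(2010L, 2011L), each = 6),
  Site    = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
  Zone    = c("a", "a", "b", "a", "a", "b", "a", "a", "b", "a", "b", "b"),
  Species = c("cat", "dog", "cat", "rabbit", "cat", "cat",
              "dog", "cat", "rabbit", "cat", "dog", "cat")
)

# as.data.table() makes a copy, so df stays a plain data.frame
dt <- as.data.table(df)

# One row per Year/Site/Zone, one column per species, cells hold row counts
wide_dt <- dcast(dt, Year + Site + Zone ~ Species,
                 fun.aggregate = length, value.var = "Species")

print(wide_dt)
```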
Regarding "r - creating wide data of counts when each row is a Year/Site/Zone", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/67658405/