r - Using reframe or complete to generate a dataset from the min/max values in the data

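I am trying to use the values in one dataset to create another dataset for model predictions.

My dataset has two sites (A and B), data from different years with a different year range at each site, and many individuals (whose numbers also vary by site and year).

I need the final dataset to contain all unique combinations of site, every year from that site's minimum to its maximum, and every mass value from that site's minimum to its maximum (in increments of 0.1). For example, site A has 5 years of data and a mass range of 2-5, so it should have 155 combinations (1 site x 5 years x 31 mass values).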
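As a quick sanity check on that arithmetic, here is a minimal base R sketch that computes the expected row count for site A instead of hard-coding it. The names n_years, mass_grid and n_expected are illustrative and do not appear elsewhere in this post.

```r
# Illustrative names only; the ranges are the ones stated for site A above
n_years    <- length(1:5)            # years 1 through 5
mass_grid  <- seq(2, 5, by = 0.1)    # 2.0, 2.1, ..., 5.0
n_expected <- n_years * length(mass_grid)   # years x mass values for one site
n_expected
```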

# example dataset
df <- data.frame(site = c(rep("A", 20),                      # 20 obs for site A
                          rep("B", 30)),                     # 30 obs for site B
                 year = c(sample(1:5, 20, replace = TRUE),           # 5 years for site A
                          sample(c(1:4, 6:7), 30, replace = TRUE)),  # 6 years for site B, resulting range should span 1-7 (including 5)
                 mass = c(sample(seq(2, 5, 0.1), 20, replace = TRUE),    # mass range 2-5 for site A
                          sample(seq(1, 6, 0.1), 30, replace = TRUE)))   # mass range 1-6 for site B

# I've tried using complete(), but it doesn't recognize mass
library(dplyr)
library(tidyr)
df %>% complete(year, nesting(site), 
                fill = list(seq(min(mass), max(mass), 0.1)))
#> Error in seq(min(mass), max(mass), 0.1) : object 'mass' not found

# I've also tried reframe, but it doesn't cover the full range of masses
df %>% reframe(year = min(year):max(year), .by = c(site, mass))

Best answer

You can use expand.grid on sequences (seq) built along each site's range of year and mass.

> res <-
+   by(df, df$site, \(x) 
+      cbind(site=x$site[1], 
+            expand.grid(year=do.call('seq.int', c(as.list(range(x$year)), 1)),
+                        mass=do.call('seq.int', c(as.list(range(x$mass)), .1))))) |>
+   do.call(what='rbind')
> 
> by(res, res$site, summary)
res$site: A
     site                year        mass     
 Length:130         Min.   :1   Min.   :2.00  
 Class :character   1st Qu.:2   1st Qu.:2.60  
 Mode  :character   Median :3   Median :3.25  
                    Mean   :3   Mean   :3.25  
                    3rd Qu.:4   3rd Qu.:3.90  
                    Max.   :5   Max.   :4.50  
--------------------------------------------------------------------------- 
res$site: B
     site                year        mass      
 Length:336         Min.   :1   Min.   :1.100  
 Class :character   1st Qu.:2   1st Qu.:2.275  
 Mode  :character   Median :4   Median :3.450  
                    Mean   :4   Mean   :3.450  
                    3rd Qu.:6   3rd Qu.:4.625  
                    Max.   :7   Max.   :5.800
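For a variant closer to the reframe() attempt in the question, the following is a minimal sketch rather than part of the original answer. It assumes dplyr >= 1.1.0 (for reframe() and the .by argument) and tidyr are installed. The toy data frame and the grid name are made up for illustration and stand in for df.

```r
library(dplyr)   # reframe() and .by need dplyr >= 1.1.0
library(tidyr)   # expand_grid()

# Hypothetical toy data, not the df from the question
toy <- data.frame(
  site = c("A", "A", "A", "B", "B", "B"),
  year = c(1L, 3L, 2L, 1L, 2L, 4L),        # site B has no observations in year 3
  mass = c(2.0, 2.5, 2.2, 1.0, 1.5, 1.3)
)

# For each site, cross its full year range with its full mass range
grid <- toy %>%
  reframe(
    expand_grid(
      year = seq(min(year), max(year)),            # every year from min to max
      mass = seq(min(mass), max(mass), by = 0.1)   # every 0.1 step from min to max
    ),
    .by = site
  )

count(grid, site)   # number of generated rows per site
```

Grouping by site only, instead of by site and mass as in the question's attempt, is what lets each site's ranges be computed over all of its rows. Values produced by seq(..., by = 0.1) are floating-point, so it may be worth applying round(mass, 1) before matching or joining on mass.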

Data:

> dput(df)
structure(list(site = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", 
"B", "B", "B"), year = c(1L, 5L, 1L, 1L, 2L, 4L, 2L, 2L, 1L, 
4L, 1L, 5L, 4L, 2L, 2L, 3L, 1L, 1L, 3L, 4L, 6L, 6L, 6L, 4L, 2L, 
4L, 3L, 2L, 1L, 2L, 7L, 3L, 7L, 2L, 4L, 4L, 7L, 2L, 6L, 4L, 6L, 
4L, 2L, 2L, 3L, 1L, 6L, 2L, 2L, 7L), mass = c(2.5, 2.1, 3.9, 
2.2, 4.1, 4, 2.1, 4.2, 2.5, 4.5, 2.9, 2.7, 2.4, 2, 3.6, 2.6, 
2.3, 3.2, 2.9, 2.8, 3.8, 2.1, 2.9, 1.8, 5.2, 4.4, 3.8, 2.5, 4.6, 
3.7, 5.5, 1.4, 3.7, 1.1, 2.7, 3.3, 5.8, 2.7, 1.4, 5.5, 4.9, 4.9, 
3, 4.5, 4.5, 4.8, 5.1, 2.7, 3.6, 2.2)), class = "data.frame", row.names = c(NA, 
-50L))

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/77705643/
