r - Subset data by reverse pairs

Tags: r loops subset

The data.frame below should be subset by reverse pairs, subject to a few conditions:

> foo
   ID Day  Period            Start              End
1  11   1 morning     Central Park Alphabet Village
2  11   1 morning     Central Park Alphabet Village
3  11   1 evening Alphabet Village        Grammercy
4  54   1 morning     Union Square        Chinatown
5  67   1 morning          Midtown           Harlem
6  67   1 morning           Harlem          Midtown
7  69   1 morning       Greenpoint Prospect Heights
8  54   1 evening        Chinatown     Union Square
9  77   1 morning       Park Slope     Williamsburg
10 73   1 evening     Williamsburg       Park Slope
11 88   2 morning        Grammercy     Battery Park
12 88   2 morning     Battery Park             SoHo
13 88   2 evening     Battery Park        Grammercy
14 69   2 evening Prospect Heights       Greenpoint
15 88   2 evening        Grammercy     Battery Park

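For example, a Start-End reverse pair of stations must fall on the same Day and have the same ID, and the first record must occur in the morning while the second must occur in the evening. *Edit: note that each Start-End record can be paired with only one End-Start record. That is, once a pair has been formed, its records cannot be reused to form another pair. For example, record 15 cannot be paired with record 13, because 13 is already "taken".

The subset always contains an even number of rows. In this case, it would be: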

   ID Day  Period        Start          End
3  54   1 morning Union Square    Chinatown
7  54   1 evening    Chinatown Union Square
10 88   2 morning    Grammercy Battery Park
11 88   2 evening Battery Park    Grammercy

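I'm not sure whether the subset() function should be used together with a for loop, or how to build the loop. It should say something like: if the Start-End of one row equals the End-Start of the next row, and ID = ID, Day = Day, and the Period of the first record is "morning" while that of the second record is "evening".

I think the code should start with something like if (foo[i-1, "Start"] == foo[i, "End"] && foo[i-1, "End"] == foo[i, "Start"]), but I'm not sure. The idea is to keep all reverse pairs that meet these conditions. Any guidance on, and explanation of, the steps to take would be greatly appreciated.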
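
The loop idea described above can be made concrete in base R. What follows is only a minimal sketch under stated assumptions, not a definitive solution: the function name pair_reverse_trips and the helper vectors used and keep are made up for illustration, foo is rebuilt here with plain character columns so the block runs on its own, and the "first available evening match wins" rule is one possible reading of the one-to-one requirement.

```r
# Rebuild the question's data with character columns (an assumption made
# so that this sketch is self-contained; the original uses factors).
foo <- data.frame(
  ID  = c(11, 11, 11, 54, 67, 67, 69, 54, 77, 73, 88, 88, 88, 69, 88),
  Day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  Period = c("morning", "morning", "evening", "morning", "morning",
             "morning", "morning", "evening", "morning", "evening",
             "morning", "morning", "evening", "evening", "evening"),
  Start = c("Central Park", "Central Park", "Alphabet Village",
            "Union Square", "Midtown", "Harlem", "Greenpoint", "Chinatown",
            "Park Slope", "Williamsburg", "Grammercy", "Battery Park",
            "Battery Park", "Prospect Heights", "Grammercy"),
  End = c("Alphabet Village", "Alphabet Village", "Grammercy", "Chinatown",
          "Harlem", "Midtown", "Prospect Heights", "Union Square",
          "Williamsburg", "Park Slope", "Battery Park", "SoHo",
          "Grammercy", "Greenpoint", "Battery Park"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: greedy one-to-one matching of morning trips with
# evening return trips that share the same ID and Day.
pair_reverse_trips <- function(df) {
  # Compare station names as strings; with the original factor columns,
  # Start and End have different level sets, so == on them would fail.
  start  <- as.character(df$Start)
  end    <- as.character(df$End)
  period <- as.character(df$Period)

  used <- rep(FALSE, nrow(df))  # rows already consumed by a pair
  keep <- integer(0)            # row indices of the matched pairs

  for (i in which(period == "morning")) {
    # Unused evening rows with the same ID and Day and the reversed route
    candidates <- which(!used &
                        period == "evening" &
                        df$ID  == df$ID[i] &
                        df$Day == df$Day[i] &
                        start  == end[i] &
                        end    == start[i])
    if (length(candidates) > 0) {
      j <- candidates[1]        # take the first available match only
      used[c(i, j)] <- TRUE
      keep <- c(keep, i, j)
    }
  }
  df[keep, , drop = FALSE]
}

res <- pair_reverse_trips(foo)
print(res)  # rows 4, 8, 11 and 13 of foo (IDs 54 and 88)
```

Because every matched row is flagged in used, an evening record such as 13 cannot be claimed twice, and a row count that is always even follows directly from rows being added two at a time.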

Sample data:

> dput(foo)
structure(list(ID = c(11L, 11L, 11L, 54L, 67L, 67L, 69L, 54L, 
77L, 73L, 88L, 88L, 88L, 69L, 88L), Day = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), Period = structure(c(2L, 
2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L), .Label = c("evening", 
"morning"), class = "factor"), Start = structure(c(3L, 3L, 1L, 
11L, 8L, 7L, 6L, 4L, 9L, 12L, 5L, 2L, 2L, 10L, 5L), .Label = c("Alphabet Village", 
"Battery Park", "Central Park", "Chinatown", "Grammercy", "Greenpoint", 
"Harlem", "Midtown", "Park Slope", "Prospect Heights", "Union Square", 
"Williamsburg"), class = "factor"), End = structure(c(1L, 1L, 
4L, 3L, 6L, 7L, 9L, 11L, 12L, 8L, 2L, 10L, 4L, 5L, 2L), .Label = c("Alphabet Village", 
"Battery Park", "Chinatown", "Grammercy", "Greenpoint", "Harlem", 
"Midtown", "Park Slope", "Prospect Heights", "SoHo", "Union Square", 
"Williamsburg"), class = "factor")), .Names = c("ID", "Day", 
"Period", "Start", "End"), class = "data.frame", row.names = c(NA, 
-15L))

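Best answer

After grouping by "ID" and "Day", filter the groups in which the number of unique "Period" elements is greater than 1 (n_distinct), then change the factor columns to character and apply a filter that matches the conditions in the OP's post.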

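 library(dplyr)
 foo %>%
     group_by(ID, Day) %>%
     # keep only groups with more than one distinct Period
     filter(n_distinct(Period) > 1) %>%
     # compare station names as strings; the two factors have different levels
     mutate(Start = as.character(Start), End = as.character(End)) %>%
     # keep groups whose first and last rows are the reverse of each other
     filter(Start[1] == End[n()] & Start[n()] == End[1])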
 #    ID   Day  Period        Start          End
 #  (int) (int)  (fctr)        (chr)        (chr)
 #1    54     1 morning Union Square    Chinatown
 #2    54     1 evening    Chinatown Union Square
 #3    88     2 morning    Grammercy Battery Park
 #4    88     2 evening Battery Park    Grammercy

In dplyr 0.5.0 and above, we can use mutate_if:

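foo %>%
   group_by(ID, Day) %>%
   # keep only groups with more than one distinct Period
   filter(n_distinct(Period) > 1) %>%
   # convert every factor column (including Period) to character
   mutate_if(is.factor, as.character) %>%
   # keep groups whose first and last rows are the reverse of each other
   filter(Start[1] == End[n()] & Start[n()] == End[1])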
#     ID   Day  Period        Start          End
#   <int> <int>   <chr>        <chr>        <chr>
#1    54     1 morning Union Square    Chinatown
#2    54     1 evening    Chinatown Union Square
#3    88     2 morning    Grammercy Battery Park
#4    88     2 evening Battery Park    Grammercy
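
In dplyr 1.0.0 and later, scoped verbs such as mutate_if are superseded by across(), so the same conversion can also be written as mutate(across(where(is.factor), as.character)). The sketch below illustrates this on a small made-up data frame (trips is hypothetical and is not the question's data). One thing worth keeping in mind: the comparison of Start[1] with End[n()] only looks at the first and last row of each group. In the sample data as posted, the ID 88 / Day 2 group has four rows, and its last row (record 15) is not the reverse of its first (record 11), so this check fails for that group unless the extra rows are dealt with first.

```r
library(dplyr)

# Hypothetical toy data: one genuine morning/evening reverse pair (ID 54)
# and one pair of reversed trips that both happen in the morning (ID 67).
trips <- data.frame(
  ID     = c(54L, 54L, 67L, 67L),
  Day    = c(1L, 1L, 1L, 1L),
  Period = factor(c("morning", "evening", "morning", "morning")),
  Start  = factor(c("Union Square", "Chinatown", "Midtown", "Harlem")),
  End    = factor(c("Chinatown", "Union Square", "Harlem", "Midtown"))
)

res <- trips %>%
  group_by(ID, Day) %>%
  # keep only groups with more than one distinct Period
  filter(n_distinct(Period) > 1) %>%
  # across() replaces mutate_if(): convert all factor columns to character
  mutate(across(where(is.factor), as.character)) %>%
  # keep groups whose first and last rows are the reverse of each other
  filter(Start[1] == End[n()] & Start[n()] == End[1]) %>%
  ungroup()

print(res)  # only the two ID 54 rows remain
```

The ID 67 group is dropped by the first filter because both of its records are morning trips, which matches the rule that the return leg must take place in the evening.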

Regarding "r - Subset data by reverse pairs", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41517834/
