r - 将点平均在一起而不重复并减少最终数据帧

标签 r spatial sf

目标是对 10 米内的点进行平均,而不重复平均中的任何点,将点数据帧减少为平均点,理想情况下,沿着收集这些点的路线获得平稳的点流。这是一个来自更大文件(25,000 个观察值)的 11 点子集示例数据框:

library(sf)
df <- data.frame(trait = as.numeric(c(91.22,91.22,91.22,91.58,91.47,92.19,92.19,90.57,90.57,91.65,91.65)), 
                 datetime = as.POSIXct(c("2021-08-06 15:08:43","2021-08-06 15:08:44","2021-08-06 15:08:46","2021-08-06 15:08:47","2021-08-06 15:43:17","2021-08-06 15:43:18","2021-08-06 15:43:19","2021-08-06 15:43:20","2021-08-06 15:43:21","2021-08-06 15:43:22","2021-08-06 15:43:23")),
                 lat = c(39.09253, 39.09262, 39.09281, 39.09291, 39.09248, 39.09255, 39.09261, 39.09266, 39.0927, 39.09273, 39.09274),
                 lon = c(-94.58463, -94.58462, -94.5846, -94.58459, -94.58464, -94.58464, -94.58464, -94.58464, -94.58466, -94.5847, -94.58476)
                 ) # just to add some value that is plotable
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
df <- st_as_sf(x = df,                         
               coords = c("lon", "lat"),
               crs = projcrs)

这是我尝试过的:

  1. st_is_within_distance(trav, trav, tolerance) 的多次迭代,包括:
  2. 一个aggregate method显示在这里。这些方法不起作用,因为相同的点会被多次平均。
  3. 通过尝试在 lapply 中动态更新列表,接近 filteracross 但最终没有成功。
  4. This is helpful来自@jeffreyevans,但并没有真正解决问题,而且有点过时了。
  5. spThin 包不起作用,因为它是为更具体的变量而设计的。
  6. 我想 cluster using this post ,但集群会抛出随机点,实际上并没有有效地减少数据帧。

这是我所得到的最接近的。同样,此解决方案的问题在于它在收集平均值时重复了点,这使得某些点比其他点具有更高的权重。

  # first set tolerance
  tolerance <- 20 # 20 meters
  
  # get distance between points
  i <- st_is_within_distance(df, df, tolerance)
  
  # filter for indices with more than 1 (self) neighbor
  i <- i[which(lengths(i) > 1)]   
 
  # filter for unique indices (point 1, 2 / point 2, 1)
  i <- i[!duplicated(i)]
    
  # points in `sf` object that have no neighbors within tolerance
  no_neighbors <- trav[!(1:nrow(df) %in% unlist(i)),  ]  

  # iterate over indices of neighboring points
  avg_points <- lapply(i, function(b){
    df <- df[unlist(b), ]
    coords <- st_coordinates(df)
    
    df <- df %>%
      st_drop_geometry() %>%
      cbind(., coords)
    
    df_sum <-  df %>%
      summarise(
        datetime = first(datetime),
        trait = mean(trait),
        X = mean(X),
        Y = mean(Y),
        .groups = 'drop') %>% 
        ungroup()

    return(df)
  }) %>% 
    
    bind_rows() %>% 
    st_as_sf(coords = c('X', 'Y'),
             crs = "+proj=longlat +datum=WGS84 +no_defs ") 

最佳答案

另一个答案,使用 sf::aggregate() 和六边形网格来查找彼此之间特定距离内的点。也可以使用正方形网格。结果会有所不同,具体取决于网格相对于点的确切位置,但在确定平均值时不应多次使用任何点。

步骤大纲:

  • 加载数据,转换为 crs 5070 以进行以米为单位的测量
  • 获取数据的边界框
  • 为每个直径约 10 米的边界框制作一个六边形网格
  • 使用 mean 聚合落在同一个六边形中的点
  • 加入原始数据
library(sf)
library(tidyverse)

set.seed(22) # might be needed to get same hex grid?

#### your sample data
df <- data.frame(trait = as.numeric(c(91.22,91.22,91.22,91.58,91.47,92.19,92.19,90.57,90.57,91.65,91.65)), 
                 datetime = as.POSIXct(c("2021-08-06 15:08:43","2021-08-06 15:08:44","2021-08-06 15:08:46","2021-08-06 15:08:47","2021-08-06 15:43:17","2021-08-06 15:43:18","2021-08-06 15:43:19","2021-08-06 15:43:20","2021-08-06 15:43:21","2021-08-06 15:43:22","2021-08-06 15:43:23")),
                 lat = c(39.09253, 39.09262, 39.09281, 39.09291, 39.09248, 39.09255, 39.09261, 39.09266, 39.0927, 39.09273, 39.09274),
                 lon = c(-94.58463, -94.58462, -94.5846, -94.58459, -94.58464, -94.58464, -94.58464, -94.58464, -94.58466, -94.5847, -94.58476)
) # just to add some value that is plotable
projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
df <- st_as_sf(x = df,                         
               coords = c("lon", "lat"),
               crs = projcrs) %>%
  st_transform(5070)  ### transform to 5070 for a projection in meters
#### end sample data


# Get a bounding box as an sf object to make a grid
bbox <- st_bbox(df) %>% st_as_sfc()

# Make a grid as hexagons with approximately the right size 
#  area ~86m; side ~5.75m; long diag ~11.5m 
hex_grid <- st_make_grid(bbox, cellsize = 10, square = F) %>% st_as_sf()

# Aggregate mean of the hexagonal grid
hex_agg <- aggregate(df ,
                     hex_grid,
                     mean,
                     join = st_contains) %>% filter(!is.na(trait))

# Assign the mean of the hexagon to points that fall 
#  within each hexagon
df_agg <- st_join(df, hex_agg)

head(df_agg) # trait.x from df, trait.y from the mean by hexagon
#> Simple feature collection with 6 features and 4 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 121281.6 ymin: 1786179 xmax: 121285.3 ymax: 1786227
#> Projected CRS: NAD83 / Conus Albers
#>   trait.x          datetime.x  trait.y          datetime.y
#> 1   91.22 2021-08-06 15:08:43 91.70500 2021-08-06 15:26:00
#> 2   91.22 2021-08-06 15:08:44 91.32667 2021-08-06 15:31:47
#> 3   91.22 2021-08-06 15:08:46 91.22000 2021-08-06 15:08:46
#> 4   91.58 2021-08-06 15:08:47 91.58000 2021-08-06 15:08:47
#> 5   91.47 2021-08-06 15:43:17 91.47000 2021-08-06 15:43:17
#> 6   92.19 2021-08-06 15:43:18 91.70500 2021-08-06 15:26:00
#>                   geometry
#> 1 POINT (121282.5 1786184)
#> 2 POINT (121283.2 1786194)
#> 3 POINT (121284.6 1786216)
#> 4 POINT (121285.3 1786227)
#> 5 POINT (121281.7 1786179)
#> 6 POINT (121281.6 1786186)

sum(df_agg$trait.x) - sum(df_agg$trait.y) # original trait - aggregate trait should be 0, or near 0
#> [1] 0

ggplot(df_agg) + 
  geom_sf(aes(size = trait.x), alpha = .2, color = 'blue') +  # Original triat
  geom_sf(aes(size = trait.y), alpha = .2, color = 'red') +  # New aggregated trait
  theme_void()

按特征大小。蓝点是原始的,红色是新的空间均值。


## Plot of
# original points & hex grid used:
ggplot() + 
  geom_sf(data = df, color = 'red') + 
  geom_sf(data = hex_grid, fill = NA) +
  theme_void()

绘图显示了均值点的分组。看起来每个六边形有 1、2 和 3 个点的组的平均值。

reprex package 创建于 2022-03-23 (v2.0.1)

编辑

更新为每个六边形只有一个点,丢失了一些原始点

## Edit for one point per hexagon:
hex_agg <- aggregate(df ,
                     hex_grid,
                     mean,
                     join = st_contains) %>% filter(!is.na(trait)) %>%
  rownames_to_column('hex_num')  # add hexagon number to group_by

## Guide to join on, has only hexagon number & centroid of contained points
hex_guide <- df_agg %>%
  group_by(hex_num) %>%
  summarise() %>%
  st_centroid()

# The full sf object with only one point per hexagon
#  this join isn't the most efficient, but slice(1) removes
#  the duplicate data. You could clean df_agg before the join
#  to resolve this
final_join <- df_agg %>%
                 st_drop_geometry() %>%
                 left_join(hex_guide, by = 'hex_num') %>% 
                 group_by(hex_num) %>%
                 slice(1) %>%
                 ungroup() %>%
                 st_as_sf()

ggplot() +
  geom_sf(data = final_join, color = 'red', size = 3) +
  geom_sf(data = df, color = 'black', alpha = .5) +
  geom_sf(data = hex_grid, color = 'blue', fill = NA)

该图显示了六边形、灰色的原始数据点和分组原始点的质心处的新红色点。每个六边形只有 1 个红点。

enter image description here

关于r - 将点平均在一起而不重复并减少最终数据帧,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/71579664/

相关文章:

r - 根据来自不同列的位置计算数据框中的平均值

sql-server-2008 - SQL Server 2008 地理 LineString 大小限制

r - 检查 SF 几何在 R 中是否连续

r - 使用 dplyr distinct 忽略 R 中 sf 对象的几何形状

r - 来自 R 中经纬度点簇的多边形

java - R包中的RWeka错误

r - 多行图例文本,包括带 ggplot 的指数

r - 可视化 R 中整数数据的频率

c# - Entity Framework 未从 SQL Server 存储过程中获取空间类型数据

.net - 标准 SpatialRestrictions.IsWithinDistance NHibernate.Spatial