r - Sort a dataset so that geom_path() connects the points the same way the human eye does

Tags: r ggplot2 dplyr

Below is a reproducible example, including the desired output.

# Example
library(tidyverse)
df <- tribble(
  ~x,~y,
  4,6,
  4.5,5.5,
  5,5,
  5.4,4.5,
  5.6,3.8,
  5.7,3,
  5.4,2.5,
  5,2,
  4.8,3)
# arbitrarily scaling because ordering needs to handle different x and y scales
df <- df %>% mutate(y = y*100)
# human eye draws the rough spiral connecting the points 
ggplot(df, aes(x=x,y=y)) + geom_point() 
# geom_line moves along x-axis, not desired output
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_line() 
# geom_path does it right - exactly what I'm after
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_path()

# ...but I can't guarantee the df is going to start in the desired order:
df <- df %>% arrange(y) # arbitrarily sort by something else as orig order not guaranteed
# now geom_path doesn't work
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_path()

# Q: how do I go from an unordered df to a geom_path() line plot that connects the dots
# the same way the human eye does when shown only the points?

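I think what I want is a function that takes the original dataset, orders it the way the human eye would, and can then be plotted with geom_path(). I've tried an approach I call "second_min", but it doesn't work: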

second_min_func <- function(df){
  
  # try scaling (though not finished as need to scale back to orig axis after ordered)
  df <- df %>% scale() %>% as_tibble()
  
  # nest the dataframe back onto itself so can rowwise perform operations on all data
  df2 <- df  %>% 
    mutate(orig_order = row_number(),
           temp=1) %>% 
    left_join(nest(df, data=everything()) %>% mutate(temp=1),
              by="temp") %>%
    select(-temp)
  
  df3 <- df2 %>% 
    rowwise() %>% 
    mutate(
      # squared euclidean distance from each point to every point (fine for ranking)
      vec = list((x-df$x)^2 + (y-df$y)^2)
    ) %>%
    ungroup() %>% 
    group_by(orig_order) %>% 
    mutate(
      # the first minimum is 0 always as it's the point itself
      second_min = order(vec[[1]])[2] # index of the nearest other point
    ) %>%
    select(x,y,orig_order,second_min)
  
  df3 <- df3 %>% arrange(second_min)
  return(df3)
}

# not desired output!
ggplot(second_min_func(df),aes(x=x,y=y)) + geom_point() + geom_path()
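Sorting by each point's nearest-neighbour index cannot produce a path on its own, because a path is a chain: each step depends on where the previous step landed. As a point of comparison (this is neither the asker's method nor the answer's), here is a minimal base-R sketch of a greedy nearest-neighbour walk. It assumes the coordinates are already on comparable scales and that you know which row is one end of the curve; `greedy_path_order()` and `rescale01()` are hypothetical helper names.

```r
# Hypothetical helper: greedy nearest-neighbour walk (base R only).
# Assumes x and y are already on comparable scales and that `start`
# is the row index of one end of the curve.
greedy_path_order <- function(x, y, start) {
  d <- as.matrix(dist(cbind(x, y)))   # pairwise Euclidean distances
  n <- nrow(d)
  visited <- logical(n)
  path <- integer(n)
  current <- start
  for (i in seq_len(n)) {
    path[i] <- current
    visited[current] <- TRUE
    if (i == n) break
    cand <- d[current, ]
    cand[visited] <- Inf              # never revisit a point
    current <- which.min(cand)        # step to the closest unvisited point
  }
  path
}

# Hypothetical helper: map a numeric vector onto [0, 1]
rescale01 <- function(v) (v - min(v)) / diff(range(v))

pts <- data.frame(
  id = 1:9,
  x  = c(4, 4.5, 5, 5.4, 5.6, 5.7, 5.4, 5, 4.8),
  y  = c(6, 5.5, 5, 4.5, 3.8, 3, 2.5, 2, 3) * 100
)
set.seed(1)
shuffled <- pts[sample(nrow(pts)), ]
ord <- greedy_path_order(rescale01(shuffled$x), rescale01(shuffled$y),
                         start = which(shuffled$id == 1))
shuffled$id[ord]
```

On the example data, started at point 1, this recovers the order 1 to 9. If you pass the raw columns instead, the y axis (multiplied by 100) dominates the distances and the walk jumps from point 6 straight across to point 9, which is why some rescaling step is needed. The obvious weakness is the need to pick a starting endpoint.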

Best answer

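It's late, and I wanted to post a half-baked solution before going to sleep. I suggest computing a k-nearest-neighbour graph, finding the longest shortest path through its minimum spanning tree, and using the vertices that path visits as the order.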

Just to show that I scrambled the order:

library(tidyverse)
library(scales)
library(igraph)

df <- tribble(
  ~x,~y,
  4,6,
  4.5,5.5,
  5,5,
  5.4,4.5,
  5.6,3.8,
  5.7,3,
  5.4,2.5,
  5,2,
  4.8,3)
# arbitrarily scaling because ordering needs to handle different x and y scales
df <- df %>% mutate(y = y*100)

# Random order
set.seed(42)
df <- df[sample(seq_len(nrow(df))),]

# Show order is scrambled
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_path()

Here is the approach:

# Rescale
df <- df %>%
  mutate(x = rescale(x)) %>%
  mutate(y = rescale(y))

# Euclidean distance
d <- dist(df[, c("x", "y")], method = "euclidean")
d <- as.matrix(d)

# Find k nearest neighbours
k <- 2
diag(d) <- Inf # Don't allow self to be nearest neighbour
nn <- apply(d, 1, rank, ties.method = "random")
nn <- apply(nn, 2, function(x) {which(x <= k)})

# Make graph from nn list
elist <- cbind(rep(as.numeric(colnames(nn)), each = nrow(nn)),
               as.vector(nn))
g <- graph_from_edgelist(elist, directed = FALSE)
E(g)$weight <- d[elist]

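# Calculate longest shortest path (?) through minimum spanning tree.
# all_shortest_paths() takes a single source vertex, so loop over every vertex
g <- mst(g)
paths <- unlist(lapply(seq_len(vcount(g)),
                       function(v) all_shortest_paths(g, from = v)$res),
                recursive = FALSE)
path <- paths[[which.max(lengths(paths))]]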

# Order is the vertices the shortest path visits
order <- as.integer(path)

# Reorder scrambled df
ggplot(df[order,], aes(x=x,y=y)) + 
  geom_point() +
  geom_path()

Created on 2021-08-06 by the reprex package (v2.0.0)

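It works for this particular dataset, with this seed and this particular k; I don't know how well it generalises to more challenging datasets. But at least it's something.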
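A dependency-free variation on the same idea, sketched under different assumptions: it builds the minimum spanning tree on the full distance matrix with Prim's algorithm (so there is no k to tune) instead of on a k-nearest-neighbour graph, and it takes the weighted diameter of that tree, found with two traversals, instead of the path with the most vertices. All helper names are hypothetical. It shares the same caveat: if the points are not close to a single curve, the tree has side branches, and the diameter path leaves the points on those branches out.

```r
# Hypothetical helpers (base R only): order points along the longest path
# of a minimum spanning tree built on the full Euclidean distance matrix.

# Prim's algorithm: returns the parent of each vertex in the MST (root = 1)
prim_mst_parent <- function(d) {
  n <- nrow(d)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best <- d[1, ]                  # cheapest known edge from each vertex into the tree
  parent <- c(NA_integer_, rep(1L, n - 1))
  for (step in seq_len(n - 1)) {
    cand <- best
    cand[in_tree] <- Inf
    v <- which.min(cand)          # closest vertex not yet in the tree
    in_tree[v] <- TRUE
    closer <- !in_tree & d[v, ] < best
    best[closer] <- d[v, closer]
    parent[closer] <- v
  }
  parent
}

# Summed edge weights from `source` to every vertex of the tree, plus the
# predecessor of each vertex on its path back to `source`
tree_distances <- function(parent, d, source) {
  n <- length(parent)
  adj <- vector("list", n)
  for (v in which(!is.na(parent))) {
    adj[[v]] <- c(adj[[v]], parent[v])
    adj[[parent[v]]] <- c(adj[[parent[v]]], v)
  }
  dist_from <- rep(NA_real_, n)
  prev <- rep(NA_integer_, n)
  dist_from[source] <- 0
  queue <- source
  while (length(queue) > 0) {
    u <- queue[1]
    queue <- queue[-1]
    for (w in adj[[u]]) {
      if (is.na(dist_from[w])) {
        dist_from[w] <- dist_from[u] + d[u, w]
        prev[w] <- u
        queue <- c(queue, w)
      }
    }
  }
  list(dist = dist_from, prev = prev)
}

# Tree diameter by two traversals: the farthest vertex from any start is one
# end of the longest path; the farthest vertex from that end is the other end
mst_path_order <- function(x, y) {
  d <- unname(as.matrix(dist(cbind(x, y))))
  parent <- prim_mst_parent(d)
  a <- which.max(tree_distances(parent, d, 1)$dist)
  res <- tree_distances(parent, d, a)
  b <- which.max(res$dist)
  path <- b
  while (path[1] != a) path <- c(res$prev[path[1]], path)
  path
}

rescale01 <- function(v) (v - min(v)) / diff(range(v))
pts <- data.frame(
  id = 1:9,
  x  = c(4, 4.5, 5, 5.4, 5.6, 5.7, 5.4, 5, 4.8),
  y  = c(6, 5.5, 5, 4.5, 3.8, 3, 2.5, 2, 3) * 100
)
set.seed(42)
shuffled <- pts[sample(nrow(pts)), ]
ord <- mst_path_order(rescale01(shuffled$x), rescale01(shuffled$y))
shuffled$id[ord]
```

On the example data the tree is a simple chain, so the result is the original order, running either 1 to 9 or 9 to 1 depending on which end the first traversal reaches. Unlike the greedy walk, no starting point has to be chosen by hand.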

This question (r - Sort a dataset so that geom_path() connects the points the same way the human eye does) was originally posted on Stack Overflow: https://stackoverflow.com/questions/68674056/
