Below is a reproducible example, including the desired output.
# Example
library(tidyverse)
df <- tribble(
  ~x,  ~y,
  4,   6,
  4.5, 5.5,
  5,   5,
  5.4, 4.5,
  5.6, 3.8,
  5.7, 3,
  5.4, 2.5,
  5,   2,
  4.8, 3)
# arbitrarily scaling because ordering needs to handle different x and y scales
df <- df %>% mutate(y = y*100)
# human eye draws the rough spiral connecting the points
ggplot(df, aes(x=x,y=y)) + geom_point()
# geom_line moves along x-axis, not desired output
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_line()
# geom_path does it right - exactly what I'm after
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_path()
# ...but I can't guarantee the df is going to start in the desired order:
df <- df %>% arrange(y) # arbitrarily sort by something else as orig order not guaranteed
# now geom_path doesn't work
ggplot(df, aes(x=x,y=y)) + geom_point() + geom_path()
#Q: how to go from an unordered df to a line plot with geom_path() that matches the dots in
# the same way as the human eye when just given the points.
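I think I want a function that takes the original dataset, orders it the way the human eye would, and returns something that can then be plotted with geom_path(). I've tried one approach, which I call "second_min", but it doesn't work: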
second_min_func <- function(df){
  # try scaling (though not finished as need to scale back to orig axis after ordered)
  df <- df %>% scale() %>% as_tibble()
  # nest the dataframe back onto itself so can rowwise perform operations on all data
  df2 <- df %>%
    mutate(orig_order = row_number(),
           temp=1) %>%
    left_join(nest(df, data=everything()) %>% mutate(temp=1),
              by="temp") %>%
    select(-temp)
  df3 <- df2 %>%
    rowwise() %>%
    mutate(
      # euclidean distance for each point against all others
      vec = list((x-df$x)^2 + (y-df$y)^2)
    ) %>%
    ungroup() %>%
    group_by(orig_order) %>%
    mutate(
      # the first minimum is 0 always as it's the point itself
      # NOTE: topn() is not part of the tidyverse; presumably kit::topn() (assumption)
      second_min = which(vec[[1]]==vec[[1]][topn(vec[[1]],decreasing=FALSE)][2])
    ) %>%
    select(x,y,orig_order,second_min)
  df3 <- df3 %>% arrange(second_min)
  return(df3)
}
# not desired output!
ggplot(second_min_func(df),aes(x=x,y=y)) + geom_point() + geom_path()
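Best answer
It's late, and I wanted to post a half-finished solution before going to sleep. I'd suggest computing a k-nearest-neighbour graph, finding the shortest path in its minimum spanning tree (in the code below, the longest of the shortest paths), and using the visited vertices as the order.
Just to show that I've scrambled the order: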
library(tidyverse)
library(scales)
library(igraph)
df <- tribble(
  ~x,  ~y,
  4,   6,
  4.5, 5.5,
  5,   5,
  5.4, 4.5,
  5.6, 3.8,
  5.7, 3,
  5.4, 2.5,
  5,   2,
  4.8, 3)
# arbitrarily scaling because ordering needs to handle different x and y scales
df <- df %>% mutate(y = y*100)
# Random order
set.seed(42)
df <- df[sample(seq_len(nrow(df))),]
# Show order is scrambled
ggplot(df, aes(x, y)) +
  geom_point() +
  geom_path()
Here is the approach:
# Rescale
df <- df %>%
  mutate(x = rescale(x)) %>%
  mutate(y = rescale(y))
# Euclidean distance
d <- dist(df[, c("x", "y")], method = "euclidean")
d <- as.matrix(d)
# Find k nearest neighbours
k <- 2
diag(d) <- Inf # Don't allow self to be nearest neighbour
nn <- apply(d, 1, rank, ties.method = "random")
nn <- apply(nn, 2, function(x) {which(x <= k)})
# Make graph from nn list
elist <- cbind(rep(as.numeric(colnames(nn)), each = nrow(nn)),
               as.vector(nn))
g <- graph_from_edgelist(elist, directed = FALSE)
E(g)$weight <- d[elist]
# Calculate longest shortest path (?) through minimum spanning tree
g <- mst(g)
path <- all_shortest_paths(g, from = V(g))
path <- path$res[which.max(lengths(path$res))][[1]]
# Order is the vertices the shortest path visits
order <- as.integer(path)
# Reorder scrambled df
ggplot(df[order,], aes(x=x,y=y)) +
  geom_point() +
  geom_path()
Created on 2021-08-06 by the reprex package (v2.0.0)
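It works for this particular dataset, with this seed and this particular k; I don't know how well it generalises to more challenging datasets. But at least it's something.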
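One way to take the seed and k out of the picture, offered as a rough sketch rather than a tested solution: build the minimum spanning tree over all pairwise distances instead of over a k-nearest-neighbour graph, then take the tree's longest path (by number of vertices) as the order. The helper name order_along_mst() is made up for illustration and is not part of any package. It uses base R only (Prim's algorithm plus two breadth-first searches). Note that points sitting on side branches of the tree are not on the longest path and are therefore left out of the result, so this is only expected to behave well when the points really do form a single curve.

```r
# Sketch only: order_along_mst() is a hypothetical helper, not a library function.
# It returns row indices along the longest path of the minimum spanning tree.
order_along_mst <- function(x, y) {
  # Rescale each axis to [0, 1] so neither axis dominates the distances
  rescale01 <- function(v) {
    if (diff(range(v)) == 0) return(rep(0, length(v)))
    (v - min(v)) / (max(v) - min(v))
  }
  d <- as.matrix(dist(cbind(rescale01(x), rescale01(y))))
  n <- nrow(d)

  # Prim's algorithm: repeatedly attach the closest vertex outside the tree
  in_tree   <- c(TRUE, rep(FALSE, n - 1))
  best_dist <- d[1, ]        # cheapest known link from the tree to each vertex
  best_from <- rep(1L, n)    # tree vertex that provides that link
  adj <- vector("list", n)   # adjacency list of the spanning tree
  for (step in seq_len(n - 1)) {
    cand <- which(!in_tree)
    v <- cand[which.min(best_dist[cand])]
    u <- best_from[v]
    adj[[u]] <- c(adj[[u]], v)
    adj[[v]] <- c(adj[[v]], u)
    in_tree[v] <- TRUE
    closer <- !in_tree & d[v, ] < best_dist
    best_dist[closer] <- d[v, closer]
    best_from[closer] <- v
  }

  # Breadth-first search on the tree: hop counts and parents from `start`
  bfs <- function(start) {
    hops   <- rep(NA_integer_, n)
    parent <- rep(NA_integer_, n)
    hops[start] <- 0L
    queue <- start
    while (length(queue) > 0) {
      u <- queue[1]
      queue <- queue[-1]
      for (v in adj[[u]]) {
        if (is.na(hops[v])) {
          hops[v] <- hops[u] + 1L
          parent[v] <- u
          queue <- c(queue, v)
        }
      }
    }
    list(hops = hops, parent = parent)
  }

  # Longest path in a tree: farthest vertex from anywhere is one end (a),
  # and the farthest vertex from a is the other end (b)
  a <- which.max(bfs(1L)$hops)
  from_a <- bfs(a)
  b <- which.max(from_a$hops)

  # Walk the parent links back from b to a to recover the path
  path <- b
  while (path[1] != a) path <- c(from_a$parent[path[1]], path)
  path
}

# Usage with the data from the question, in scrambled order
pts <- data.frame(x = c(4, 4.5, 5, 5.4, 5.6, 5.7, 5.4, 5, 4.8),
                  y = c(6, 5.5, 5, 4.5, 3.8, 3, 2.5, 2, 3) * 100)
set.seed(42)
scrambled <- pts[sample(nrow(pts)), ]
ordered <- scrambled[order_along_mst(scrambled$x, scrambled$y), ]
# ggplot(ordered, aes(x, y)) + geom_point() + geom_path()
```

The path may come out in either direction (start to end or end to start), which makes no difference to geom_path().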
Regarding "r - Ordering a dataset so that geom_path() connects the points the same way the human eye would", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/68674056/