r - 在 ggplot 和分割(分支)线中堆叠两个水平的因子

标签 r ggplot2 dplyr

考虑以下data.frame:

df <- structure(list(trial = structure(c(1L, 1L, 1L, 1L, 1L, 3L, 3L, 
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 
5L, 5L, 6L, 6L, 6L, 6L, 6L), .Label = c("ES1-7", "ES8-13", "ES14-25", 
"ES26-38", "SA1-13", "SA14-25"), class = "factor"), marker = c(0L, 
0L, 0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 
0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L, 0L, 0L, 0L, 2L, 2L), rs. = c("S1A_499864157", 
"S1B_566171302", "S1B_642616640", "S1A_494392059", "S1A_497201550", 
"S1A_499864157", "S1B_566171302", "S1B_642616640", "S1A_494392059", 
"S1A_497201550", "S1A_499864157", "S1B_566171302", "S1B_642616640", 
"S1A_494392059", "S1A_497201550", "S1A_499864157", "S1B_566171302", 
"S1B_642616640", "S1A_494392059", "S1A_497201550", "S1A_499864157", 
"S1B_566171302", "S1B_642616640", "S1A_494392059", "S1A_497201550", 
"S1A_499864157", "S1B_566171302", "S1B_642616640", "S1A_494392059", 
"S1A_497201550"), n = c(41L, 44L, 5L, 96L, 63L, 115L, 95L, 12L, 
282L, 160L, 320L, 200L, 25L, 402L, 320L, 42L, 32L, 2L, 82L, 64L, 
191L, 151L, 56L, 291L, 222L, 251L, 186L, 48L, 310L, 281L), prop = c(0.304054054054054, 
0.320945945945946, 0.287162162162162, 0.665540540540541, 0.452702702702703, 
0.311576354679803, 0.257389162561576, 0.261083743842365, 0.706896551724138, 
0.415024630541872, 0.594736842105263, 0.394736842105263, 0.32719298245614, 
0.72719298245614, 0.597368421052632, 0.438095238095238, 0.328571428571429, 
0.276190476190476, NA, 0.638095238095238, 0.427350427350427, 
0.351495726495727, 0.245726495726496, 0.636752136752137, 0.512820512820513, 
0.54517453798768, 0.415811088295688, 0.322381930184805, 0.650924024640657, 
0.600616016427105), BASE = c("T", "A", "G", "C", "C", "T", "A", 
"G", "C", "C", "T", "A", "G", "C", "C", "T", "A", "G", "C", "C", 
"T", "A", "G", "C", "C", "T", "A", "G", "C", "C"), alleles = c("C/T", 
"G/A", "A/G", "C/G", "C/T", "C/T", "G/A", "A/G", "C/G", "C/T", 
"C/T", "G/A", "A/G", "C/G", "C/T", "C/T", "G/A", "A/G", "C/G", 
"C/T", "C/T", "G/A", "A/G", "C/G", "C/T", "C/T", "G/A", "A/G", 
"C/G", "C/T")), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

我正在使用以下代码进行绘图:

tt <- ggplot(df, aes(x = trial, y = prop, color = rs.)) + 
  geom_point() +
  theme(axis.text.x = element_text(angle = 90,hjust = 1))+
  facet_grid(.~rs.)+
  geom_line(data = . %>%
              mutate(trial = as.numeric(trial)) %>%
              group_by(rs.) %>%
              summarise(x1 = list(spline(trial, prop, n = 50, method = "natural")[["x"]]),
                        y1 = list(spline(trial, prop, n = 50, method = "natural")[["y"]])) %>%

              tidyr::unnest(),
            aes(x = x1, y = y1)) +
  labs(subtitle="Favorable allele over time", 
       y="allele frequency", 
       x="Groups", 
       title="Yield QTL markers",
       col='markers')

我得到以下图像:

enter image description here

我想将以下级别堆叠在一起:

  • “ES14-25”和“SA1-13”;
  • “ES26-38”和“SA14-25”。

在我想要的图中,我将有两条从 ES8-13 分支的线。一个分支是 ES14-25 -> ES26-38,另一个分支是 SA1-13 -> SA14-25。这里的想法是“ES14-25”与“SA1-13”同时出现,而“ES26-38”与“SA14-25”同时出现。

通过以下两张图,我尝试直观地解释我的意思:

enter image description here enter image description here

最佳答案

您可以在将数据帧传递给 ggplot() 之前对其进行处理:

df %>%

  # remove rows without a valid value for the y-axis
  filter(!is.na(prop)) %>%

  # create appropriate x-axis values based on trial values
  mutate(x = case_when(trial == "ES1-7" ~ 1,
                       trial == "ES8-13" ~ 2,
                       trial %in% c("ES14-25", "SA1-13") ~ 3,
                       trial %in% c("ES26-38", "SA14-25") ~ 4,
                       TRUE ~ 0)) %>%

  # expand data frame by repeating the last point before divergence
  # for each rs. facet
  group_by(rs.) %>%
  mutate(last.point.before.divergence = x == max(x[x <= 2])) %>%
  ungroup() %>% 
  slice(c(1:n(),
          which(last.point.before.divergence))) %>% 

  # create group for line
  group_by(rs., x) %>%
  arrange(trial) %>% 
  mutate(group = seq(1, n())) %>%
  ungroup() %>% 

  ggplot(aes(x = x, y = prop, color = rs., 
             linetype = factor(group))) +
  geom_point() +
  geom_line(data = . %>%
              group_by(rs., group) %>%
              summarise(x1 = list(spline(x, prop, n = 50, method = "natural")[["x"]]),
                        y1 = list(spline(x, prop, n = 50, method = "natural")[["y"]])) %>%

              tidyr::unnest(),
            aes(x = x1, y = y1)) +
  facet_grid(.~rs.) +
  scale_x_continuous(breaks = seq(1, 4),
                     labels = c("ES1-7", "ES8-13",
                                "ES14-25 / SA1-13",
                                "ES26-38 / SA14-25")) +
  labs(subtitle="Favorable allele over time", 
       y="allele frequency", 
       x="Groups", 
       title="Yield QTL markers",
       col='markers') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

result

关于r - 在 ggplot 和分割(分支)线中堆叠两个水平的因子,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/54991384/

相关文章:

python - 使用 rpy2 更改 ggplot2 中的图例

r - 如何使用 directlabels 和 ggplot2?

r - 当 X 值相同时,在分面网格中的两个图形上强制 X 轴

r - 多层 : assign functions to cluster

使用 tidyverse 根据不同的列或行位置用 NA 替换列值

r - 如何在 R 中的两个因素之间填充值?

r - R中数据框中新列的查找表

r - 如何访问/复制 View 功能?

R:通过对数据表进行分组,获取每组的第一行及其对应的行号

r - 如何让geom_area()不被堆叠