r - How do I get geom_map to show all parts of the map?

Tags: r plot ggplot2

I just started using the geom_map function in ggplot2. I have read the 29 posts I could find on geom_map, and I still have the same problem.

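My data frame is ridiculously large, with more than 2000 rows. It is basically data on a specific gene (TP53) compiled by the World Health Organization.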

Please download it from here.

The head of the data looks like this:

> head(ARCTP53_SOExample)
  Mutation_ID MUT_ID hg18_Chr17_coordinates hg19_Chr17_coordinates ExonIntron Genomic_nt Codon_number
1          16   1789                7519192                7578467     5-exon      12451          155
2          13   1741                7519200                7578475     5-exon      12443          152
3          17   2143                7519131                7578406     5-exon      12512          175
4          14   2143                7519131                7578406     5-exon      12512          175
5          15   2168                7519128                7578403     5-exon      12515          176
6          12   3737                7517845                7577120     8-exon      13798          273
  Description c_description g_description       g_description_hg18 WT_nucleotide Mutant_nucleotide
1         A>G      c.463A>G  g.7578467T>C NC_000017.9:g.7519192T>C           A                   G
2         C>T      c.455C>T  g.7578475G>A NC_000017.9:g.7519200G>A           C                   T
3         G>A      c.524G>A  g.7578406C>T NC_000017.9:g.7519131C>T           G                   A
4         G>A      c.524G>A  g.7578406C>T NC_000017.9:g.7519131C>T           G                   A
5         G>T      c.527G>T  g.7578403C>A NC_000017.9:g.7519128C>A           G                   T
6         G>A      c.818G>A  g.7577120C>T NC_000017.9:g.7517845C>T           G                   A
  Splice_site CpG_site           Type Mut_rate WT_codon Mutant_codon WT_AA Mutant_AA ProtDescription
1          no       no        A:T>G:C    0.170      ACC          GCC   Thr       Ala         p.T155A
2          no      yes G:C>A:T at CpG    1.243      CCG          CTG   Pro       Leu         p.P152L
3          no      yes G:C>A:T at CpG    1.280      CGC          CAC   Arg       His         p.R175H
4          no      yes G:C>A:T at CpG    1.280      CGC          CAC   Arg       His         p.R175H
5          no       no        G:C>T:A    0.054      TGC          TTC   Cys       Phe         p.C176F
6          no      yes G:C>A:T at CpG    1.335      CGT          CAT   Arg       His         p.R273H
  Mut_rateAA   Effect Structural_motif Putative_stop Sample_Name Sample_ID Sample_source Tumor_origin Grade
1      0.170 missense NDBL/beta-sheets             0    CAS91-19        17       surgery      primary      
2      1.243 missense NDBL/beta-sheets             0     CAS91-4        14       surgery      primary      
3      1.280 missense            L2/L3             0    CAS91-13        12       surgery      primary      
4      1.280 missense            L2/L3             0     CAS91-5        15       surgery      primary      
5      0.054 missense            L2/L3             0     CAS91-1        16       surgery      primary      
6      1.335 missense          L1/S/H2             0     CAS91-3        13       surgery      primary      
  Stage TNM p53_IHC KRAS_status Other_mutations Other_associations
1              <NA>        <NA>            <NA>                   
2              <NA>        <NA>            <NA>                   
3              <NA>        <NA>            <NA>                   
4              <NA>        <NA>            <NA>                   
5              <NA>        <NA>            <NA>                   
6              <NA>        <NA>            <NA>                   
                                                                 Add_Info Individual_ID  Sex Age Ethnicity
1 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            17 <NA>  NA          
2 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            14 <NA>  NA          
3 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            12 <NA>  NA          
4 Mutation only present in adjacent dysplastic area (Barrett's esophagus)            15 <NA>  NA          
5                                                                                    16 <NA>  NA          
6      Mutation absent from adjacent dysplasia area (Barrett's esophagus)            13 <NA>  NA          
  Geo_area Country            Development       Population   Region TP53polymorphism Germline_mutation
1              USA More developed regions Northern America Americas                                 NA
2              USA More developed regions Northern America Americas                                 NA
3              USA More developed regions Northern America Americas                                 NA
4              USA More developed regions Northern America Americas                                 NA
5              USA More developed regions Northern America Americas                                 NA
6              USA More developed regions Northern America Americas                                 NA
  Family_history Tobacco Alcohol Exposure Infectious_agent Ref_ID Cross_Ref_ID  PubMed Exclude_analysis
1                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
2                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
3                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
4                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
5                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
6                   <NA>    <NA>     <NA>             <NA>      4           NA 1868473            False
  WGS_WXS
1      No
2      No
3      No
4      No
5      No
6      No

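Anyway, I want to create a simple world map that colors the countries where this mutation has been studied, shaded by how many "mutation signatures" come from each of those countries.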

You may get a better idea of what I am trying to do from the following:

summary(ARCTP53_SOExample$Country)
Australia                  Brazil                  Canada                   China 
                      1                     127                      76                     519 
       China, Hong-Kong Chinese Taipei (Taiwan)          Czech Republic                   Egypt 
                     52                      36                       9                       9 
                 France                 Germany                   India                    Iran 
                    195                      10                      63                     112 
                Ireland                   Italy                   Japan                   Kenya 
                     25                      30                     414                      11 
           South Africa                   Spain             Switzerland                Thailand 
                     13                       2                      24                      35 
        The Netherlands                      UK                 Uruguay                     USA 
                      6                      17                       6                     189 
                   NA's 
                     30 

So some countries appear many times in my data.frame.

This is what I did, hoping to get the map I want:

library(ggplot2)
library(maps)
world_map <- map_data("world")
ggplot(ARCTP53_SOExample) +
  geom_map(map = world_map, aes(map_id = Country, fill = Country),
           colour = "black") +
  expand_limits(x = world_map$long, y = world_map$lat)

This is what I get: a map that only contains the countries in my list...

Does anyone have any idea what I am doing wrong?

Also, what I would like to do next is add a geom_bar() of the ExonIntron column for the different countries. But I would like to get the map right first.

Thanks a lot.

Best answer

Countries missing from the ARC… data frame == regions missing from the map. You can compensate with a base layer built from the world_map data frame:

library(ggplot2)
library(maps)

world_map<-map_data("world")

gg <- ggplot(ARCTP53_SOExample)

# need one layer with ALL THE THINGS (well, all the regions)
gg <- gg + geom_map(data=world_map, map = world_map,
                    aes(map_id=region), fill="white", color="black")

# now we can put the layer we really want
gg <- gg + geom_map(map = world_map, 
                    aes(map_id = Country, fill = Country), colour = "black")

gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg


I removed the legend, since a choropleth kind of assumes that people know their geography.
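An alternative to the two-layer trick, sketched here rather than taken from the answer: left-join the per-country counts onto the full list of map regions so every region gets a row, then draw a single geom_map layer and let the `na.value` of scale_fill_gradient() color the unstudied countries. The `regions` and `counts` data frames are toy stand-ins (hypothetical values), and the plotting part only runs when ggplot2 and maps are installed.

```r
# Toy stand-ins (hypothetical): `regions` plays the role of
# unique(world_map$region), `counts` the per-country totals.
regions <- data.frame(region = c("Brazil", "France", "Japan", "Peru"),
                      stringsAsFactors = FALSE)
counts <- data.frame(region = c("Japan", "France"), n = c(414L, 195L),
                     stringsAsFactors = FALSE)

# Left join: all.x = TRUE keeps every region; unstudied ones get n = NA
full <- merge(regions, counts, by = "region", all.x = TRUE)
print(full)

# With the real map, one layer now covers every region and the
# NA regions are painted in `na.value`
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("maps", quietly = TRUE)) {
  library(ggplot2)
  world_map <- map_data("world")
  full_world <- merge(data.frame(region = unique(world_map$region)),
                      counts, by = "region", all.x = TRUE)
  p <- ggplot(full_world) +
    geom_map(map = world_map, aes(map_id = region, fill = n),
             colour = "black") +
    scale_fill_gradient(na.value = "white") +
    expand_limits(x = world_map$long, y = world_map$lat)
  print(p)
}
```

The trade-off is that the data is reshaped once up front instead of stacking a white base layer underneath, so there is only one set of polygons to style.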

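Note: using a different color for each region (country) is really not a good idea. Since you just want to highlight where the mutation has been studied, a single color is enough: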

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country), 
                    fill = "steelblue", colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg


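Since you ultimately want to tell the ExonIntron story, you may want to use it as the fill color of the choropleth. I know nothing about genes, so I don't know whether a gradient makes sense or whether distinct colors are the right choice. The large number of distinct colors produced by the following code makes me think you might want one gradient scale for introns and another for exons. Then again, I'm not a gene person.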

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
                    colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg


Some of the colors are either in very small regions or in regions whose names don't match the names in world_map$region. You might want to look into that. This:

wm.reg <- unique(as.character(world_map$region))
arc.reg <- unique(as.character(ARCTP53_SOExample$Country))

arc.reg %in% wm.reg
##  [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
## [14]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE

suggests that some of your country names have no match in the map data.
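One way to handle the mismatches, sketched under assumptions: list the dataset names that have no match with setdiff(), then recode them through a hand-made lookup vector before plotting. The toy vectors and the entries in `fix` are hypothetical examples. Check the right-hand side against unique(world_map$region) in your installed version of maps, since region spellings can differ between versions.

```r
# Toy stand-ins (hypothetical). In practice:
#   wm.reg  <- unique(as.character(world_map$region))
#   arc.reg <- unique(as.character(ARCTP53_SOExample$Country))
wm.reg  <- c("USA", "Netherlands", "Taiwan", "Japan", "China")
arc.reg <- c("USA", "The Netherlands", "Chinese Taipei (Taiwan)", "Japan")

# Dataset names that have no matching map region
unmatched <- setdiff(arc.reg, wm.reg)
print(unmatched)

# Hypothetical lookup: dataset spelling -> map spelling
fix <- c("The Netherlands" = "Netherlands",
         "Chinese Taipei (Taiwan)" = "Taiwan")

# Replace only the names found in the lookup; leave the rest untouched
recode_country <- function(x, lookup) {
  x <- as.character(x)
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

arc.fixed <- recode_country(arc.reg, fix)
print(setdiff(arc.fixed, wm.reg))  # empty once every name is mapped
```

Applied to the real data, something like `ARCTP53_SOExample$Country <- recode_country(ARCTP53_SOExample$Country, fix)` would run before any of the plotting code.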

If you use a legend instead of building your own results table, you may also want to style the legend differently (e.g. put it at the bottom).

UPDATE

I almost forgot: since you (most likely) don't need Antarctica, you should get rid of it, because it takes up quite a bit of precious space:

world_map <- subset(world_map, region!="Antarctica")

gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron), 
                    colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg


(Note: I dropped the legend because I really think you should rethink the colors on the map and then use a separate table or plot to act as the legend.)
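A minimal sketch of such a companion table, assuming a plain per-country count is what the legend needs to convey: count the rows with table() and sort the largest contributors first. The `country` vector is a toy stand-in (hypothetical values) for ARCTP53_SOExample$Country.

```r
# Toy stand-in for ARCTP53_SOExample$Country (hypothetical values)
country <- c("USA", "Japan", "France", "Japan", "USA", "Japan", NA)

# table() counts rows per country and drops NA by default
tab <- as.data.frame(table(Country = country), responseName = "Tumors",
                     stringsAsFactors = FALSE)

# Largest contributor first, ties broken alphabetically
tab <- tab[order(-tab$Tumors, tab$Country), ]
rownames(tab) <- NULL
print(tab)
```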


FINAL UPDATE (per the OP's request in the comments below)

library(ggplot2)
library(maps)
library(plyr)
library(gridExtra)

ARCTP53_SOExample <- read.csv("dat.csv")

# reduce all the distinct exon/introns to just exon or intron

ARCTP53_SOExample$EorI <- factor(ifelse(grepl("exon", 
                                              ARCTP53_SOExample$ExonIntron, 
                                              ignore.case = TRUE), 
                                        "exon", "intron"))

# extract summary data for the two variables we care about for the map

arc.combined <- count(ARCTP53_SOExample, .(Country, EorI))
colnames(arc.combined) <- c("region", "EorI", "ei.ct")

# get total for country (region) and add to the summary info

arc.combined <- merge(arc.combined, count(arc.combined, .(region), wt_var=.(ei.ct)))
colnames(arc.combined) <- c("region", "EorI", "ei.ct", "region.total")

# it wasn't specified if the "EorI" is going to be used on the map so 
# we won't use it below (but we could, now)

# get the map data and remove Antarctica

world_map <- map_data("world")
world_map <- subset(world_map, region!="Antarctica")

# this will show the counts by country with all of the "chart junk" removed
# and the "counts" scaled as a gradient, and with the legend at the top
# (in ggplot2 >= 3.4.0, use linewidth= instead of size= for the outline width)

gg <- ggplot(arc.combined)
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = region.total), size=0.25)
gg <- gg + scale_fill_gradient(low="#fff7bc", high="#cc4c02", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg


# BUT you might want to show the counts by intron/exon by country
# SO we do a separate map for each factor and combine them
# with some grid magic. This provides more granular control over
# each choropleth (in the event one wanted to tweak one or the other)

# exon

gg <- ggplot(arc.combined[arc.combined$EorI=="exon",])
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), size=0.25)
gg <- gg + scale_fill_gradient(low="#f7fcb9", high="#238443", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'exon' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.exon <- gg

# intron

gg <- ggplot(arc.combined[arc.combined$EorI=="intron",])
gg <- gg + geom_map(data=world_map, map = world_map, aes(map_id=region),
                    fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), 
                    colour = "#7f7f7f", size=0.25)
gg <- gg + scale_fill_gradient(low="#ece7f2", high="#0570b0", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'intron' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.intron <- gg

# use some grid magic to combine them into one plot

grid.arrange(gg.exon, gg.intron, ncol=1)
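plyr is retired (it only receives changes needed to stay on CRAN), so here is a base R sketch of the same summary step. It is offered as an alternative, not as the answer's own code. table() stands in for count(.(Country, EorI)), and ave() supplies the per-country total that the weighted count() plus merge() produced. The toy data frame and its ExonIntron values are hypothetical. One difference: table() drops NA countries, whereas plyr's count() keeps them as a group.

```r
# Toy stand-in for ARCTP53_SOExample (hypothetical values)
df <- data.frame(Country    = c("USA", "USA", "Japan", "Japan", "Japan"),
                 ExonIntron = c("5-exon", "4-intron", "8-exon", "5-exon", "7-intron"),
                 stringsAsFactors = FALSE)

# Collapse to exon vs intron, as in the grepl() step above
df$EorI <- factor(ifelse(grepl("exon", df$ExonIntron, ignore.case = TRUE),
                         "exon", "intron"))

# Rows per (country, exon/intron) pair; keep only combinations that occur
ct <- as.data.frame(table(region = df$Country, EorI = df$EorI),
                    responseName = "ei.ct", stringsAsFactors = FALSE)
ct <- ct[ct$ei.ct > 0, ]

# Per-country total, repeated on each row of that country
ct$region.total <- ave(ct$ei.ct, ct$region, FUN = sum)
print(ct)
```

The columns come out as region, EorI, ei.ct and region.total, the same names as arc.combined, so the map code above should work on this result unchanged.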


For "r - How do I get geom_map to show all parts of the map?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/22855197/
