我刚刚开始使用 ggplot2
中的 geom_map
函数。阅读完我在 geom_map
上找到的 29 篇文章后,我仍然遇到同样的问题。
我的数据框大得离谱,包含 2000 多行。它基本上是由世界卫生组织汇编的特定基因(TP53)的数据。
请从here下载.
标题如下所示:
> head(ARCTP53_SOExample)
Mutation_ID MUT_ID hg18_Chr17_coordinates hg19_Chr17_coordinates ExonIntron Genomic_nt Codon_number
1 16 1789 7519192 7578467 5-exon 12451 155
2 13 1741 7519200 7578475 5-exon 12443 152
3 17 2143 7519131 7578406 5-exon 12512 175
4 14 2143 7519131 7578406 5-exon 12512 175
5 15 2168 7519128 7578403 5-exon 12515 176
6 12 3737 7517845 7577120 8-exon 13798 273
Description c_description g_description g_description_hg18 WT_nucleotide Mutant_nucleotide
1 A>G c.463A>G g.7578467T>C NC_000017.9:g.7519192T>C A G
2 C>T c.455C>T g.7578475G>A NC_000017.9:g.7519200G>A C T
3 G>A c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T G A
4 G>A c.524G>A g.7578406C>T NC_000017.9:g.7519131C>T G A
5 G>T c.527G>T g.7578403C>A NC_000017.9:g.7519128C>A G T
6 G>A c.818G>A g.7577120C>T NC_000017.9:g.7517845C>T G A
Splice_site CpG_site Type Mut_rate WT_codon Mutant_codon WT_AA Mutant_AA ProtDescription
1 no no A:T>G:C 0.170 ACC GCC Thr Ala p.T155A
2 no yes G:C>A:T at CpG 1.243 CCG CTG Pro Leu p.P152L
3 no yes G:C>A:T at CpG 1.280 CGC CAC Arg His p.R175H
4 no yes G:C>A:T at CpG 1.280 CGC CAC Arg His p.R175H
5 no no G:C>T:A 0.054 TGC TTC Cys Phe p.C176F
6 no yes G:C>A:T at CpG 1.335 CGT CAT Arg His p.R273H
Mut_rateAA Effect Structural_motif Putative_stop Sample_Name Sample_ID Sample_source Tumor_origin Grade
1 0.170 missense NDBL/beta-sheets 0 CAS91-19 17 surgery primary
2 1.243 missense NDBL/beta-sheets 0 CAS91-4 14 surgery primary
3 1.280 missense L2/L3 0 CAS91-13 12 surgery primary
4 1.280 missense L2/L3 0 CAS91-5 15 surgery primary
5 0.054 missense L2/L3 0 CAS91-1 16 surgery primary
6 1.335 missense L1/S/H2 0 CAS91-3 13 surgery primary
Stage TNM p53_IHC KRAS_status Other_mutations Other_associations
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
3 <NA> <NA> <NA>
4 <NA> <NA> <NA>
5 <NA> <NA> <NA>
6 <NA> <NA> <NA>
Add_Info Individual_ID Sex Age Ethnicity
1 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 17 <NA> NA
2 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 14 <NA> NA
3 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 12 <NA> NA
4 Mutation only present in adjacent dysplastic area (Barrett's esophagus) 15 <NA> NA
5 16 <NA> NA
6 Mutation absent from adjacent dysplasia area (Barrett's esophagus) 13 <NA> NA
Geo_area Country Development Population Region TP53polymorphism Germline_mutation
1 USA More developed regions Northern America Americas NA
2 USA More developed regions Northern America Americas NA
3 USA More developed regions Northern America Americas NA
4 USA More developed regions Northern America Americas NA
5 USA More developed regions Northern America Americas NA
6 USA More developed regions Northern America Americas NA
Family_history Tobacco Alcohol Exposure Infectious_agent Ref_ID Cross_Ref_ID PubMed Exclude_analysis
1 <NA> <NA> <NA> <NA> 4 NA 1868473 False
2 <NA> <NA> <NA> <NA> 4 NA 1868473 False
3 <NA> <NA> <NA> <NA> 4 NA 1868473 False
4 <NA> <NA> <NA> <NA> 4 NA 1868473 False
5 <NA> <NA> <NA> <NA> 4 NA 1868473 False
6 <NA> <NA> <NA> <NA> 4 NA 1868473 False
WGS_WXS
1 No
2 No
3 No
4 No
5 No
6 No
无论如何,我想创建一个简单的世界地图,为研究过这种突变的国家着色,如果或多或少“突变签名”来自这些国家。
如果您看到以下内容,您可能会更好地理解我想要做什么:
summary(ARCTP53_SOExample$Country)
Australia Brazil Canada China
1 127 76 519
China, Hong-Kong Chinese Taipei (Taiwan) Czech Republic Egypt
52 36 9 9
France Germany India Iran
195 10 63 112
Ireland Italy Japan Kenya
25 30 414 11
South Africa Spain Switzerland Thailand
13 2 24 35
The Netherlands UK Uruguay USA
6 17 6 189
NA's
30
因此,有些国家/地区在我的 data.frame
中出现多次。
这就是我所做的,希望得到我想要的 map :
library(ggplot2)
library(maps)
world_map<-map_data("world")
ggplot(ARCTP53_SOExample)+geom_map(map = world_map, aes(map_id = Country,fill = Country),
+ colour = "black") +
+ expand_limits(x = world_map$long, y = world_map$lat)
这就是我得到的:
有人对我做错的事情有任何意见吗?
此外,我接下来想做的是将 ExonIntron
列的 geom_bar()
添加到不同的国家/地区。但是,我想先尝试生成正确的 map ?
谢谢工厂。
最佳答案
ARC…
数据框中缺失的国家== map 上缺失的区域,可以通过由 world_map
数据框制作的基础图层进行补偿:
library(maps)
world_map<-map_data("world")
gg <- ggplot(ARCTP53_SOExample)
# need one layer with ALL THE THINGS (well, all the regions)
gg <- gg + geom_map(dat=world_map, map = world_map,
aes(map_id=region), fill="white", color="black")
# now we can put the layer we really want
gg <- gg + geom_map(map = world_map,
aes(map_id = Country, fill = Country), colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
我删除了图例,因为使用分区统计图有点假设人们了解地理。
注意:每个地区(国家)使用不同的颜色确实不是一个好主意。由于您确实只是想突出显示已研究突变的位置,因此单一颜色就足够了:
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country),
fill = "steelblue", colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
由于您最终想要讲述 ExonIntron
的故事,因此您可能需要考虑使用它作为分区图的颜色。我对基因一无所知,所以我不知道渐变是否有意义,或者不同的颜色是否是正确的选择。我将假设,以下代码创建的大量不同颜色使我认为您可能希望为 intron
做一个渐变比例,为 extron 做一个渐变比例
。再说一次,我不是一个基因人。
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron),
colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg
某些颜色要么位于非常小的区域中,要么位于其名称与 world_map$region
中的名称不匹配的区域中。您可能想看一下。这:
wm.reg <- unique(as.character(world_map$region))
arc.reg <- unique(as.character(ARCTP53_SOExample$Country))
arc.reg %in% wm.reg
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
## [14] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
有点表明有些内容丢失了。
如果您使用图例而不是构建自己的结果表,您可能还需要考虑以不同的方式制作图例(即将其放在底部)。
更新
我差点忘了。由于您(很可能)不需要南极洲,因此您应该摆脱它,因为它占用了相当多的宝贵空间:
world_map <- subset(world_map, region!="Antarctica")
gg <- ggplot(ARCTP53_SOExample)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="black")
gg <- gg + geom_map(map = world_map, aes(map_id = Country, fill = ExonIntron),
colour = "black")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + theme(legend.position="none")
gg
(注意:我放弃了图例,因为我真的认为您应该重新考虑 map 上的颜色,然后使用额外的表格或绘图来充当图例)
<小时/>最终更新(根据下面评论中的OP请求)
library(ggplot2)
library(maps)
library(plyr)
library(gridExtra)
ARCTP53_SOExample <- read.csv("dat.csv")
# reduce all the distinct exon/introns to just exon or intron
ARCTP53_SOExample$EorI <- factor(ifelse(grepl("exon",
ARCTP53_SOExample$ExonIntron,
ignore.case = TRUE),
"exon", "intron"))
# extract summary data for the two variables we care about for the map
arc.combined <- count(ARCTP53_SOExample, .(Country, EorI))
colnames(arc.combined) <- c("region", "EorI", "ei.ct")
# get total for country (region) and add to the summary info
arc.combined <- merge(arc.combined, count(arc.combined, .(region), wt_var=.(ei.ct)))
colnames(arc.combined) <- c("region", "EorI", "ei.ct", "region.total")
# it wasn't specified if the "EorI" is going to be used on the map so
# we won't use it below (but we could, now)
# get map and intercourse Antarctica
world_map <- map_data("world")
world_map <- subset(world_map, region!="Antarctica")
# this will show the counts by country with all of the "chart junk" removed
# and the "counts" scaled as a gradient, and with the legend at the top
gg <- ggplot(arc.combined)
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = region.total), size=0.25)
gg <- gg + scale_fill_gradient(low="#fff7bc", high="#cc4c02", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg
# BUT you might want to show the counts by intron/exon by country
# SO we do a separate map for each factor and combine them
# with some grid magic. This provides more granular control over
# each choropleth (in the event one wanted to tweak one or the other)
# exon
gg <- ggplot(arc.combined[arc.combined$EorI=="exon",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct), size=0.25)
gg <- gg + scale_fill_gradient(low="#f7fcb9", high="#238443", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'exon' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.exon <- gg
# intron
gg <- ggplot(arc.combined[arc.combined$EorI=="intron",])
gg <- gg + geom_map(dat=world_map, map = world_map, aes(map_id=region),
fill="white", color="#7f7f7f", size=0.25)
gg <- gg + geom_map(map = world_map, aes(map_id = region, fill = ei.ct),
colour = "#7f7f7f", size=0.25)
gg <- gg + scale_fill_gradient(low="#ece7f2", high="#0570b0", name="Tumor counts")
gg <- gg + expand_limits(x = world_map$long, y = world_map$lat)
gg <- gg + labs(x="", y="", title="Tumor contribution by 'intron' & country")
gg <- gg + theme(panel.grid=element_blank(), panel.border=element_blank())
gg <- gg + theme(axis.ticks=element_blank(), axis.text=element_blank())
gg <- gg + theme(legend.position="top")
gg.intron <- gg
# use some grid magic to combine them into one plot
grid.arrange(gg.exon, gg.intron, ncol=1)
关于r - 如何让 geom_map 显示 map 的所有部分?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/22855197/