r - Assigning characters from a substring to the original string for a facet_grid labeller

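I apologise in advance for the lack of proper terminology here, which is most obvious in my title. I am self-taught, and my basic R skills come from having to adapt other people's code for biological research. Please correct me where applicable.

To set up a working example, use diamonds: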

library(ggplot2)
data(diamonds)

diamonds <- diamonds[sample(nrow(diamonds), 1000), ]
diamonds$cut <- factor(diamonds$cut,levels = c("Ideal", "Very Good", "Fair", "Good", "Premium"))

p <- ggplot(diamonds, aes(carat, ..density..)) +
    geom_histogram(binwidth = 1)
p + facet_grid(. ~ cut)

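Basically, with my own data the name of each facet in the grid is too long, so I want to specify shorter names without changing the data.

I found a post that said I could reassign the names like this: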

LAB_NAMES<-list('Ideal'="I", 'Very Good' = "V",
               'Fair'="F",'Good' = "G",
               'Premium'="P")
NEW_LABELLER<-function(variable,value){return(LAB_NAMES[value])}

and then add the labeller to facet_grid:

p <- ggplot(diamonds, aes(carat, ..density..)) +
    geom_histogram(binwidth = 1)
p + facet_grid(. ~ cut,labeller=NEW_LABELLER)

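This works fine once, but I generate a new list of names (e.g. "hsa-miR-4640-5p_hsa-mir-4640", "hsa-miR-548ap-5p_hsa-mir-548ap", etc.) every time I look at a new condition in my experiment. You can see that the names are long, but they all contain a common "_" in the middle. So I can use a substring operation to get the part of the name I want; with diamonds, for example, we would do something like: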

NAMES<-c("Ideal", "Very Good", "Fair", "Good", "Premium")
SHORT_NAMES<-substr(NAMES, 1, 1)

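But manually putting these (comparatively short) names back into the list for the labeller is slow and tedious.

Question: is there an elegant way to assign the short label substrings to the old long label strings in one go, generalising the way I map them below?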

LAB_NAMES<-list('Ideal'="I", 'Very Good' = "V",
               'Fair'="F",'Good' = "G",
               'Premium'="P")

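Thanks in advance, everyone. Thanks again to SO's regular and patient contributors. If I ever finish this damn PhD, I should acknowledge you.

Update - an example of the long names I generate in the object sig_miRs: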

>sig_miRs()
[1] "hsa-miR-10b-5p_hsa-mir-10b", "hsa-miR-143-3p_hsa-mir-143",
                   "hsa-miR-146b-5p_hsa-mir-146b","hsa-miR-150-5p_hsa-mir-150",
                   "hsa-miR-196a-3p_hsa-mir-196a-2","hsa-miR-199a-3p_hsa-mir-199a-2",
                   "hsa-miR-199b-3p_hsa-mir-199b","hsa-miR-23c_hsa-mir-23c",
                   "hsa-miR-4326_hsa-mir-4326","hsa-miR-4485-3p_hsa-mir-4485",
                   "hsa-miR-668-3p_hsa-mir-668","hsa-miR-6840-5p_hsa-mir-6840"

A solution to my problem should take the list above and elegantly generalise this:

sig_miRs_short<-list('hsa-miR-10b-5p_hsa-mir-10b'="hsa-miR-10b-5p", 'hsa-miR-143-3p_hsa-mir-143' = "hsa-miR-143-3p",
                   'hsa-miR-146b-5p_hsa-mir-146b'="hsa-miR-146b-5p",'hsa-miR-150-5p_hsa-mir-150' = "hsa-miR-150-5p",
                   'hsa-miR-196a-3p_hsa-mir-196a-2'="hsa-miR-196a-3p",'hsa-miR-199a-3p_hsa-mir-199a-2'="hsa-miR-199a-3p",
                   'hsa-miR-199b-3p_hsa-mir-199b'="hsa-miR-199b-3p",'hsa-miR-23c_hsa-mir-23c'="hsa-miR-23c",
                   'hsa-miR-4326_hsa-mir-4326'="hsa-miR-4326",'hsa-miR-4485-3p_hsa-mir-4485'="hsa-miR-4485-3p",
                   'hsa-miR-668-3p_hsa-mir-668'="hsa-miR-668-3p",'hsa-miR-6840-5p_hsa-mir-6840'="hsa-miR-6840-5p")
sig_miR_labeller<-function(variable,value){return(sig_miRs_short[value])}

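Best answer

Since you are only interested in the part of the long name before the underscore, there are several ways to get at it.

Option 1: use a regular expression. This labeller replaces the underscore and everything after it with an empty string.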

sig_miR_labeller2 <- function(variable, value){
  return(gsub("_.+","",value))
}
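
As a quick check of what that pattern does, here is a minimal base-R sketch applied to a few of the long names from the question. The variable names `long_names`, `short_names` and `lookup` are made up for illustration. The last step uses `setNames` to build the same kind of named list that the question writes out by hand.

```r
# A few long names taken from the question
long_names <- c("hsa-miR-10b-5p_hsa-mir-10b",
                "hsa-miR-196a-3p_hsa-mir-196a-2",
                "hsa-miR-23c_hsa-mir-23c")

# Drop the first underscore and everything after it
short_names <- gsub("_.+", "", long_names)

# Named list mapping each long name to its short form
# (same shape as the hand-written sig_miRs_short list)
lookup <- setNames(as.list(short_names), long_names)
```

Because the mapping is computed from the names themselves, it should not need updating when a new experiment yields a different set of long names, provided they still follow the "short_long" pattern.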

Edit: here is how to use the labeller (plus another option).

#making some testdata, sampling from the long names
set.seed(123)
nobs=500

sig_miRs_short<-list('hsa-miR-10b-5p_hsa-mir-10b'="hsa-miR-10b-5p", 'hsa-miR-143-3p_hsa-mir-143' = "hsa-miR-143-3p",
                     'hsa-miR-146b-5p_hsa-mir-146b'="hsa-miR-146b-5p",'hsa-miR-150-5p_hsa-mir-150' = "hsa-miR-150-5p",
                     'hsa-miR-196a-3p_hsa-mir-196a-2'="hsa-miR-196a-3p",'hsa-miR-199a-3p_hsa-mir-199a-2'="hsa-miR-199a-3p",
                     'hsa-miR-199b-3p_hsa-mir-199b'="hsa-miR-199b-3p",'hsa-miR-23c_hsa-mir-23c'="hsa-miR-23c",
                     'hsa-miR-4326_hsa-mir-4326'="hsa-miR-4326",'hsa-miR-4485-3p_hsa-mir-4485'="hsa-miR-4485-3p",
                     'hsa-miR-668-3p_hsa-mir-668'="hsa-miR-668-3p",'hsa-miR-6840-5p_hsa-mir-6840'="hsa-miR-6840-5p")

testnames <- names(sig_miRs_short)
testdata <- data.frame(x=runif(nobs),y=runif(nobs),miR=sample(testnames,nobs,replace=TRUE))

Approach 1: use a labeller function. It takes your long strings and removes the underscore and everything after it.

sig_miR_labeller <- function(variable, value){
  return(gsub("_.+","",value))
}

p1 <- ggplot(testdata, aes(x=x,y=y))+
  geom_point() +
  facet_grid(.~miR, labeller=sig_miR_labeller)
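
A note for readers on newer ggplot2 releases: as far as I know, labellers with the `function(variable, value)` signature have been deprecated since ggplot2 2.0.0, and the suggested replacement is to wrap a plain character-to-character function with `as_labeller()`. The sketch below assumes such a version is installed. The names `shorten_miR`, `demo` and `p3` are hypothetical, and the small data set only stands in for `testdata`.

```r
# Hypothetical helper: strips the first underscore and everything after it
shorten_miR <- function(x) sub("_.*$", "", x)

# Only build the plot if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Small stand-in data set; column names mirror testdata
  set.seed(123)
  long_names <- c("hsa-miR-10b-5p_hsa-mir-10b", "hsa-miR-23c_hsa-mir-23c")
  demo <- data.frame(x = runif(20), y = runif(20),
                     miR = sample(long_names, 20, replace = TRUE))

  # as_labeller() turns a character -> character function into a labeller
  p3 <- ggplot(demo, aes(x = x, y = y)) +
    geom_point() +
    facet_grid(. ~ miR, labeller = as_labeller(shorten_miR))
}
```

`as_labeller()` also accepts a named character vector as a lookup table, so a hand-written mapping like the one in the question could be passed that way instead of through a custom function.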

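Approach 2: do not use a labeller; instead create a "pretty" variable in the data and facet on that. This may be practical if you want to use facet_wrap instead, since older versions of facet_wrap did not take a labeller argument.

testdata$pretty_miR <- gsub("_.+","",testdata$miR)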

p2 <- ggplot(testdata, aes(x=x,y=y))+
  geom_point()+
  facet_grid(.~pretty_miR)
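
One thing to watch with this second approach: if two long names share the same prefix before the underscore, their rows end up in a single facet, whereas the labeller approach keeps separate facets that merely carry identical labels. Below is a minimal base-R sketch of a check for that. The example names are hypothetical and chosen to force a collision.

```r
# Hypothetical long names; the first two share the same prefix
long_names <- c("hsa-miR-1-5p_hsa-mir-1-1",
                "hsa-miR-1-5p_hsa-mir-1-2",
                "hsa-miR-23c_hsa-mir-23c")

short_names <- gsub("_.+", "", long_names)

# TRUE if at least two long names collapse to the same short name
has_collision <- anyDuplicated(short_names) > 0

# The short names that occur more than once
collisions <- unique(short_names[duplicated(short_names)])
```

If `has_collision` is TRUE, it is probably safer to keep the long names in the data and shorten only the labels.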

Both approaches produce the same faceted plot, with the shortened names as the facet labels.

For "r - Assigning characters from a substring to the original string for a facet_grid labeller", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33794912/
