docker - How to use Snakemake containers for htslib (bgzip + tabix)

Tags: docker snakemake singularity-container

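I have a pipeline that uses a global Singularity image together with rule-based conda wrappers.
However, some tools have no wrappers (i.e. htslib's bgzip and tabix).
So now I need to learn how to run jobs in containers.
The official documentation says: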

"Allowed image urls entail everything supported by singularity (e.g., shub:// and docker://)."


I tried the following image from Singularity Hub, but I get an error.
Minimal reproducible example:
config.yaml
# Files
REF_GENOME: "c_elegans.PRJNA13758.WS265.genomic.fa"
GENOME_ANNOTATION: "c_elegans.PRJNA13758.WS265.annotations.gff3"
Snakefile
# Directories------------------------------------------------------------------
configfile: "config.yaml"

# Setting the names of all directories
dir_list = ["REF_DIR", "LOG_DIR", "BENCHMARK_DIR", "QC_DIR", "TRIM_DIR", "ALIGN_DIR", "MARKDUP_DIR", "CALLING_DIR", "ANNOT_DIR"]
dir_names = ["refs", "logs", "benchmarks", "qc", "trimming", "alignment", "mark_duplicates", "variant_calling", "annotation"]
dirs_dict = dict(zip(dir_list, dir_names))

GENOME_INDEX=config["REF_GENOME"]+".fai"
VEP_ANNOT=config["GENOME_ANNOTATION"]+".gz"
VEP_ANNOT_INDEX=config["GENOME_ANNOTATION"]+".gz.tbi"

# Singularity with conda wrappers

singularity: "docker://continuumio/miniconda3:4.5.11"

# Rules -----------------------------------------------------------------------

rule all:
    input:
        expand('{REF_DIR}/{GENOME_ANNOTATION}{ext}', REF_DIR=dirs_dict["REF_DIR"], GENOME_ANNOTATION=config["GENOME_ANNOTATION"], ext=['', '.gz', '.gz.tbi']),
        expand('{REF_DIR}/{REF_GENOME}{ext}', REF_DIR=dirs_dict["REF_DIR"], REF_GENOME=config["REF_GENOME"], ext=['','.fai']),

rule download_references:
    params:
        ref_genome=config["REF_GENOME"],
        genome_annotation=config["GENOME_ANNOTATION"],
        ref_dir=dirs_dict["REF_DIR"]
    output:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"]),
        os.path.join(dirs_dict["REF_DIR"],config["GENOME_ANNOTATION"]),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT),
        os.path.join(dirs_dict["REF_DIR"],VEP_ANNOT_INDEX)
    resources:
        mem=80000,
        time=45
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references","download.log")
    singularity:
        "shub://biocontainers/tabix"
    shell: """
        cd {params.ref_dir}
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.genomic.fa.gz
        bgzip -d {params.ref_genome}.gz
        wget ftp://ftp.wormbase.org/pub/wormbase/releases/WS265/species/c_elegans/PRJNA13758/c_elegans.PRJNA13758.WS265.annotations.gff3.gz
        bgzip -d {params.genome_annotation}.gz
        grep -v "#" {params.genome_annotation} | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > {params.genome_annotation}.gz
        tabix -p gff {params.genome_annotation}.gz
        """


rule index_reference:
    input:
        os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
    output:
        os.path.join(dirs_dict["REF_DIR"],GENOME_INDEX)
    resources:
        mem=2000,
        time=30,
    log:
        os.path.join(dirs_dict["LOG_DIR"],"references", "faidx_index.log")
    wrapper:
        "0.64.0/bio/samtools/faidx"
Error
Building DAG of jobs...
Pulling singularity image shub://biocontainers/tabix.
WorkflowError:
Failed to pull singularity image from shub://biocontainers/tabix:
FATAL:   While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub

  File "/home/moldach/anaconda3/envs/snakemake/lib/python3.7/site-packages/snakemake/deployment/singularity.py", line 88, in pull
So it looks like this is a problem with the container?
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull shub://biocontainers/tabix
FATAL:   While pulling shub image: failed to get manifest for: shub://biocontainers/tabix: the requested manifest was not found in singularity hub
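In fact, I have run into this problem with other BioContainers images.
For example, I also need a container for bowtie2 indexing. This is what I get from biocontainers/bowtie2, which fails, compared with comics/bowtie2, another developer's container for the same tool, which pulls normally: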
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://biocontainers/bowtie2
FATAL:   While making image from oci registry: failed to get checksum for docker://biocontainers/bowtie2: Error reading manifest latest in docker.io/biocontainers/bowtie2: manifest unknown: manifest unknown
(snakemake) [moldach@arc CONTAINER_TROUBLESHOOT]$ singularity pull docker://comics/bowtie2
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
Getting image source signatures
Copying blob a02a4930cb5d done

Does anyone know why?

Best answer

BioContainers does not allow latest as a tag for its containers, so you need to specify the tag you want to use.
From their documentation:

The BioContainers community had decided to remove the latest tag. Then, the following command docker pull biocontainers/crux will fail. Read more about this decision in Getting started with Docker


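If no tag is specified, the pull defaults to the latest tag, which does not exist for these images, hence the "manifest unknown" error. The available bowtie2 tags are listed on the image's tag page (linked in the original answer). Usage like this will work: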
singularity pull docker://biocontainers/bowtie2:v2.4.1_cv1
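
To catch this kind of mistake before Singularity tries to pull, one option is a small check on the image URI. The following is only a sketch: the helper names has_explicit_tag and require_tag are hypothetical and not part of Snakemake or Singularity, and the parsing assumes the simple scheme://name[:tag] form used in this question.

```python
def has_explicit_tag(image_uri: str) -> bool:
    """Return True if the image URI names a tag or a digest."""
    # Drop the scheme (e.g. "docker://") if one is present.
    _, _, ref = image_uri.rpartition("://")
    # A digest reference (name@sha256:...) pins the image exactly.
    if "@" in ref:
        return True
    # Only the last path component can carry a tag; a colon earlier
    # in the reference belongs to a registry port (host:5000/name).
    last_component = ref.rsplit("/", 1)[-1]
    return ":" in last_component


def require_tag(image_uri: str) -> str:
    """Return the URI unchanged, or raise if it would fall back to 'latest'."""
    if not has_explicit_tag(image_uri):
        raise ValueError(
            f"{image_uri!r} has no tag and would default to 'latest'"
        )
    return image_uri
```

Since a Snakefile is Python, such a helper could be defined at the top of the Snakefile and used to validate each image string before it is passed to a singularity: directive. With the URIs from this question, require_tag("docker://biocontainers/bowtie2:v2.4.1_cv1") returns the URI, while require_tag("docker://biocontainers/bowtie2") raises ValueError.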

Source: Stack Overflow, https://stackoverflow.com/questions/64050974/
