python-3.x - Snakemake produces a seriously incoherent error on dry run

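For the past year I have been working on a project that runs on a SLURM-managed cluster. We now want to make our results reproducible, so we are porting the pipeline to Snakemake. However, I am learning Snakemake from scratch, and it is giving me a headache.

Here is my code: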

# module load python/3.7
# python -m venv ./venv
# source ./venv/bin/activate
# pip install snakemake
# snakemake --version

configfile: "config.yaml"
#localrules:
vcf=config["vcf"],
ncbiFiles=config["ncbiFiles"]
LC=config["leafcutter"]


rule all:
    input: ".prepare_phen_table.chkpnt"

rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --wait --export=vcf={vcf},outdir=$PWD src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh", vcf=config["vcf"])

rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --export=outdir=$PWD src/sqtl_mapping/primary/sh/00b_index_vcf.sh;"
        "touch {ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])

rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        ".junc_cluster.chkpnt"
    shell:
        "sbatch --wait src/sqtl_mapping/sh/01_junc_cluster.sh;"
        "touch .junc_cluster.chkpnt"

rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        ".intron_clustering.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/02_intronclustering.sh {LC};"
        "touch .intron_clustering.chkpnt;"
        "cd intronclustering/", LC=config["leafcutter"])

rule prepare_phen_table:
    input:
        LC,
        ".intron_clustering.chkpnt"
    output:
        ".prepare_phen_table.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/03_prepare_phen_table.sh {LC};"
        "touch .prepare_phen_table.chkpnt",LC=config["leafcutter"])

Please assume that config.yaml is fine. When I call snakemake -n, I get the following error:

(venv) [[email protected]@rmccoy22-dev neand_sQTL]$ snakemake -n
Building DAG of jobs...
Job counts:
    count   jobs
    1   all
    1   index_vcf
    1   intron_clustering
    1   junc_cluster
    1   prepare_phen_table
    5

[Fri Aug 16 12:31:53 2019]
rule index_vcf:
    input: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz
    output: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi, /scratch/groups/rmccoy22/Ne_sQTL/files/.index_vcf.chkpnt
    jobid: 4

Traceback (most recent call last):
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/__init__.py", line 547, in snakemake
    export_cwl=export_cwl)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/workflow.py", line 674, in execute
    success = scheduler.schedule()
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 278, in schedule
    self.run(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 294, in run
    error_callback=self._error)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 75, in run
    self._run(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 86, in _run
    self.printjob(job)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 92, in printjob
    job.log_info(skip_dynamic=True)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 825, in log_info
    logger.shellcmd(self.shellcmd, indent=indent)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 323, in shellcmd
    self.rule.shellcmd else None)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 732, in format_wildcards
    return format(string, **_variables)
  File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/utils.py", line 378, in format
    return fmt.format(_pattern, *args, **variables)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 186, in format
    return self.vformat(format_string, args, kwargs)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 190, in vformat
    result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 200, in _vformat
    self.parse(format_string):
  File "/software/apps/python/3.7/lib/python3.7/string.py", line 284, in parse
    return _string.formatter_parser(format_string)
TypeError: expected str, got list

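Other than there possibly being some kind of bug, or an incompatibility with what I am trying to do, I don't know what to make of this.

Thank you for any help you can give me.

Best answer

Okay, I took the liberty of also giving some feedback on style :). This is what I would do (I could not test it, of course):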

configfile: "config.yaml"


rule all:
    input: 
        ".prepare_phen_table.chkpnt"


rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        "src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh"


rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        touch(expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"]))
    shell:
        "src/sqtl_mapping/primary/sh/00b_index_vcf.sh"


rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        touch(".junc_cluster.chkpnt")
    shell:
        "src/sqtl_mapping/sh/01_junc_cluster.sh"


rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        touch(".intron_clustering.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/02_intronclustering.sh {params.LC}"


rule prepare_phen_table:
    input:
        config["leafcutter"],
        ".intron_clustering.chkpnt"
    output:
        touch(".prepare_phen_table.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/03_prepare_phen_table.sh {params.LC}"

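Most importantly, this should solve your problem: Snakemake complains that it expected a string but got a list. The expand function returns a list of all combinations, which in your case is a list with a single element (but still a list, not a string). Passing that list to shell: is what triggers the TypeError at the bottom of your traceback. I simply stored LC in params instead of trying to fill in its value with a call to expand.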
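To see why the dry run dies inside string.py, here is a minimal sketch that does not need Snakemake installed. `expand_like` is a simplified, hypothetical stand-in for `expand()` (not Snakemake's implementation), and the script name and leafcutter path are placeholders. The final frames of the traceback hand the shell command to Python's `string.Formatter`, which only accepts a `str`.

```python
import itertools
import string


def expand_like(pattern, **wildcards):
    """Simplified, hypothetical stand-in for Snakemake's expand()."""
    # A bare string counts as a single value; any other iterable is a list of values.
    values = {k: [v] if isinstance(v, str) else list(v) for k, v in wildcards.items()}
    keys = list(values)
    # One formatted string per combination of wildcard values.
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*(values[k] for k in keys))
    ]


def try_format(command):
    """Format a command the way the last traceback frames do."""
    try:
        return string.Formatter().format(command)
    except TypeError as err:
        return f"TypeError: {err}"


# Placeholder path; the real value would come from config["leafcutter"].
cmd = expand_like("02_intronclustering.sh {LC}", LC="/path/to/leafcutter")

result_list = try_format(cmd)     # a list is rejected with a TypeError
result_str = try_format(cmd[0])   # the single string inside it formats fine
```

Even with one config value, `cmd` is a list, so anything that expects a plain command string fails. Keeping the command as a string and referring to `{params.LC}` avoids the problem.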

Second, I replaced your manual touch commands for the checkpoint files with Snakemake's handy built-in touch() function.
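Conceptually, an output wrapped in touch() is created (or has its timestamp updated) once the job's command has finished successfully. The sketch below imitates that idea in plain Python. `run_then_touch` and the marker file names are hypothetical helpers for illustration, not part of Snakemake.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_then_touch(command, marker):
    """Run command; create the empty marker file only if it exits with status 0."""
    completed = subprocess.run(command)
    if completed.returncode == 0:
        Path(marker).touch()
    return completed.returncode


with tempfile.TemporaryDirectory() as tmp:
    ok_marker = Path(tmp) / ".ok.chkpnt"
    bad_marker = Path(tmp) / ".bad.chkpnt"
    # A command that succeeds and one that fails, using the current interpreter.
    ok_code = run_then_touch([sys.executable, "-c", "pass"], ok_marker)
    bad_code = run_then_touch([sys.executable, "-c", "raise SystemExit(1)"], bad_marker)
    ok_exists = ok_marker.exists()
    bad_exists = bad_marker.exists()
```

With touch() in the output section, the marker handling moves out of the shell command, so the command itself only has to do the real work.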

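Last but not least, I removed all of your sbatch calls. The idea behind Snakemake is that a workflow can be executed on any platform: a local machine, a compute cluster, and so on. If you hardcode sbatch commands, the workflow will only run on a SLURM cluster. If you want to execute the code on your local machine, just type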

snakemake ...[params]

If you want to run it on a SLURM cluster, all you need to change is

snakemake ...[params] --cluster "sbatch"

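Snakemake will then submit every job through sbatch for you. Easy! Have another look at the cluster docs.

Be sure to go through the Snakemake documentation and tutorial again so that you understand the ideas and paradigm behind Snakemake. My changes may solve 90% of your problems, but in the end you will probably need to make a few small modifications yourself. Good luck!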
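As a rough mental model, with --cluster Snakemake wraps each job in a job script and passes that script to the submit command you supplied. The toy sketch below only builds the argument list for the two cases. `submission_argv`, the `sh` fallback, and `job.sh` are hypothetical illustrations, not Snakemake internals.

```python
import shlex


def submission_argv(jobscript, cluster_cmd=None):
    """Return the argv that would launch jobscript, locally or via a submit command."""
    if cluster_cmd is None:
        # Local execution: run the script directly with a shell.
        return ["sh", jobscript]
    # Cluster execution: the job script is appended to the user-supplied command.
    return shlex.split(cluster_cmd) + [jobscript]


local_argv = submission_argv("job.sh")
cluster_argv = submission_argv("job.sh", "sbatch --time=60")
```

The rule definitions are identical in both cases; only the launcher changes, which is why the sbatch calls do not belong inside the rules.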

Source: the original question, "Snakemake produces a seriously incoherent error on dry run", is on Stack Overflow: https://stackoverflow.com/questions/57528586/
