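For the past year I have been working on a project that runs on a SLURM-managed cluster. Now we want to make our results reproducible, so we are porting it to snakemake. However, I am learning it from scratch, and it is giving me a headache.
Here is my code: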
# module load python/3.7
# python -m venv ./venv
# source ./venv/bin/activate
# pip install snakemake
# snakemake --version
configfile: "config.yaml"

#localrules:

vcf=config["vcf"],
ncbiFiles=config["ncbiFiles"]
LC=config["leafcutter"]

rule all:
    input: ".prepare_phen_table.chkpnt"

rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --wait --export=vcf={vcf},outdir=$PWD src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh", vcf=config["vcf"])

rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])
    shell:
        expand("sbatch --export=outdir=$PWD src/sqtl_mapping/primary/sh/00b_index_vcf.sh;"
               "touch {ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"])

rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        ".junc_cluster.chkpnt"
    shell:
        "sbatch --wait src/sqtl_mapping/sh/01_junc_cluster.sh;"
        "touch .junc_cluster.chkpnt"

rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        ".intron_clustering.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/02_intronclustering.sh {LC};"
               "touch .intron_clustering.chkpnt;"
               "cd intronclustering/", LC=config["leafcutter"])

rule prepare_phen_table:
    input:
        LC,
        ".intron_clustering.chkpnt"
    output:
        ".prepare_phen_table.chkpnt"
    shell:
        expand("sbatch --wait src/sqtl_mapping/sh/03_prepare_phen_table.sh {LC};"
               "touch .prepare_phen_table.chkpnt",LC=config["leafcutter"])
Please assume config.yaml is fine. When I call snakemake -n, I get the following error:
(venv) [[email protected]@rmccoy22-dev neand_sQTL]$ snakemake -n
Building DAG of jobs...
Job counts:
count jobs
1 all
1 index_vcf
1 intron_clustering
1 junc_cluster
1 prepare_phen_table
5
[Fri Aug 16 12:31:53 2019]
rule index_vcf:
input: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz
output: /scratch/groups/rmccoy22/Ne_sQTL/files/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi, /scratch/groups/rmccoy22/Ne_sQTL/files/.index_vcf.chkpnt
jobid: 4
Traceback (most recent call last):
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/__init__.py", line 547, in snakemake
export_cwl=export_cwl)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/workflow.py", line 674, in execute
success = scheduler.schedule()
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 278, in schedule
self.run(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/scheduler.py", line 294, in run
error_callback=self._error)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 75, in run
self._run(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 86, in _run
self.printjob(job)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/executors.py", line 92, in printjob
job.log_info(skip_dynamic=True)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 825, in log_info
logger.shellcmd(self.shellcmd, indent=indent)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 323, in shellcmd
self.rule.shellcmd else None)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/jobs.py", line 732, in format_wildcards
return format(string, **_variables)
File "/scratch/groups/rmccoy22/aseyedi2/neand_sQTL/venv/lib/python3.7/site-packages/snakemake/utils.py", line 378, in format
return fmt.format(_pattern, *args, **variables)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 186, in format
return self.vformat(format_string, args, kwargs)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 190, in vformat
result, _ = self._vformat(format_string, args, kwargs, used_args, 2)
File "/software/apps/python/3.7/lib/python3.7/string.py", line 200, in _vformat
self.parse(format_string):
File "/software/apps/python/3.7/lib/python3.7/string.py", line 284, in parse
return _string.formatter_parser(format_string)
TypeError: expected str, got list
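I don't know what to make of this, other than that there may be some kind of bug or an incompatibility with what I am trying to do.
Thanks for any help you can offer.
Best answer
Okay, I decided to take the liberty of offering some style feedback :). This is what I would do (I can't test it, of course):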
configfile: "config.yaml"

rule all:
    input:
        ".prepare_phen_table.chkpnt"

rule filter_vcf:
    input:
        expand("{vcf}", vcf=config["vcf"]),
        expand("{ncbiFiles}", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    shell:
        "src/sqtl_mapping/primary/sh/00a_bcftools_filter.sh"

rule index_vcf:
    input:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.HQ.vcf.gz", ncbiFiles=config["ncbiFiles"])
    output:
        expand("{ncbiFiles}/phg000830.v1.GTEx_WGS.genotype-calls-vcf.c1/GTExWGSGenotypeMatrixBiallelicOnly.vcf.gz.tbi",ncbiFiles=config["ncbiFiles"]),
        touch(expand("{ncbiFiles}/.index_vcf.chkpnt",ncbiFiles=config["ncbiFiles"]))
    shell:
        "src/sqtl_mapping/primary/sh/00b_index_vcf.sh"

rule junc_cluster:
    input:
        expand("{ncbiFiles}/.index_vcf.chkpnt", ncbiFiles=config["ncbiFiles"])
    output:
        touch(".junc_cluster.chkpnt")
    shell:
        "src/sqtl_mapping/sh/01_junc_cluster.sh"

rule intron_clustering:
    input:
        ".junc_cluster.chkpnt"
    output:
        touch(".intron_clustering.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/02_intronclustering.sh {params.LC}"

rule prepare_phen_table:
    input:
        config["leafcutter"],
        ".intron_clustering.chkpnt"
    output:
        touch(".prepare_phen_table.chkpnt")
    params:
        LC=config["leafcutter"]
    shell:
        "src/sqtl_mapping/sh/03_prepare_phen_table.sh {params.LC}"
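Most importantly, this should fix your problem: snakemake complains that it expected a string but got a list. The expand function returns a list of all combinations, which in your case is a list with a single element (but still a list, not a string), and the shell directive needs a plain string. So I simply stored LC in params and referenced it as {params.LC}, rather than trying to fill in its value by calling expand.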
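To make the list-versus-string point concrete, here is a minimal plain-Python sketch that does not import snakemake. `expand_like` and `format_command` are hypothetical stand-ins written for this illustration (they are not snakemake functions), and the script name and path are placeholder values. The sketch only assumes what the traceback shows: the shell command ends up being passed through Python's `string.Formatter`, which accepts a string only.

```python
import itertools
import string


def expand_like(pattern, **wildcards):
    """Hypothetical stand-in for expand(): one formatted string per combination."""
    names = list(wildcards)
    # A bare string counts as a single value; anything else is iterated over.
    values = [[v] if isinstance(v, str) else list(v) for v in wildcards.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in itertools.product(*values)
    ]


def format_command(command, **variables):
    """Hypothetical stand-in for the formatting step seen in the traceback."""
    return string.Formatter().format(command, **variables)


# Even with a single value, the result is a one-element list, not a str.
cmd = expand_like("02_intronclustering.sh {LC}", LC="/path/to/leafcutter")

try:
    format_command(cmd)  # a list reaches the formatter
    failed = False
except TypeError:
    failed = True

# Passing a plain string and supplying the value separately works.
fixed = format_command("02_intronclustering.sh {LC}", LC="/path/to/leafcutter")
```

Passing the value through `params` in the Snakefile plays the same role as the last line here: the command stays a plain string and the value is substituted when the job is formatted.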
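Second, I replaced your manual touch commands for the checkpoint files with snakemake's handy built-in touch() function.
Last but not least, I removed all your sbatch calls. The idea behind Snakemake is that a workflow can run on any platform: a local machine, a compute cluster, and so on. If you hard-code sbatch commands, it will only work on a SLURM cluster. If you want to run the code on your local machine, just type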
snakemake ...[params]
If you want to run it on a SLURM cluster, all you need to change is
snakemake ...[params] --cluster "sbatch"
snakemake will then put the sbatch command in front of every job it submits. Simple! Take another look at the cluster docs.
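As a rough mental model of what "putting sbatch in front of every job" means, the following Python sketch builds a submit command from a cluster string and a job script path. It is a conceptual illustration only: `build_submit_command`, the `jobscript.sh` name, and the `threads` value are assumptions made for this example, not snakemake internals.

```python
import shlex


def build_submit_command(cluster_template, jobscript, **job_properties):
    """Hypothetical helper: fill per-job placeholders, then append the job script."""
    filled = cluster_template.format(**job_properties)
    return shlex.split(filled) + [jobscript]


# Plain submission: the cluster command followed by the job script.
plain = build_submit_command("sbatch", "jobscript.sh")

# A placeholder such as {threads} is filled in per job before submission.
with_threads = build_submit_command(
    "sbatch --cpus-per-task={threads}", "jobscript.sh", threads=4
)
```

The point of the design is that the rules themselves stay free of scheduler details, so the same Snakefile can run locally or on a cluster by changing only the command line.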
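Be sure to go through the snakemake documentation and tutorial again so that you understand snakemake's ideas and paradigm. My changes may solve 90% of your problems, but you will probably still need to make some small modifications yourself. Good luck!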
On the topic of python-3.x - Snakemake producing a seriously incoherent error on a dry run, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57528586/