我有一个可以使用的fasta列表,但其中一些也可能不会使用,所以如果需要的话我希望使用snakemake来索引fastq
我建立了一个像这样的yaml文件
# config.yaml
reference_genome:
fa1: "path/to/genome"
fa2: "..."
fa3: "..."
...
我写了一个像这样的snakemake
configfile: "config.yaml"
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{reference_genome}.{type}', reference_genome={reference_genome}, type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
我希望snakemake可以监控索引文件(amb, ann, pac)
,但是这个脚本会引发以下错误:
name 'reference_genome' is not defined
File "/public/...", line ..., in <module>
更新:基于@dariober的回答: 如果我们使用以下 config.yaml 运行
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"
我期望输出是
genome_1.fa.{amb, ann, pac}
genome_2.fa.{amb, ann, pac}
genome_3.fa.{amb, ann, pac}
如果我们使用以下解决方法
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
我们会得到
$ snakemake -s snakemake_test.smk --configfile config.yaml
# for reference_name is fa1
[Fri Dec 2 17:56:29 2022]
rule index:
input: genome_1.fa
output: fa1.amb, fa1.ann, fa1.pac
log: log/rule_index_fa1.log
jobid: 1
wildcards: reference_genome=fa1
...
这不是我预期的输出
输出是 fa1.amb、fa1.ann、fa1.pac,但我想要输出是基因组_1.fa.amb、基因组_1.fa.ann、基因组_1.fa.pac
最佳答案
我的猜测是你想要:
rule all:
input:
expand('{reference_genome}.{type}', reference_genome=['fa1', 'fa2', 'fa3'], type=['amb', 'ann', 'pac'])
rule index:
input:
#reference_genomeFile
ref_genome=lambda wildcards:config['reference_genome'][wildcards.reference_genome]
output:
expand('{{reference_genome}}.{type}', type=['amb', 'ann', 'pac'])
log:
'log/rule_index_{reference_genome}.log'
shell:
"bwa index -a bwtsw {input.ref_genome} > {log} 2>&1"
snakemake -p -n -j 1 --configfile config.yaml
即:对每个基因组文件运行规则index
,即这里运行3次。每次运行 index
都会生成三个索引文件。请注意使用双花括号 {{reference_genome}}
来告诉 expand
该通配符不需要扩展。
示例 config.yaml:
reference_genome:
fa1: "genome_1.fa"
fa2: "genome_2.fa"
fa3: "genome_3.fa"
关于python - 在snakemake规则输出中使用配置和通配符,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/74652332/