Nextflow: output not "found" despite publishDir being set

Tags: nextflow

I have the following Nextflow script:

echo true                                                                       
                                                                                
wd = "$params.wd"                                                               
geoid = "$params.geoid"                                                         
                                                                                
                                                                                
process step1 {                                                                 
                                                                                
 publishDir = "$wd/data/"                                                       
                                                                                
 input:                                                                         
  val celFiles from "$wd/data/$geoid"                                           
                                                                                
 output:                                                                        
  file "${geoid}_datFiles.RData" into channel                                   
                                                                                
 """                                                                            
 Rscript $wd/scripts/step1.R $celFiles $wd/data/${geoid}_datFiles.RData         
                                                                                
 """                                                                            
                                                                                
}                                                                               
  
The R script (step1.R) contains the following commands:
step1=function(WD,
               celFiles,
               output) {
 
  library(affy)

  datFiles=ReadAffy(celfile.path=paste0(WD,"/",celFiles))
  
  save(datFiles,file=output)

}

args=commandArgs(trailingOnly=TRUE)
WD=args[1]
celFiles=args[2]
output=args[3]

step1(WD,celFiles,output)

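When it runs, the output file is saved in the directory I want ($wd/data/${geoid}_datFiles.RData). Since publishDir points to the same directory, I expected the output (declared as "${geoid}_datFiles.RData") to be available in the publishDir directory.
However, I get the following error: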
  Missing output file(s) `GSE4290_datFiles.RData` expected by process `step1`
The log file shows that Nextflow is still looking for the output in the task directory created by the workflow:
Process `step1` is unable to find [UnixPath]: `/Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/work/92/42afb131a36eb32ed780bd1bf3bc3b/GSE4290_datFiles.RData`
The full log file:
Nov-12 17:55:39.611 [main] DEBUG nextflow.cli.Launcher - $> nextflow run main.nf
Nov-12 17:55:39.945 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 20.07.1
Nov-12 17:55:39.968 [main] INFO  nextflow.cli.CmdRun - Launching `main.nf` [infallible_brahmagupta] - revision: d68e496ea0
Nov-12 17:55:40.026 [main] DEBUG nextflow.config.ConfigBuilder - Found config local: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/nextflow.config
Nov-12 17:55:40.029 [main] DEBUG nextflow.config.ConfigBuilder - Parsing config file: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/nextflow.config
Nov-12 17:55:40.140 [main] DEBUG nextflow.config.ConfigBuilder - Applying config profile: `standard`
Nov-12 17:55:41.288 [main] DEBUG nextflow.Session - Session uuid: 94f22a74-2a63-4a87-9fb3-33cf925a5a74
Nov-12 17:55:41.288 [main] DEBUG nextflow.Session - Run name: infallible_brahmagupta
Nov-12 17:55:41.289 [main] DEBUG nextflow.Session - Executor pool size: 4
Nov-12 17:55:41.326 [main] DEBUG nextflow.cli.CmdRun -
  Version: 20.07.1 build 5412
  Created: 24-07-2020 15:18 UTC (08:18 PDT)
  System: Mac OS X 10.15.7
  Runtime: Groovy 2.5.11 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_111-b14
  Encoding: UTF-8 (UTF-8)
  Process: 46458@Rebeccas-MacBook-Pro-6.local.ucsf.edu [10.49.41.197]
  CPUs: 4 - Mem: 8 GB (708.4 MB) - Swap: 2 GB (927 MB)
Nov-12 17:55:41.353 [main] DEBUG nextflow.Session - Work-dir: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/work [Mac OS X]
Nov-12 17:55:41.354 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /Users/rebeccaeliscu/Desktop/workflow/affymetrix/nextflow/bin
Nov-12 17:55:41.594 [main] DEBUG nextflow.Session - Observer factory: TowerFactory
Nov-12 17:55:41.598 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
Nov-12 17:55:41.911 [main] DEBUG nextflow.Session - Session start invoked
Nov-12 17:55:42.309 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
Nov-12 17:55:42.331 [main] DEBUG nextflow.Session - Workflow process names [dsl1]: step1
Nov-12 17:55:42.334 [main] WARN  nextflow.script.BaseScript - The use of `echo` method has been deprecated
Nov-12 17:55:42.495 [main] DEBUG nextflow.executor.ExecutorFactory - << taskConfig executor: null
Nov-12 17:55:42.496 [main] DEBUG nextflow.executor.ExecutorFactory - >> processorType: 'local'
Nov-12 17:55:42.508 [main] DEBUG nextflow.executor.Executor - [warm up] executor > local
Nov-12 17:55:42.521 [main] DEBUG n.processor.LocalPollingMonitor - Creating local task monitor for executor 'local' > cpus=4; memory=8 GB; capacity=4; pollInterval=100ms; dumpInterval=5m

Best answer

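Nextflow runs each task in its own directory under work/, and your output declaration looks for the file in that working directory: "${geoid}_datFiles.RData". Your Rscript, however, writes to $wd/data/${geoid}_datFiles.RData, so nothing is ever created where Nextflow looks. If you change the command to: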

Rscript $wd/scripts/step1.R $celFiles ${geoid}_datFiles.RData
then Nextflow should be able to find the output file. The publishDir directive will then "publish" it to the directory you defined.
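
For reference, here is how the whole process might look with that change applied. This is only a sketch, assuming the DSL1 syntax of the Nextflow 20.07.1 shown in the log. The channel name datFiles_ch is hypothetical, and mode: 'copy' is an optional addition that is not part of the original script (by default publishDir creates symbolic links). It also writes publishDir in the documented directive form, without the "=".

```groovy
// Sketch only: DSL1 syntax, adapted from the script in the question.
wd = "$params.wd"
geoid = "$params.geoid"

process step1 {

 // Directive form without '='. mode: 'copy' is an optional assumption;
 // without it Nextflow publishes a symlink to the file in work/.
 publishDir "$wd/data/", mode: 'copy'

 input:
  val celFiles from "$wd/data/$geoid"

 output:
  // Resolved relative to the task's own work directory.
  // 'datFiles_ch' is a hypothetical channel name.
  file "${geoid}_datFiles.RData" into datFiles_ch

 """
 # Write to a relative path so the file lands in the task work directory.
 Rscript $wd/scripts/step1.R $celFiles ${geoid}_datFiles.RData
 """

}
```

Once the task finishes, the file should appear under $wd/data/ because publishDir placed it there, not because the script wrote to that path directly. Note that step1.R as posted reads three positional arguments while this command passes two, so the argument list may need aligning separately.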

Regarding "Nextflow: output not "found" despite publishDir being set", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/64814400/