I'm trying to randomly sample a large FASTQ file and write the sample to standard output. I keep getting a "GC overhead limit exceeded" error, and I'm not sure what I'm doing wrong. I've tried increasing Xmx in leiningen, but that didn't help. Here is my code:
(ns fastq-sample.core
  (:gen-class)
  (:use clojure.java.io))

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

;
; Agent for writing the reads asynchronously
;
(def wtr (agent (writer *out*)))

(defn write-out [r]
  (letfn [(write [out msg] (.write out msg) out)]
    (send wtr write r)))

(defn write-close []
  (send wtr #(.close %))
  (await wtr))

;
; Main
;
(defn reads [file]
  (->>
    (input-stream file)
    (java.util.zip.GZIPInputStream.)
    (reader)
    (line-seq)))

(defn -main [fastq-file sample-rate-str]
  (let [sample-rate (Float. sample-rate-str)
        in-reads    (partition n-read-pair-lines (reads fastq-file))]
    (doseq [x (filter (fn [_] (sample? sample-rate)) in-reads)]
      (write-out (clojure.string/join "\n" x)))
    (write-close)
    (shutdown-agents)))
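As a side note on the sampling logic: sample? keeps each read pair independently with probability sample-rate, so over many pairs the kept fraction converges to that rate. A quick standalone check (duplicating only the predicate from the code above):

```clojure
(defn sample? [sample-rate]
  (> sample-rate (rand)))

;; Over 100,000 trials at rate 0.1, the kept fraction lands near 0.1.
(def trials 100000)
(def kept (count (filter (fn [_] (sample? 0.1)) (range trials))))
(def kept-fraction (double (/ kept trials)))
```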
Best Answer
I often get this same symptom when I accumulate an infinite sequence into a simple data structure such as a map or vector. It usually means that memory was tight and the garbage collector could not keep up with the demand for new objects. Most likely, the wtr agent has grown too large for memory. You probably don't want to store the printed results in the agent, which you can avoid by changing
(write [out msg] (.write out msg) out)
to
(write [out msg] (.write out msg))
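The answer's underlying point, that an agent whose state keeps growing will eventually exhaust memory, can be demonstrated in isolation. This is a toy sketch of my own, not the FASTQ code itself:

```clojure
;; An agent action that conj's every message onto the state retains all
;; messages; an action that discards the message keeps the state tiny.
(def growing (agent []))   ; state grows with every send
(def fixed   (agent 0))    ; state stays a single number

(doseq [i (range 1000)]
  (send growing conj i)                 ; after the loop: 1000 retained items
  (send fixed (fn [n _] (inc n)) i))    ; after the loop: just a count

(await growing fixed)
;; @growing now holds 1000 items; @fixed holds one long whose value is 1000.
(shutdown-agents)
```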
A similar question about this Clojure "GC overhead limit exceeded" error can be found on Stack Overflow: https://stackoverflow.com/questions/19120714/
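For completeness, here is a minimal synchronous sketch (my own variant, not the accepted fix): writing directly inside the doseq removes the agent entirely, so neither writer state nor queued messages can pile up. Like the original, it assumes a gzipped input file; sample-fastq is a hypothetical helper name standing in for -main.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(def n-read-pair-lines 8)

(defn sample? [sample-rate]
  (> sample-rate (rand)))

(defn reads [file]
  ;; Same gzip line-seq pipeline as the question's code, with namespaced helpers.
  (->> (io/input-stream file)
       (java.util.zip.GZIPInputStream.)
       (io/reader)
       (line-seq)))

(defn sample-fastq
  "Print each sampled read pair to *out*, synchronously."
  [fastq-file sample-rate]
  (doseq [x (partition n-read-pair-lines (reads fastq-file))
          :when (sample? sample-rate)]
    (println (str/join "\n" x))))
```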