python - How do I read certain GEO files with Bio.Geo?

Tags: python parsing bioinformatics biopython

I am trying to parse a GEO file with Bio.Geo, following the tutorial, like this:

from Bio import Geo
handle = open('GSE40603_combined_L1_L2.txt')
records = Geo.parse(handle)
for record in records:
    print record

But I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "/home/ilya/Документы/biology/E coli GCC/GEOanalyzer.py", line 11, in <module>
    for record in records:
  File "/usr/local/lib/python2.7/dist-packages/Bio/Geo/__init__.py", line 60, in parse
    record.table_rows.append(row)
AttributeError: 'NoneType' object has no attribute 'table_rows'

Here is the head of the file:

0   0   63  NC_000913   0   152 NC_000913   0   152 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL  
0   1   81  NC_000913   0   152 NC_000913   153 599 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= thrL  |CDS(+,190,255) gene= thrL  |gene gene= thrA  |CDS(+,337,2799) gene= thrA  note= bifunctional: aspartokinase I (N-terminal); 
0   2   1   NC_000913   0   152 NC_000913   600 698 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= thrA  |CDS[fcd=-312](+,337,2799) gene= thrA  note= bifunctional: aspartokinase I (N-terminal); 
0   3   1   NC_000913   0   152 NC_000913   699 755 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= thrA  |CDS[fcd=-390](+,337,2799) gene= thrA  note= bifunctional: aspartokinase I (N-terminal); 
0   4   1   NC_000913   0   152 NC_000913   756 757 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= thrA  |CDS[fcd=-419](+,337,2799) gene= thrA  note= bifunctional: aspartokinase I (N-terminal); 
0   2620    1   NC_000913   0   152 NC_000913   352429  352483  |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= prpE  |CDS[fcd=-526](+,351930,353816) gene= prpE  note= putative propionyl-CoA synthetase  
0   18818   1   NC_000913   0   152 NC_000913   2560323 2560384 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |misc_feature note= cryptic prophage Eut/CPZ-55  |gene gene= yffO  |CDS[fcd=-220](+,2560133,2560549) gene= yffO  
0   2617    1   NC_000913   0   152 NC_000913   352326  352375  |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= prpE  |CDS[fcd=-420](+,351930,353816) gene= prpE  note= putative propionyl-CoA synthetase  
0   18817   1   NC_000913   0   152 NC_000913   2560275 2560322 |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |misc_feature note= cryptic prophage Eut/CPZ-55  |gene gene= yffO  |CDS[fcd=-165](+,2560133,2560549) gene= yffO  
0   912 1   NC_000913   0   152 NC_000913   113055  113082  |neigh_up NC_000913-start |neigh_down CDS[fcd=114](+,190,255) gene= thrL    |gene gene= coaE  |CDS[fcd=151](-,112599,113219) gene= coaE  note= putative DNA repair protein 

Am I doing something wrong? How can I read files like this?

Best answer

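This file is a so-called "supplementary file" from GEO. It was provided by the original submitter in their own layout, so tools that read GEO's own formats, such as Bio.Geo, do not apply to it.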

In this particular case, the best approach is simply to parse the downloaded file with standard Python tools.
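As a rough illustration of what "standard Python tools" could look like here, the sketch below reads the file with the standard-library csv module. It assumes the columns are tab-separated, which the sample above suggests but does not confirm, and it is written for Python 3 rather than the Python 2.7 shown in the traceback. The function name read_supplementary_rows is made up for this example, and no meaning is assigned to the columns because the file does not document them.

```python
import csv


def read_supplementary_rows(path, delimiter="\t"):
    """Yield each non-empty line of a delimited text file as a list of strings.

    Hypothetical helper: the tab delimiter is an assumption about the file.
    """
    with open(path, newline="") as handle:
        # QUOTE_NONE keeps any quote characters in the annotations as plain text.
        reader = csv.reader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        for row in reader:
            if row:  # csv.reader returns an empty list for a blank line
                yield row
```

Calling list(read_supplementary_rows('GSE40603_combined_L1_L2.txt')) would then give one list of strings per line. Any conversion, such as int() on the leading numeric fields, is left to the caller because the column layout is specific to the submitter.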

This question, "python - How do I read certain GEO files with Bio.Geo?", comes from Stack Overflow: https://stackoverflow.com/questions/19961582/
