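I have a list of RefSeq IDs (key_list) that I use to download sequence records with Biopython's Entrez module. I only want the sequence from each returned FASTA record, and I would rather not write the records to a file to get at it.
I am trying the following code: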
for key in key_list:
    Entrez.email = "myemailaddress"
    handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')
    record = SeqIO.parse(handle, "fasta")
    for seq_record in SeqIO.parse(record, "fasta"):
        print seq_record.seq
When I run this, I get the following error:
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 538, in parse
yield r
File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/lib64/python2.6/site-packages/Bio/File.py", line 59, in as_handle
yield handleish
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/__init__.py", line 537, in parse
for r in i:
File "/usr/lib64/python2.6/site-packages/Bio/SeqIO/FastaIO.py", line 37, in FastaIterator
line = handle.readline()
AttributeError: 'generator' object has no attribute 'readline'
If I use handle.read() I get the entire FASTA record back, but at this stage I only want to access the nucleotide sequence.
Can anyone help me with this?
Many thanks in advance.
Best answer
This is what you need.
Instead of:
handle = Entrez.efetch(db='nuccore', id=key, rettype='fasta')
Try this:
handle = Entrez.efetch(db="nucleotide", id=key, retmode="xml") # retmode as 'xml' , db='nucleotide'
features = Entrez.read(handle)[0]
sequence = features['GBSeq_sequence'] # this is your sequence!
This returns a string containing your sequence:
'ggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactctacgtctttgtttcgttttctgttctgcgccgttacagatcgaaagttccacccctttccctttcattcacgactgactgccggcttggcccacggccaagtaccggcaactctgctggctcggagccagcgacagcccattctatagcactctccaggagagaaatttagtacacagttgggggctcgtccgggattcgagcgcccctttattccctaggcaatgggccaaatcttttcccgtagcgctagccctattccgcggccgccccgggggctggccgctcatcactggcttaacttcctccaggcggcatatcgcctagaacccggtccctccagttacgatttccaccagttaaaaaaatttcttaaaatagctttagaaacaccggtctggatctgccccattaactactccctcctagccagcctactcccaaaaggataccccggccgggtgaatgaaattttacacatactcatccaaacccaagcccagatcccgtcccgccccgcgccgccgccgccgtcatcctccacccacgaccccccggattctgacccacaaatcccccctccctatgttgagcctacagccccccaagtccttccagtcatgcacccacatggtgcccctcccaaccaccgcccatggcaaatgaaagacctacaggccattaagcaagaagtctcccaagcggcccctggaagcccccagtttatgcagaccatccggcttgcggtgcagcagtttgaccccactgccaaagacctccaagacctcctgcagtacctttgctcctccctcgtggcttccctccatcaccagcagctagatagccttatatcagaggccgaaactcgaggtattacaggttataaccccttagccggtcccctccgtgtccaagccaacaatccacaacaacaaggattaaggcgagaataccagcaactctggctcgccgccttcgccgccctgccagggagtgccaaagacccttcctgggcctctatcctccaaggcctggaggagccttaccacgccttcgtagaacgcctcaacatagctcttgacaatgggctgccagaaggcacgcccaaagaccccattttacgttccttagcctactctaatgcaaacaaagaatgccaaaaattactacaggcccgagggcacactaatagccctctaggagatatgttgcgggcttgtcaggcctggacccccaaagacaaaaccaaagtgttagttgtccagcctaaaaaaccccccccaaatcagccgtgcttccggtgcgggaaagcaggccactggagtcgggactgcactcagcctcgtcctccccctgggccatgccccctatgtcaagatccaactcactggaagcgagactgcccccgcctaaagcccactatcccagaaccagagccagaggaggatgccctcctattagatctccccgccgacatcccacacccaaaaaactccatagggggggaggtttaacctccccccccacattacagcaagtccttcctaaccaagacccaacatctattctgccagttataccgttagatcccgcccgtcggcccgtaattaaagcccagattgacacccagaccagccacccaaagactatcgaagctctactagatacaggagcagacatgacagtccttccgatagccttgttctcaagtaatactcccctcaaaaacacatccgtgttaggggcagggggccaaacccaagatcactttaag
ctcacctcccttcctgtgctaatacgcctccctttccggacgacgcctattgttttaacatcttgcctagttgataccaaaaacaactgggccatcataggtcgtgatgccttacaacaatgccaaggcgtcctgtacctccctgaggcaaaaaggccgcctgtaatcttgccaatacaggcgccagctgtccttgggctagaacacctcccaaggccccccgaaatcagccagttccctttaaaccagaacgcctccaggccttgcaacacttggtccggaaggccctggaggcaggccatatcgaaccctacaccgggccaggaaataacccagtattcccagttaaaaaagccaatggaacctggcgattcatccacgacctgcgggccactaactctctaaccatagatctctcatcatcttcccccgggccccctgacttgtccagcctgccaactacactagcccacttacaaactatagaccttaaagacgcctttttccaaatccccctacctaaacagttccagccctactttgctttcactgtcccacagcagtgtaactacggccccggcactagatacgcctggagagtactaccccaagggtttaaaaatagtcccaccctgttcgaaatgcagctggcccatatcctgcagcccattcggcaagccttcccccaatgcactattcttcagtacatggatgacattctcctggcaagcccctcccatgcggacctgcaactactctcagaggccacaatggcttccctaatctcccatgggttgcctgtgtccgaaaacaaaacccagcaaacccctggaacaattaagttcctagggcaaataatttcacctaatcacctcacttatgatgcagtccccaaggtacctatacggtcccgctgggcgctacctgaacttcaagccctacttggcgagattcagtgggtctccaaaggaactcctaccttacgccagccccttcacagtctctactgtgccttacaaaggcatactgatccccgagaccaaatatatttaaatccttctcaagttcaatcattagtgcagctgcggcaggccctgtcacagaactgccgcagtagactagtccaaaccctgcccctcctaggggctattatgctgaccctcactggcaccaccactgtggtgttccagtccaagcagcagtggccacttgtctggctacatgcccccctaccccacactagccagtgcccctgggggcagctacttgcctcagctgtgttattactcgacaaatacaccttgcaatcctatggactactctgccaaaccatacatcataacatctccacccaaaccttcaaccaattcattcaaacatctgaccaccccagtgttcctatcttactccaccacagtcaccgattcaaaaatttaggtgcccagactggagaactttggaacacttttcttaaaacaactgccccattggctcctgtgaaagcccttatgccagtgtttactctttcccctgtgatcataaacaccgccccttgcctgttttcagacggatccacctcccaggcagcctatattctctgggacaagcatatattgtcacaaagatcattcccccttccgccaccgcacaagtcggcccaacgggccgaacttctcggacttttgcatggcctctccagcgcccgttcgtggcgctgtctcaacatatttctagactccaagtatctttatcattaccttcggacccttgccctaggcaccttccaaggcaggtcctctcaggccccctttcaggccctcctgccccgcttactatcgcgtaaggtcgtctatttgcaccacgttcgcagccataccaatctacctgatcccatctccaggctcaacgctctcacagatgccctactaatcacccctgtcctgcagctctctcctgcagacctacacagtttcacccattgcggacagacggccctcaca
ctgcaaggggcaaccacaactgaggcctccaatatcctgcgctcttgccacgcctgccgcaaaaataacccacaacatcagatgcctcaaggacacatccgccgtggcctactccctaaccacatctggcaaggcgacattacccatttcaaatataaaaatacactgtatcgccttcatgtatgggtagacaccttttcaggagccatctcagctacccaaaagagaaaagaaacaagctcagaagctatttcctctttgctccaggccattgcctatctaggcaagcctagctacataaacacagacaatggccctgcctatatttcccaagacttcctcaatatgtgtacctcccttgctattcgccatactacccatgtcccctacaatccaaccagctccggacttgtagaacgctctaatggcattcttaaaaccctattatataagtactttactgacaaacccgacctacctatggataatgctctatccatagccctatggacaatcaaccacctaaatgtattaaccaactgccacaaaacccgatggcagcttcaccactccccccgactccagccgatcccagagacacattccctcagcaataaacaaacccattggtattatttcaagcttcctggtcttaatagccgccagtggaaaggaccacaggaggctcttcaagaagctgccggcgctgctctcatcccggtaagcgctagttctgcccagtggatcccgtggaggctcctcaagcgagctgcatgcccaagacccgtcggaggccccgccgatcccaaagaaaaagaccaccaacaccatgggtaagtttctcgccactttgattttattcttccagttctgccccctcatcctcggtgattacagccccagctgctgtactctcacagttggagtctcctcataccactctaaaccctgcaatcctgcccagccagtttgttcatggaccctcgacctgctggccctttcagcagatcaggccctacagccaccctgccctaatctagtaagttactccagctaccatgccacctattccctatatctattccctcattggatcaaaaagccaaaccgaaatggcggaggctattattcagcctcttattcagacccttgttccttaaaatgcccatacctagggtgccaatcatggacctgcccctatacaggagccgtctccagcccctactggaaatttcagcaagatgtcaattttactcaagaagtttcacacctcaatattaatctccatttttcaaaatgcggtttttccttctcccttctagtcgacgctccaggatatgaccccatctggttccttaataccgaacccagccaactgcctcccaccgcccctcctctactctcccactctaacctagaccatatcctcgagccctctataccatggaaatcaaaactcctgactcttgtccagttaaccctacaaagcactaattatacttgcattgtctgtatcgatcgtgccagcctatccacttggcacgtcctatactctcccaacgtctctgttccatccccttcttctacccccctcctttacccatcgttagcgcttccagccccccacctgacgttaccatttaactggacccactgctttgacccccagattcaagctatagtctcctccccctgtcataactccctcatcctgccccccttttccttgtcacctgttcccacgctaggatcccgctcccgccgagcagtaccggtggcggtctggcttgtctccgccctggccatgggagccggagtggctggcaggattaccggctccatgtccctcgcctcaggaaagagcctcctacatgaggtggacaaagatatttcccaattaactcaagcaatagtcaaaaaccacaaaaatctgctcaaaattgcacagtatgctgcccagaacagacgaggccttgatctcctgttctgggagcaaggag
gattatgcaaagcattacaagaacagtgctgttttctaaatattactaattcccatgtctcaatactacaagagagacccccccttgaaaatcgagtcctgactggctggggccttaactgggaccttggcctctcacagtgggctcgagaagccttacaaactggaatcacccttgtcgcgctactccttcttgttatccttgcaggaccatgcatcctccgtcagctacgacacctcccctcgcgcgtcagatacccccattactctcttataaaccctgagtcatccctgtaaaccaagcacacaattattgcaaccacatcgcctccagcctcccctgccaataattaacctctcccatcaaatcctccttctcctgcagcaacctcctccgttcagcctccaaggactccacctcgccttccaactgtctagtatagccatcaacccccaactcctgcattttttctttcctagcactatgctgtttcgccttctcagccccttgtctccacttgcgctcacggcgctcctgctcttcctgctttctccgggcgaagtcagcggccttctcctccgcccgcttcctgcgccgtgccttctcctcttccttccttttcaaatactcagcaatctgcttttcctcctctttctcccgctctttttttcgcttcctcttctcctcagcccgtcgctgccgatcacgatgcgtttccccgcgaggtggcgctttcccccctggagggccccgtcgcagccggccgcggctttcctcttctagagatagcaaaccgtcaagcacagtttcctcctcctccttgtcctttaactcttcctccaaggataatagcccgtccaccaattcctccaccagcaggtcctccgggcatggaacaggcaaacatcgaaacagccctacggatacaaagttaaccatgcttattatcagcccacttcccagggtttggacagagtcttcttttcggatacccagtctacgtgtttggagactgtgtacaaggcgactggtgccccatctctgggggactatgttcggcccgcctacatcgtcacgccctactggccacctgtccagagcatcagatcacctgggaccccatcgatggacgcgttatcggctcagctctacagttccttatccctcgactcccctccttccccacccagagaacctctaagacccttaaggtccttaccccgccaatcactcatacaacccccaacattccaccctccttcctccaggccatgcgcaaatactcccccttccgaaatggatacatggaacccacccttgggcagcacctcccaaccctgtcttttccagaccccggactccggccccaaaacctgtacaccctctggggaggctccgttgtctgcatgtacctctaccagctttccccccccatcacctggcccctcctgccccatgtgattttttgccaccccggccagctcggggccttcctcaccaatgttccctacaaacgaatagaaaaactcctctataaaatttcccttaccacaggggccctaataattctacccgaggactgtttgcccaccacccttttccagcctgctagggcacccgtcacgctgacagcctggcaaaacggcctccttccgttccactcaaccctcaccactccaggccttatttggacatttaccgatggcacgcctatgatttccgggccctgccctaaagatggccagccatctttagtactacagtcctcctcctttatatttcacaaatttcaaaccaaggcctaccacccctcatttctactctcacacggcctcatacagtactcttcctttcataatttgcatctcctatttgaagaatacaccaacatccccatttctctactttttaacgaaaaagaggcagatgacaatgaccatgagccccaaatatcccccgggggcttagagcctctcagtgaaaaacatttccgtgaaaca
gaagtctgagaaggtcagggcccagaataaggctctgacgtctccccccggaggacagctcagcaccagctcaggctaggccctgacgtgtccccctaaagacaaatcataagctcagacctccgggaagccaccgggaaccacccatttcctccccatgtttgtcaagccgtcctcaggcgttgacgacaacccctcacctcaaaaaacttttcatggcacgcatacggctcaataaaataacaggagtctataaaagcgtggggacagttcaggagggggctcgcatctctccttcacgcgcccgccgccttacctgaggccgccatccacgccggttgagtcgcgttctgccgcctcccgcctgtggtgcctcctgaactacgtccgccgtctaggtaagtttagagctcaggtcgagaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgaccctgcttgctcaactcta'
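If you would rather stay with the FASTA output, note that the traceback in the question comes from calling SeqIO.parse a second time on the generator returned by the first call; a generator has no readline method, so the FASTA iterator fails. Passing the handle to SeqIO.parse only once (or to SeqIO.read(handle, "fasta") when a single record is expected) should avoid the error. Below is a minimal sketch of the handle.read() route mentioned in the question, written for Python 3. The function names sequence_from_fasta_text and fetch_sequence are hypothetical, the email address is a placeholder, and fetch_sequence assumes Biopython is installed and NCBI is reachable.

```python
import io


def sequence_from_fasta_text(fasta_text):
    """Return the sequence of the first record in FASTA-formatted text.

    The header line (starting with '>') is dropped and the remaining
    lines are joined into one string. Nothing is written to disk.
    """
    chunks = []
    seen_header = False
    for line in io.StringIO(fasta_text):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seen_header:
                break  # a second header marks the start of the next record
            seen_header = True
            continue
        if seen_header:
            chunks.append(line)
    return "".join(chunks)


def fetch_sequence(accession, email):
    """Fetch one nuccore record as FASTA and return only its sequence.

    Assumption: Biopython is available and the network is reachable.
    The import is kept local so the helper above works without Biopython.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="nuccore", id=accession,
                           rettype="fasta", retmode="text")
    try:
        data = handle.read()
    finally:
        handle.close()
    if isinstance(data, bytes):  # some Biopython versions return bytes
        data = data.decode("utf-8")
    return sequence_from_fasta_text(data)
```

Usage would then look something like this, with key_list being the list from the question:

```python
for key in key_list:
    print(fetch_sequence(key, "myemailaddress"))
```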
Regarding python - accessing sequence elements from a FASTA record using Biopython Entrez, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17771043/