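I am trying to collect data from a text file. When I print the outputs, they show the correct values I am looking for. However, when I try to put these outputs into a table using xlsxwriter, the table contains only the output for the last line of the txt file, repeated as many times as there are lines in the text file. That is, there are 5000 lines of text and I need 3 pieces of information from each; the .xlsx file has 5000 rows and 3 columns, but every row contains the information from the last line of the text file.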
EC:1 > GO:N-ethylmaleimide reductase activity ; GO:0008748
EC:1 > GO:oxidoreductase activity ; GO:0016491
EC:1 > GO:reduced coenzyme F420 dehydrogenase activity ; GO:0043738
EC:1 > GO:sulfur oxygenase reductase activity ; GO:0043826
EC:1 > GO:malolactic enzyme activity ; GO:0043883
^ what the txt file looks like
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
6.6.1.2    cobaltochelatase activity    0051116
…………
(what the table looks like, but with 5000 rows)
Any help would be much appreciated. Regards
import xlsxwriter

File = 'EC_to_GO.txt'

def analysis(line, output):
    with open(File) as fp:
        lines = fp.readlines()
    for line in lines:
        output[0] = line[3:].split(' > ')[0]
        output[1] = line[:-14].split(' > GO:')[-1]
        output[2] = line[-8:]
    return output

with open(File) as fp:
    lines = fp.readlines()

for line in lines:
    if 'Generated on 2018-07-04T09:08Z' in line:
        a = lines.index(line)
for line in lines:
    if 'GO:cobaltochelatase activity ; GO:0051116' in line:
        b = lines.index(line)

req_list = lines[a:b]
rxn_end_index = []
for i in range(len(req_list)):
    if '> GO:' in req_list[i]:
        rxn_end_index.append(i)

inner_list = []
outer_list = []
spare = [0] + rxn_end_index
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list)

res_list = []
for i in range(len(outer_list)):
    res_list.append(analysis(outer_list[i],['NA','NA','NA']))

# Create a workbook and add a worksheet.
workbook = xlsxwriter.Workbook('EC_to_GO.xlsx')
worksheet = workbook.add_worksheet('EC_to_GO')

#res_list1 = [EC, Genome name, GO]
#for i in res_list:
    #res_list1.append(i)

# Some data we want to write to the worksheet.
t = tuple(res_list)

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for a,b,c in (t):
    worksheet.write(row, col, a)
    worksheet.write(row, col + 1, b)
    worksheet.write(row, col + 2, c)
    row += 1

workbook.close()
Best answer
You are essentially appending the same list to res_list each time, so you end up with multiple copies of the same output list.
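To illustrate what appending the same list looks like in general, here is a minimal sketch. The names `shared`, `rows_aliased`, and `rows_copied` are hypothetical and do not come from the question; the point is only that every entry referring to one list object shows whatever was written to it last.

```python
# Hypothetical illustration: one list object reused for every row.
shared = ['NA', 'NA', 'NA']
rows_aliased = []
rows_copied = []

for value in ['first', 'second', 'third']:
    shared[0] = value              # mutate the same list in place
    rows_aliased.append(shared)    # stores a reference, not a snapshot
    rows_copied.append(shared[:])  # stores a shallow copy of the current state

print(rows_aliased)  # all three entries are the same object
print(rows_copied)   # each entry keeps the value it had when appended
```

Whether this pattern applies depends on whether a single list is really being reused between iterations.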
Fix: instead of
res_list.append(analysis(outer_list[i],['NA','NA','NA']))

# And in the previous loop
for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list)
change it to:
res_list.append(analysis(outer_list[i],['NA','NA','NA'])[:])

for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(inner_list[:])
or
from copy import copy  # needed for copy() below

res_list.append(copy(analysis(outer_list[i],['NA','NA','NA'])))

for i in range(len(spare)-1):
    inner_list = req_list[spare[i]:spare[i+1]]
    outer_list.append(copy(inner_list))
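The notation list[:] creates a copy of the list. Technically, you are taking a slice that spans the whole list, which produces a new (shallow) list object.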
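Copying only changes the result when a single list object is shared between rows. In the code as posted, analysis receives a fresh ['NA','NA','NA'] on each call, but it also reopens the file and loops over every line, so output is overwritten until only the last line's values remain. A minimal sketch of an alternative, under the assumption that each row should be built from just the one line passed in (the helper `parse_line` and the two sample lines are hypothetical, modeled on the format shown in the question):

```python
# Hypothetical helper: parse one line of the assumed form
# "EC:<number> > GO:<name> ; GO:<id>" without re-reading the file.
def parse_line(line):
    line = line.strip()                         # drop the trailing newline
    ec_part, go_part = line.split(' > GO:', 1)  # "EC:1" and "<name> ; GO:<id>"
    name, go_id = go_part.rsplit(' ; GO:', 1)   # split off the numeric GO id
    return [ec_part[len('EC:'):], name, go_id]

# Sample lines in the assumed input format.
sample = [
    'EC:1 > GO:oxidoreductase activity ; GO:0016491\n',
    'EC:6.6.1.2 > GO:cobaltochelatase activity ; GO:0051116\n',
]

# A new list is built for every line, so rows never share state.
rows = [parse_line(line) for line in sample if ' > GO:' in line]
print(rows)
```

Each entry of `rows` could then be written with worksheet.write in the same row-by-row loop the question already uses.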
Regarding "python - Returned outputs not going into xlsxwriter", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/51308602/