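I'm just wondering whether there is a better way to do this algorithm. I find that I need to do this kind of operation often, and the way I'm currently doing it takes hours, since I believe it counts as an n^2 algorithm. I'll attach it below.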
import csv

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []

# Nested loops: every main row is compared against every annotation row.
for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            # Start a fresh row for each match; a single shared list would
            # keep growing, and "del i" in a loop does not empty it.
            tempList = []
            tempList.extend(full)
            tempList.append(annot[3])
            tempList.append(annot[4])
            tempList.append(annot[5])
            tempList.append(annot[6])
            output.append(tempList)

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)
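I have two csv files, each containing 15,000 strings. I want to compare a column from each file and, where they match, concatenate the end of the row from the second csv onto the end of the row from the first csv. Any help would be greatly appreciated!
Thanks!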
Best answer
This should be more efficient:
import csv
from collections import defaultdict

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

# First pass: index the annotation rows by lower-cased gene name.
for annot in annoList:
    tempList = annot[3:]  # adapt this to the needed columns
    annoMap[annot[2].lower()].append(tempList)  # store these columns under the value of the column of interest

# Second pass: one dictionary lookup per main row.
for full in mainList:
    geneName = full[2].lower()
    if geneName in annoMap:  # check if a matching key exists
        for tempList in annoMap[geneName]:
            output.append(full + tempList)  # main row followed by the annotation columns

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)
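It is more efficient because you only need to iterate over each list once. Dictionary lookups are O(1) on average, so you basically get a linear algorithm.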
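To make the two-pass idea concrete without touching any files, here is a minimal sketch on small in-memory rows. The function name `join_rows`, its parameters `key_col` and `extra_start`, and the sample gene rows are all made up for illustration; the key is assumed to sit in column 2 and the annotation columns to start at column 3, as in the code above.

```python
from collections import defaultdict


def join_rows(main_rows, anno_rows, key_col=2, extra_start=3):
    """Append annotation columns to every main row with a matching key.

    Hypothetical helper for illustration; the column positions are assumptions.
    """
    # Pass 1: index the annotation rows by lower-cased key (one pass over anno_rows).
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[key_col].lower()].append(annot[extra_start:])

    # Pass 2: one average O(1) dictionary lookup per main row.
    joined = []
    for full in main_rows:
        for extra in index.get(full[key_col].lower(), []):
            joined.append(full + extra)
    return joined


# Made-up sample data: the gene name is in column 2 of both tables.
main_rows = [
    ["1", "chr1", "BRCA1"],
    ["2", "chr2", "TP53"],
    ["3", "chr3", "MYC"],
]
anno_rows = [
    ["a", "x", "brca1", "repair", "tumor suppressor"],
    ["b", "y", "myc", "transcription", "oncogene"],
    ["c", "z", "BRCA1", "second", "note"],
]

result = join_rows(main_rows, anno_rows)
for row in result:
    print(row)
```

With n main rows and m annotation rows, this does roughly n + m steps plus one step per matching pair, instead of the n * m comparisons of the nested loops. A main row with several matching annotation rows produces one output row per match, and a main row with no match is dropped.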
Regarding "python - More efficient way to do this search algorithm?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42256304/