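python - Is there a more efficient way to perform this search algorithm?

I'm wondering whether there is a better way to implement this algorithm. I find that I need to do this kind of operation quite often, and the way I currently do it takes hours, since I believe it counts as an n^2 algorithm. The code is attached below.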

import csv

with open("location1", 'r') as main:
    csvMain = csv.reader(main)
    mainList = list(csvMain)

with open("location2", 'r') as anno:
    csvAnno = csv.reader(anno)
    annoList = list(csvAnno)

output = []

# Compare every row of the main file with every row of the annotation file (n^2 comparisons)
for full in mainList:
    geneName = full[2].lower()
    for annot in annoList:
        if geneName == annot[2].lower():
            # Build a fresh row per match: the main row plus annotation columns 3-6
            output.append(full + annot[3:7])

with open("location3", 'w') as final:
    a = csv.writer(final, delimiter=',')
    a.writerows(output)

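I have two csv files, each containing about 15,000 strings. I want to compare a column from each file and, where they match, concatenate the end of the row from the second csv onto the end of the row from the first. Any help would be greatly appreciated!

Thanks!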

Best answer

This should be much more efficient:

import csv
from collections import defaultdict

with open("location1", 'r') as main:
  csvMain = csv.reader(main)
  mainList = list(csvMain)

with open("location2", 'r') as anno:
  csvAnno = csv.reader(anno)
  annoList = list(csvAnno)

output = []
annoMap = defaultdict(list)

# Index the annotation rows by lower-cased gene name (column 2)
for annot in annoList:
  tempList = annot[3:7]  # columns 3-6, as in the question; adapt this to the needed columns
  annoMap[annot[2].lower()].append(tempList)

# One dictionary lookup per main row instead of a scan over annoList
for full in mainList:
  geneName = full[2].lower()
  if geneName in annoMap:  # check if a matching gene name exists
    for tempList in annoMap[geneName]:
      output.append(full + tempList)  # main row followed by the annotation columns

with open("location3", 'w') as final:
  a = csv.writer(final, delimiter=',')
  a.writerows(output)

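This is more efficient because each list is traversed only once: one pass over annoList to build the dictionary, and one pass over mainList to look up each gene name. Dictionary lookups are O(1) on average, so you basically get a linear algorithm (linear in the combined size of the two files, plus the number of matched rows written out) instead of a quadratic one.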
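As a quick sanity check, here is a minimal sketch that runs both approaches on in-memory rows and confirms they produce the same result. The function names `nested_join` and `indexed_join`, the `key_col` and `extra` parameters, and the sample rows are all made up for illustration; only the column positions (key in column 2, annotation columns 3-6) are taken from the question.

```python
from collections import defaultdict


def nested_join(main_rows, anno_rows, key_col=2, extra=slice(3, 7)):
    """Quadratic version: compare every main row with every annotation row."""
    out = []
    for full in main_rows:
        key = full[key_col].lower()
        for annot in anno_rows:
            if key == annot[key_col].lower():
                out.append(full + annot[extra])
    return out


def indexed_join(main_rows, anno_rows, key_col=2, extra=slice(3, 7)):
    """Linear version: index annotation rows by key, then look each main row up."""
    index = defaultdict(list)
    for annot in anno_rows:
        index[annot[key_col].lower()].append(annot[extra])

    out = []
    for full in main_rows:
        # .get() avoids inserting empty lists into the defaultdict on a miss
        for extra_cols in index.get(full[key_col].lower(), []):
            out.append(full + extra_cols)
    return out


# Hypothetical sample rows, not taken from the question's files
main_rows = [
    ["1", "chr1", "BRCA1"],
    ["2", "chr2", "tp53"],
    ["3", "chr3", "MYC"],   # no annotation, so it is dropped
]
anno_rows = [
    ["a", "x", "brca1", "p1", "p2", "p3", "p4"],
    ["b", "y", "TP53", "q1", "q2", "q3", "q4"],
    ["c", "z", "brca1", "r1", "r2", "r3", "r4"],  # second match for BRCA1
]

joined = indexed_join(main_rows, anno_rows)
assert joined == nested_join(main_rows, anno_rows)
for row in joined:
    print(row)
```

Both versions emit one output row per matching pair, so a gene that appears several times in the annotation file yields several rows. If you only want the first match, store a single entry per key instead of a list.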

Regarding "python - Is there a more efficient way to perform this search algorithm?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/42256304/
