作为初始情况,我有一个 sha1 哈希值。我想将此与充满哈希值的文件进行比较,以查看 sha1 哈希值是否包含在具有哈希值的文件中。
更确切地说:
f1=sha1 #value read in
fobj = open("Hashvalues.txt", "r") #open file with hash values
for f1 in fobj:
print ("hash value found")
else:
print("HashValue not found")
fobj.close()
文件很大 (11.1GB)
是否有一个有用的算法来尽可能快地执行搜索?哈希文件中的哈希值按哈希排序。 我认为逐行比较这不是最快的方法,对吗?
编辑: 我更改了我的代码如下:
f1="9bc34549d565d9505b287de0cd20ac77be1d3f2c" #value read in
with open("pwned-passwords-sha1-ordered-by-hash-v5.txt") as f:
lineList = [line.rstrip('\n\r') for line in open("pwned-passwords-sha1-
ordered-by-hash-v5.txt")]
def binarySearch(arr, l, r, x):
while l <= r:
mid = l + (r - l)/2;
# Check if x is present at mid
if arr[mid] == x:
return mid
# If x is greater, ignore left half
elif arr[mid] < x:
l = mid + 1
# If x is smaller, ignore right half
else:
r = mid - 1
# If we reach here, then the element
# was not present
return -1
# Test array
arr = lineList
x = "9bc34549d565d9505b287de0cd20ac77be1d3f2c" #value read in
# Function call
result = binarySearch(arr, 0, len(arr)-1, x)
if result != -1:
print "Element is present at index % d" % result
else:
print "Element is not present in array"
但它并没有我想象的那么快。我的实现是否正确?
编辑2:
def binarySearch (l, r, x):
# Check base case
if r >= l:
mid = l + (r - l)/2
# If element is present at the middle itself
if getLineFromFile(mid) == x:
return mid
# If element is smaller than mid, then it
# can only be present in left subarray
elif getLineFromFile(mid) > x:
return binarySearch(l, mid-1, x)
# Else the element can only be present
# in right subarray
else:
return binarySearch(mid + 1, r, x)
else:
# Element is not present in the array
return -1
x = '0000000A0E3B9F25FF41DE4B5AC238C2D545C7A8:15'
def getLineFromFile(lineNumber):
with open('testfile.txt') as f:
for i, line in enumerate(f):
if i == lineNumber:
return line
else:
print('Not 7 lines in file')
line = None
# get last Element of List
def tail():
for line in open('pwned.txt', 'r'):
pass
else:
print line
ausgabetail = tail()
#print ausgabetail
result = binarySearch( 0, ausgabetail, x)
if result != -1:
print "Element is present at index % d" % result
else:
print "Element is not present in array"
我现在的问题是为二进制搜索的右侧获取正确的索引。我传递函数 (l, r, x)。左侧从 0 开始。右侧应该是文件的末尾,所以最后一行。我试图得到它,但它不起作用。我试图用 Funktion tail() 来获得它。但是如果我在测试时打印 r,我得到的值是“None”。 你有别的想法吗?
最佳答案
查看代码,我发现您仍在读取文件中的所有行,这确实是瓶颈。 这不是二进制搜索。 假设哈希已排序 您可以只读取文件中的行数。 然后只需执行二进制搜索。您可以使用 Seek 到达文件中的特定行,而不是读取整个文件,这样您将只读取 log(n) 行数。这应该会提高速度。
例子
def binarySearch(l, r, x):
....
#change arr[mid] with getLineFromFile(mid)
....
def getLineFromFile(lineNumber):
with open('xxx.txt') as f:
for i, line in enumerate(f):
if i == lineNumber:
return line
else:
print('Not 7 lines in file')
line = None
关于python - 两个哈希的比较算法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/57970171/