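I'm a hobbyist coder. I started with AHK, then learned some Java, and now I'm trying to learn Python. I've searched and found some tips, but I haven't been able to apply them to my own code. I hope someone here can help; it's a very short program. I'm using a .txt CSV database with ";" as the delimiter. Database example:

    What color are cats usually?;Black
    How tall is the tallest person on Earth?;272 cm
    Is the Earth round?;Yes

The database now consists of 20,000 lines, which makes the program "slow", and it only uses 25% CPU (1 core).
If I could make it use all 4 cores (100%), I think it would perform the task faster. The task is basically to compare the clipboard contents against the database and, if there is a match, return the answer. Maybe I could also split the database into 4 parts?
The code currently looks like this. It's no more than 65 lines and it does its job (but slowly). I need advice on how to make this process use multiple cores.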

    import time
    import pyperclip as pp
    import pandas as pd
    import pymsgbox as pmb
    from fuzzywuzzy import fuzz
    import numpy

    ratio_threshold = 90
    fall_back_time = 1
    db_file_path = 'database.txt'
    db_separator = ';'
    db_encoding = 'latin-1'

    def load_db():
        while True:
            try:
                # Read and create database
                db = pd.read_csv(db_file_path, sep=db_separator, encoding=db_encoding)
                db = db.drop_duplicates()
                return db
            except:
                print("Error in load_db(). Will sleep for %i seconds..." % fall_back_time)
                time.sleep(fall_back_time)

    def top_answers(db, question):
        db['ratio'] = db['question'].apply(lambda q: fuzz.ratio(q, question))
        db_sorted = db.sort_values(by='ratio', ascending=False)
        db_sorted = db_sorted[db_sorted['ratio'] >= ratio_threshold]
        return db_sorted

    def write_txt(top):
        result = top.apply(lambda row: "%s" % (row['answer']), axis=1).tolist()
        result = '\n'.join(result)
        fileHandle = open("svar.txt", "w")
        fileHandle.write(result)
        fileHandle.close()
        pp.copy("")

    def main():
        try:
            db = load_db()
            last_db_reload = time.time()
            while True:
                # Get contents of clipboard
                question = pp.paste()
                # Rank answer
                top = top_answers(db, question)
                # If answer was found, show results
                if len(top) > 0:
                    write_txt(top)
                time.sleep(fall_back_time)
        except:
            print("Error in main(). Will sleep for %i seconds..." % fall_back_time)
            time.sleep(fall_back_time)

    if __name__ == '__main__':
        main()

Best answer

If you can split the database into four equally large databases, you can process them in parallel like this:

    import threading

    import pandas as pd
    import pyperclip as pp
    from fuzzywuzzy import fuzz

    ratio_threshold = 90
    db_file_path = 'database.txt'
    db_separator = ';'
    db_encoding = 'latin-1'
    num_parts = 4  # number of database parts, one thread each

    def worker(thread_id, question):
        # Each worker reads its own part: database.txt1 ... database.txt4
        thread_id = str(thread_id)
        db = pd.read_csv(db_file_path + thread_id, sep=db_separator, encoding=db_encoding)
        db = db.drop_duplicates()
        # Score every stored question against the clipboard text
        db['ratio'] = db['question'].apply(lambda q: fuzz.ratio(q, question))
        db_sorted = db.sort_values(by='ratio', ascending=False)
        top = db_sorted[db_sorted['ratio'] >= ratio_threshold]
        # Write this part's answers to svar1.txt ... svar4.txt
        result = '\n'.join(top['answer'].astype(str))
        with open("svar" + thread_id + ".txt", "w") as file_handle:
            file_handle.write(result)

    def main():
        question = pp.paste()
        # range(1, num_parts + 1) yields 1..4, one thread per database part
        threads = [threading.Thread(target=worker, args=(i, question))
                   for i in range(1, num_parts + 1)]
        for t in threads:
            t.start()
        # Join only after every thread has started, so they run concurrently
        for t in threads:
            t.join()
        pp.copy("")  # clear the clipboard once all parts are done

    if __name__ == '__main__':
        main()

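
One caveat with the threaded version: in CPython the global interpreter lock lets only one thread execute Python bytecode at a time, so CPU-bound scoring like this tends to stay on a single core even with several threads. Below is a minimal sketch of the same split-into-parts idea using processes instead. It is not the answer author's code. To keep it runnable with only the standard library it uses `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (the scores will not be identical), and it splits the rows in memory rather than reading four separate files. The names `similarity`, `split_rows`, `score_chunk`, and `find_matches` are hypothetical helpers introduced for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from difflib import SequenceMatcher

RATIO_THRESHOLD = 90  # same cut-off as ratio_threshold in the question


def similarity(a, b):
    """Stand-in for fuzz.ratio (assumption): similarity on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def split_rows(rows, parts):
    """Split rows into `parts` slices whose sizes differ by at most one."""
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def score_chunk(args):
    """Score one slice of (question, answer) rows; runs in a worker process."""
    rows, question = args
    scored = [(similarity(q, question), answer) for q, answer in rows]
    return [(ratio, answer) for ratio, answer in scored if ratio >= RATIO_THRESHOLD]


def find_matches(rows, question, workers=4):
    """Score all rows across `workers` processes, best match first."""
    tasks = [(chunk, question) for chunk in split_rows(rows, workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = pool.map(score_chunk, tasks)
        matches = [m for part in results for m in part]
    return sorted(matches, reverse=True)


if __name__ == "__main__":
    # Rows taken from the database example in the question
    rows = [
        ("What color are cats usually?", "Black"),
        ("How tall is the tallest person on Earth?", "272 cm"),
        ("Is the Earth round?", "Yes"),
    ]
    for ratio, answer in find_matches(rows, "Is the Earth round?"):
        print(ratio, answer)
```

In the real program, `similarity` could be swapped for `fuzz.ratio`, and the rows could be built from the loaded DataFrame with something like `list(zip(db['question'], db['answer']))`. Starting processes and shipping rows to them has a cost, so for 20,000 short lines it is worth measuring whether the pool actually beats the single-process loop before committing to it.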
Regarding "Python multicore CSV short program, need advice/help", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/52740002/