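Below is a code snippet I was working on a few months ago but have only now needed. I believe the main part of it is code I adapted from an SO post, but I have lost the URL. In any case, I had forgotten how slow it is when hundreds of thousands of files are involved, so I am looking at ways to make it faster.
I have tried moving parts of the code around and leaving some parts out, but performance either stays the same or gets worse, which leads me to believe the problem lies in the os.listdir call. As far as I know, os.listdir is the fastest option here because it makes fewer system calls than scandir or walk, but its performance is still poor for folders with more than 100,000 files, as the timings below show.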
14387 files in 2794 folders processed in 5.88s
14387 files in 2794 folders processed in 3.224s
14387 files in 2794 folders processed in 5.847s
110016 files in 21440 folders processed in 22.732s
110016 files in 21440 folders processed in 22.603s
110016 files in 21440 folders processed in 41.055s
249714 files in 35707 folders processed in 66.452s
249714 files in 35707 folders processed in 49.154s
249714 files in 35707 folders processed in 88.43s
249714 files in 35707 folders processed in 48.942s
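I am currently looking into another approach: indexing the file/folder locations in a static text file that is pre-populated on the server every hour with the latest folder contents. Before abandoning the code below, though, I would like help finding out whether it can be made faster or whether it has hit its limit.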
import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import time
time_start = time.time()
iid = 1 # IID of tree item. 0 is top level parent
count_folders = 0 # Number of folders in parent
count_files = 0 # Number of files in parent
compare_check = {} # Build the dictionary with IID key and folder/file paths in list
root = tk.Tk()
root.geometry('850x450')
style = ttk.Style(root)
v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)
def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
start_path = os.path.expanduser(r"K:/DMC Processed - 02072017") # Path for test
start_dir_entries = os.listdir(start_path)
new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
time_end = time.time()
time_total = round(time_end - time_start, 3) # for testing. Simple start to end timer result
ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0) # for testing
print(f"{count_files} files in {count_folders} folders processed in {time_total}s") # for testing
root.mainloop()
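The static-index alternative mentioned above could be sketched roughly as follows. This is only an illustration of the idea, not the actual server job: the helper names `write_index` and `read_index`, and the tab-separated "kind, relative path" line format, are assumptions made for the example. It also assumes file names contain no newline characters.

```python
import os


def write_index(start_path, index_file):
    """Walk start_path and write one 'kind<TAB>relative path' line per entry.

    kind is 'F' for a folder and 'f' for a file (hypothetical format).
    """
    with open(index_file, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(start_path):
            rel_dir = os.path.relpath(dirpath, start_path)
            for name in dirnames:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                out.write(f"F\t{rel}\n")
            for name in filenames:
                rel = os.path.normpath(os.path.join(rel_dir, name))
                out.write(f"f\t{rel}\n")


def read_index(index_file):
    """Return a list of (kind, relative_path) tuples read from the index file."""
    entries = []
    with open(index_file, encoding="utf-8") as src:
        for line in src:
            # Split only on the first tab so paths containing tabs survive.
            kind, rel_path = line.rstrip("\n").split("\t", 1)
            entries.append((kind, rel_path))
    return entries
```

The GUI would then call only `read_index` at start-up, so the cost of touching the file system is paid by the hourly job rather than by the user. Inserting that many rows into the tree still takes time of its own.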
Best answer
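Since you had the timing set up so nicely, I thought it would be a fun challenge to have a go.
I tried rewriting it to use os.walk, but then it occurred to me that your os.path.isdir() calls would be very slow, so I swapped them out for scandir. That turned out to be the fastest way I could find.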
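To isolate that difference from the Tk widgets, here is a minimal sketch of the two traversal styles that only count entries. The function names `count_listdir` and `count_scandir` are made up for this illustration. Unlike the code in the question, they count a folder even when its contents cannot be read.

```python
import os


def count_listdir(path):
    """Count (files, folders) with os.listdir plus os.path.isdir per entry."""
    files = folders = 0
    for name in os.listdir(path):
        item_path = os.path.join(path, name)
        if os.path.isdir(item_path):  # a separate stat call for every entry
            folders += 1
            try:
                sub_files, sub_folders = count_listdir(item_path)
            except PermissionError:
                continue  # unreadable folder: counted, but not descended into
            files += sub_files
            folders += sub_folders
        else:
            files += 1
    return files, folders


def count_scandir(path):
    """Count (files, folders) with os.scandir and DirEntry.is_dir."""
    files = folders = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() can usually answer from data the directory listing
            # already returned, so no extra system call is needed in most cases.
            if entry.is_dir():
                folders += 1
                try:
                    sub_files, sub_folders = count_scandir(entry.path)
                except PermissionError:
                    continue
                files += sub_files
                folders += sub_folders
            else:
                files += 1
    return files, folders
```

Both functions should return the same counts for the same tree. Only the number of file-system calls per entry differs, which is where a gap like the one in the benchmarks would be expected to come from.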
Benchmarks:
original: 697665 files in 76729 folders processed in 106.079s
os.scandir: 697665 files in 76729 folders processed in 23.152s
os.walk: 697665 files in 76731 folders processed in 32.869s
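Using the scandir module doesn't seem to make much difference; it looks like Python has optimised os pretty well by now.
Here's the code with the other functions included: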
import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import scandir
import time
time_start = time.time()
iid = 1 # IID of tree item. 0 is top level parent
count_folders = 0 # Number of folders in parent
count_files = 0 # Number of files in parent
compare_check = {} # Build the dictionary with IID key and folder/file paths in list
root = tk.Tk()
root.geometry('850x450')
style = ttk.Style(root)
v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)
def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
def new_folder_scandir(parent_path, parent_iid):
    global iid, count_folders, count_files
    # os.scandir yields os.DirEntry objects, so use entry.name for the label
    # and entry.path for the full path (formatting the entry itself would show "<DirEntry '...'>")
    for entry in os.scandir(parent_path):
        if entry.is_dir():
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {entry.name}')
            try:
                new_folder_scandir(parent_path=entry.path, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {entry.name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], entry.name]}) # Update dictionary with current item
def new_folder_walk(path):
    global count_folders, count_files
    def hex_thing(parent_path, name):
        global iid
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
    tree_items = {path: tree.insert(parent='', index='0', text='All Documents', open=True)}
    for root, dirs, files in scandir.walk(path):
        for dir in dirs:
            path = os.path.join(root, dir)
            count_folders += 1
            tree_items[path] = tree.insert(parent=tree_items[root], index='end', text=f'[F] {dir}')
            hex_thing(root, dir)
        for file in files:
            path = os.path.join(root, file)
            count_files += 1
            tree.insert(parent=tree_items[root], index='end', text=f'[f] {file}')
            hex_thing(root, file)
start_path = os.path.expanduser(r"C:/Program Files") # Path for test
# 0 = original, 1 = scandir, 2 = walk
run = 1
if run == 0:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    start_dir_entries = os.listdir(start_path)
    new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
elif run == 1:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    new_folder_scandir(parent_path=start_path, parent_iid=parent_iid)
elif run == 2:
    new_folder_walk(start_path)
time_end = time.time()
time_total = round(time_end - time_start, 3) # for testing. Simple start to end timer result
ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0) # for testing
print(f"{count_files} files in {count_folders} folders processed in {time_total}s") # for testing
root.mainloop()
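As a side note, the hex-prefixing block that is repeated in every function above could probably be collapsed into a single format expression. The helper name `format_iid` is hypothetical. The sketch assumes the goal is just "an 'I' followed by the uppercase hex counter, zero-padded to at least three digits", which is what the if/elif chain produces.

```python
def format_iid(iid):
    """Return the prefixed item id for a counter value, e.g. 1 -> 'I001'."""
    # 'X' formats as uppercase hex; '03' zero-pads to at least three digits,
    # and longer values are left at their natural width.
    return f"I{iid:03X}"
```

With that in place, the dictionary update would become something like `compare_check[format_iid(iid)] = [...]`, and the round trip through `hex()` and `int(..., 16)` would no longer be needed. This is unlikely to change the timings much, since the file-system calls dominate, but it removes the duplication.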
For the record, I'm actually surprised that os.walk is slower than os.scandir, even when iterating over every file.
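One way to check whether that gap comes from the traversal itself or from the extra bookkeeping in the walk-based function is to time both traversals with no Tk widgets involved. The following is only a rough sketch: `count_walk`, `count_scandir_iterative` and `timed` are names invented for this example, and the result will depend on the disk, the operating system and caching.

```python
import os
import time


def count_walk(path):
    """Count (files, folders) below path using os.walk."""
    files = folders = 0
    for _dirpath, dirnames, filenames in os.walk(path):
        folders += len(dirnames)
        files += len(filenames)
    return files, folders


def count_scandir_iterative(path):
    """Count (files, folders) below path using os.scandir and an explicit stack."""
    files = folders = 0
    stack = [path]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir():
                        folders += 1
                        # Like os.walk's default, do not descend into symlinked folders.
                        if not entry.is_symlink():
                            stack.append(entry.path)
                    else:
                        files += 1
        except OSError:
            # os.walk silently skips unreadable directories by default; do the same.
            continue
    return files, folders


def timed(func, path):
    """Run func(path) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

Calling `timed(count_walk, start_path)` and `timed(count_scandir_iterative, start_path)` a few times each should show how much of the difference remains once the tree inserts and dictionary lookups are taken out. Since Python 3.5, os.walk has itself been implemented on top of os.scandir, which would also explain why the separate scandir module makes little difference.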
About python - Code is slow when creating a TreeView with more than 100,000 items: a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/73204781/