python - Folder search using os.scandir

Tags: python python-3.x

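I wrote a small program that searches a network drive for certain .mat files. I am using Python 3.6, so I have access to os.scandir(), which is said to be better than os.walk().

However, I am facing a strange problem: the first time I run the program, it takes a very long time to fetch the data. When I run the same program again a few hours later, it runs very fast.

Can anyone explain why this happens? My code is below.

Note: my network connection is very fast, so the mapped network drive works seamlessly.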

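import os
import time
from os import scandir

# QObject, QtGui and QApplication come from the Qt binding in use (its import is not shown in the question).

class WorkThread(QObject):
    def scantree(self, path):
        """Recursively yield a DirEntry for every non-directory entry under path."""
        try:
            for entry in scandir(path):
                if entry.is_dir(follow_symlinks=False):
                    yield from self.scantree(entry.path)
                else:
                    yield entry
        except FileNotFoundError:
            print("Excluded file path")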

    def searchFiles(self):
        start=time.time()
        ui.progressBar.setValue(0)
        usePATH = r'V:\Messdatenbank_Powertrain'  # Location of the network drive (raw string keeps the backslash literal)
        os.chdir(usePATH)
        fileLevels = 0
        i=0
        k=0
        tableSize = ui.tableView.width()
        ui.tableView.setColumnWidth(4, int(tableSize/4) + 30 )
        ui.tableView.setColumnWidth(3, int(tableSize/4) + 300 )
        for entry in self.scantree(usePATH):
            excluded = ('MATLAB_NVH_TOOL', 'old', 'MESSDATENBANK')  # 'old' also matches 'old_'
            if entry.name.endswith('COMPARE.mat') and not any(word in entry.path for word in excluded):
                ui.progressBar.setValue(0)
                i=i+1
                # Split the Windows path into its components at each backslash
                fileLevels = entry.path.split('\\')
                t_row = [
                    QtGui.QStandardItem(str(fileLevels[2])),
                    QtGui.QStandardItem(str(fileLevels[3])),
                    QtGui.QStandardItem(str(fileLevels[4])),
                    QtGui.QStandardItem(str(fileLevels[-1])),
                    QtGui.QStandardItem(str(entry.path)),
                ]
                ui.tableView.model().appendRow(t_row)
                ui.tableView.model().layoutChanged.emit()
                fileLevels.pop()  # drop the file name, keeping only the directory components
                tmp_file_levels = '\\'.join(fileLevels)
                ui.files.append(tmp_file_levels) # All files path stored here
                ui.file_loc_name.append(entry.path)
                ui.progressBar.setValue(50)
                # Implement try catch blocks
                if str(fileLevels[2]) not in ui.clusterlist:
                    ui.clusterlist.append(str(fileLevels[2]))
                if str(fileLevels[2]) not in ui.enginedict:
                    ui.enginedict[str(fileLevels[2])]=[str(fileLevels[3])]
                else:
                    if str(fileLevels[3]) not in ui.enginedict[str(fileLevels[2])]:
                        ui.enginedict[str(fileLevels[2])].append(str(fileLevels[3]))
                if str(fileLevels[3]) not in ui.measurementdict:
                    ui.measurementdict[str(fileLevels[3])]=[str(fileLevels[4])]
                else:
                    if str(fileLevels[4]) not in ui.measurementdict[str(fileLevels[3])]:
                        ui.measurementdict[str(fileLevels[3])].append(str(fileLevels[4]))                               
                ui.progressBar.setValue(100)
                QApplication.processEvents() 
            else:
                ui.label_7.setText(str(i))
                ui.tableView.model().layoutChanged.emit()
                ui.progressBar.setValue(0)
        end=time.time()
        print(end-start)
        ui.label_2.setText('Update Complete')
        ui.pushButton.setEnabled(False)
        print(str(len(ui.files)))
        ui.tableView.resizeColumnToContents(2)
        ui.comboBox.setEnabled(True)
        ui.label_7.setText(str(len(ui.files)))
        ui.comboBox.clear()
        ui.comboBox.addItems(["--Select Cluster--"])
        ui.comboBox.addItems(ui.clusterlist)
        ui.progressBar.setValue(100)
        QApplication.processEvents()
        ui.pushButton_2.setEnabled(True)
        ui.pushButton_24.setEnabled(True)

Best answer

PEP 471 -- os.scandir() on python.org describes how os.scandir is implemented:

os.scandir - This new function adds useful functionality and increases the speed of os.walk() by 2-20 times
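
As a rough illustration of that relationship: since Python 3.5, os.walk() is itself built on os.scandir(), so a plain os.walk() loop already benefits from the speedup. The sketch below runs the same suffix search both ways. The helper names find_with_walk and find_with_scandir are hypothetical and not part of the question's code.

```python
import os


def find_with_walk(root, suffix):
    """Collect paths under root whose file name ends with suffix, via os.walk()."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def find_with_scandir(root, suffix):
    """Run the same search with a recursive os.scandir() generator."""
    def scantree(path):
        # The context manager form of os.scandir() needs Python 3.6+
        with os.scandir(path) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    yield from scantree(entry.path)
                else:
                    yield entry

    return sorted(e.path for e in scantree(root) if e.name.endswith(suffix))
```

For a tree without symlinked directories, both helpers should return the same list, so the choice between them is mostly about whether the extra DirEntry information is needed.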

The difference between the first run and later runs comes from data being cached during the first run.
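
One way to observe the effect is to time two consecutive scans of the same tree. The sketch below does that with two hypothetical helpers, count_entries and time_two_scans. Whether the second pass is faster depends on what the operating system and the network file-system client have cached, so no particular timings are promised.

```python
import os
import time


def count_entries(root):
    """Walk the whole tree under root and return how many files were seen."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def time_two_scans(root):
    """Scan root twice in a row; return (file_count, first_seconds, second_seconds)."""
    timings = []
    count = 0
    for _ in range(2):
        start = time.perf_counter()
        count = count_entries(root)
        timings.append(time.perf_counter() - start)
    return count, timings[0], timings[1]
```

Calling time_two_scans on the network path right after mapping the drive, and again later, would show whether the slowdown is tied to a cold cache.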

Notes on caching

The DirEntry objects are relatively dumb -- the name and path attributes are obviously always cached, and the is_X and stat methods cache their values (immediately on Windows via FindNextFile, and on first use on POSIX systems via a stat system call) and never refetch from the system.

For this reason, DirEntry objects are intended to be used and thrown away after iteration, not stored in long-lived data structured and the methods called again and again.

If developers want "refresh" behaviour (for example, for watching a file's size change), they can simply use pathlib.Path objects, or call the regular os.stat() or os.path.getsize() functions which get fresh data from the operating system every call.
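
The caching behaviour described in these notes can be seen in a small experiment. The sketch below uses a hypothetical helper, cached_vs_fresh_size, and a made-up file name. It writes a file, keeps its DirEntry, lets the file grow, and then compares the size reported by the reused DirEntry with the size from a fresh os.stat() call.

```python
import os
import tempfile


def cached_vs_fresh_size():
    """Return (size from a reused DirEntry, size from a fresh os.stat()) after a file grows."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mat")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"x" * 10)

        with os.scandir(tmp) as it:
            entry = next(it)
            # First call fills the cache on POSIX; on Windows the scan already filled it
            entry.stat()

        with open(path, "ab") as f:
            f.write(b"y" * 5)  # the file grows from 10 to 15 bytes

        # The DirEntry keeps its cached stat result; os.stat() asks the system again
        return entry.stat().st_size, os.stat(path).st_size


print(cached_vs_fresh_size())  # (10, 15)
```

This cache lives inside each DirEntry object, so it is discarded with the object once the process exits.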

Regarding "python - Folder search using os.scandir", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44157216/
