python - How to find a specific file in Python

I have a directory containing files with the following structure:

A2ML1_A8K2U0_MutationOutput.txt
A4GALT_Q9NPC4_MutationOutput.txt
A4GNT_Q9UNA3_MutationOutput.txt
...

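The first few letters represent the gene, the next few letters are the Uniprot number (a unique protein identifier), and MutationOutput is self-explanatory.

In Python, I want to execute the following line: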

f_outputfile.write(mutation_directory + SOMETHING +line[1+i]+"_MutationOutput.txt\n")

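Here, line[1+i] correctly identifies the Uniprot ID.

What I need to do is correctly identify the gene name. So somehow, I need to quickly search over that directory, find the file that has the line[i+1] value in its uniprot field and then pull out the gene name.

I know I can list all the files in the directory, then do str.split() on each string and find it. But is there a way I can do that smarter? Should I use a dictionary? Can I just do a quick regex search?

The entire directory is about 8,116 files, so not that many.

Thanks for your help!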

Best answer

What I need to do is correctly identify the gene name. So somehow, I need to quickly search over that directory, find the file that has the line[i+1] value in its uniprot field and then pull out the gene name.

Consider how you would do this in the shell:

$ ls mutation_directory/*_A8K2U0_MutationOutput.txt
mutation_directory/A2ML1_A8K2U0_MutationOutput.txt

Or, if you're on Windows:

D:\Somewhere> dir mutation_directory\*_A8K2U0_MutationOutput.txt
A2ML1_A8K2U0_MutationOutput.txt

You can do exactly the same thing in Python with the glob module:

>>> import glob
>>> glob.glob('mutation_directory/*_A8K2U0_MutationOutput.txt')
['mutation_directory/A2ML1_A8K2U0_MutationOutput.txt']

Of course, you can wrap this in a function:

>>> def find_gene(uniprot):
...     pattern = 'mutation_directory/*_{}_MutationOutput.txt'.format(uniprot)
...     return glob.glob(pattern)[0]
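
Note that the function above returns the matching path rather than the gene name itself, and it raises IndexError when nothing matches. A minimal sketch of one way to go the last step, assuming the file names always follow the GENE_UNIPROT_MutationOutput.txt layout from the question (the name find_gene_name and the directory parameter are illustrative, not part of the original answer):

```python
import glob
import os


def find_gene_name(uniprot, directory='mutation_directory'):
    """Return the gene name for a Uniprot ID, or None if no file matches."""
    suffix = '_{}_MutationOutput.txt'.format(uniprot)
    matches = glob.glob(os.path.join(directory, '*' + suffix))
    if not matches:
        return None
    # Strip the known suffix instead of splitting on '_', so an underscore
    # inside a gene name would not break the lookup.
    filename = os.path.basename(matches[0])
    return filename[:-len(suffix)]
```

With the example directory from the question, find_gene_name('A8K2U0') should give 'A2ML1'.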

But is there a way I can do that smarter? Should I use a dictionary?

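Whether this is "smarter" depends on your usage pattern.

If you're looking up thousands of files per run, it would certainly be more efficient to read the directory just once and use a dictionary instead of searching repeatedly. But if you're planning to read each whole file anyway, that will take orders of magnitude longer than the lookup, so it probably won't matter. And you know what they say about premature optimization.

But if you want, you can easily build a dictionary keyed by the Uniprot number: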

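import os

d = {}
for f in os.listdir('mutation_directory'):
    # File names look like GENE_UNIPROT_MutationOutput.txt
    gene, uniprot, suffix = f.split('_')
    # Store the full path so lookups return something you can open directly
    d[uniprot] = os.path.join('mutation_directory', f)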

Then:

>>> d['A8K2U0']
'mutation_directory/A2ML1_A8K2U0_MutationOutput.txt'
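
Since the question ultimately wants the gene name rather than the path, a variation on the same idea is to map each Uniprot ID straight to its gene. The following is only a sketch under the assumption that every relevant file ends in _MutationOutput.txt; the name build_gene_index is hypothetical:

```python
import os


def build_gene_index(directory='mutation_directory'):
    """Map each Uniprot ID to its gene name by reading the directory once."""
    index = {}
    suffix = '_MutationOutput.txt'
    for name in os.listdir(directory):
        if not name.endswith(suffix):
            continue  # skip anything that is not a mutation output file
        # Split at the last underscore so the Uniprot ID is always the final
        # field, even if a gene name were to contain an underscore.
        gene, sep, uniprot = name[:-len(suffix)].rpartition('_')
        if not sep:
            continue  # no GENE_UNIPROT pair in this file name
        index[uniprot] = gene
    return index
```

After genes = build_gene_index(), the SOMETHING in the question's write call could plausibly become genes[line[1+i]] + "_", with each lookup costing a single dictionary access instead of a directory scan.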

Can I just do a quick regex search?

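For your simple case, you don't need a regex.*

More importantly, what would you search? Either you loop, in which case you might as well use glob, or you have to build an artificial giant string to search, in which case you're better off just building the dictionary.

* In fact, on at least some platforms/implementations, glob is implemented by creating a regular expression from the simple wildcard pattern, but you don't have to worry about that.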
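
To make the footnote concrete, the standard library's fnmatch.translate turns a shell-style wildcard into a regular expression string. The sketch below applies it to a hard-coded list of names taken from the question, purely for illustration; it is not how you would normally search a directory:

```python
import fnmatch
import re

names = [
    'A2ML1_A8K2U0_MutationOutput.txt',
    'A4GALT_Q9NPC4_MutationOutput.txt',
    'A4GNT_Q9UNA3_MutationOutput.txt',
]

# Convert the wildcard pattern into a regex, then loop over the names.
regex = re.compile(fnmatch.translate('*_A8K2U0_MutationOutput.txt'))
matches = [name for name in names if regex.match(name)]
```

The loop is still there, which is the point made above: the regex buys nothing over the wildcard for a pattern this simple.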

Regarding python - How to find a specific file in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25696546/
