Suppose I have a folder structure like this:
rootfolder
    |
   / \ \
 01 02 03 ....
 |
 13_itemname.xml
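Under my root folder, each directory represents a month (01, 02, 03, ...). Inside each of these directories I have items whose filenames are made up of the creation time and the item name, e.g. 16_item1.xml, 24_item1.xml. As you may have guessed, there are several items, and an XML file is created for each of them every hour.
Now I want to do two things:
I need to generate a list of item names for a month, i.e. for 01 I have item1, item2 and item3 in it.
I need to filter by item, e.g. for item1: I want to read every file from 01_item1.xml to 24_item1.xml.
How can I achieve this in Python in a simple way?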
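Best answer
Here are two ways to do what you are asking (if I understood it correctly): one with regular expressions and one without. Pick whichever you prefer ;)
The "setdefault" line may look like magic. See the docs for an explanation. I leave it as an "exercise for the reader" to work out how it works ;)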
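As a small hint for that exercise, here is a minimal sketch of the grouping idiom on its own. The filenames are made-up sample data and the variable names are only illustrative; the point is that `dict.setdefault` returns the existing value for a key, or stores the given default and returns it when the key is missing.

```python
# Made-up sample filenames (not from the original post).
filenames = ["01_item1.xml", "02_item1.xml", "01_item2.xml"]

items = {}
for filename in filenames:
    # Split "01_item1.xml" into "01" and "item1.xml".
    hour, _, rest = filename.partition("_")
    # Drop the ".xml" extension to get the item name.
    name = rest[:-len(".xml")]
    # If `name` is not a key yet, setdefault stores an empty set under it;
    # either way it returns the set stored for `name`, so .add() works.
    items.setdefault(name, set()).add(filename)

print(sorted(items))           # ['item1', 'item2']
print(sorted(items["item1"]))  # ['01_item1.xml', '02_item1.xml']
```

Without `setdefault`, the loop would need an explicit `if name not in items: items[name] = set()` before the `add` call.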
from os import listdir
from os.path import join

DATA_ROOT = "testdata"


def folder_items_no_regex(month_name):
    # dict mapping item name -> set of filenames (assuming ordering is irrelevant)
    items = {}

    # 1. Loop through all filenames in said folder
    for filename in listdir(join(DATA_ROOT, month_name)):
        # partition() never raises, unlike unpacking split() on a name without "_"
        date, sep, name = filename.partition("_")

        # skip files that were not possible to split on "_"
        if not sep or not date or not name:
            continue

        # ignore non-.xml files
        if not name.endswith(".xml"):
            continue

        # cut off the ".xml" extension
        name = name[0:-4]

        # keep a set of filenames per item
        items.setdefault(name, set()).add(filename)

    return items


def folder_items_regex(month_name):
    import re

    # The pattern:
    # 1. match the beginning of line "^"
    # 2. capture 1 or more digits ( \d+ )
    # 3. match the "_"
    # 4. capture any character (as few as possible): (.*?)
    # 5. match ".xml"
    # 6. match the end of line "$"
    pattern = re.compile(r"^(\d+)_(.*?)\.xml$")

    # dict mapping item name -> set of filenames (assuming ordering is irrelevant)
    items = {}

    # 1. Loop through all filenames in said folder
    for filename in listdir(join(DATA_ROOT, month_name)):
        match = pattern.match(filename)
        if not match:
            continue

        date, name = match.groups()

        # keep a set of filenames per item
        items.setdefault(name, set()).add(filename)

    return items


if __name__ == "__main__":
    from pprint import pprint

    data = folder_items_no_regex("02")

    print("--- The dict ---------------")
    pprint(data)

    print("--- The items --------------")
    pprint(sorted(data.keys()))

    print("--- The files for item1 ---- ")
    pprint(sorted(data["item1"]))

    data = folder_items_regex("02")

    print("--- The dict ---------------")
    pprint(data)

    print("--- The items --------------")
    pprint(sorted(data.keys()))

    print("--- The files for item1 ---- ")
    pprint(sorted(data["item1"]))
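
For the second part of the question (reading one item's files in order), the same regular expression can be reused. This is only a sketch under some assumptions: the helper name `files_for_item` is hypothetical, Python 3 is assumed for `tempfile.TemporaryDirectory`, and the demo writes throwaway sample files to a temporary directory rather than using real data.

```python
import os
import re
import tempfile

# Same pattern as in the answer: digits, "_", item name, ".xml".
PATTERN = re.compile(r"^(\d+)_(.*?)\.xml$")


def files_for_item(month_dir, item_name):
    """Return the paths of one item's files, sorted by the numeric prefix."""
    found = []
    for filename in os.listdir(month_dir):
        match = PATTERN.match(filename)
        if match and match.group(2) == item_name:
            # Sort on the integer value so "2_..." would come before "10_...".
            found.append((int(match.group(1)), os.path.join(month_dir, filename)))
    return [path for _, path in sorted(found)]


# Demo with made-up sample files in a temporary "month" directory.
with tempfile.TemporaryDirectory() as month_dir:
    samples = ("02_item1.xml", "10_item1.xml", "01_item1.xml",
               "01_item2.xml", "notes.txt")
    for sample in samples:
        with open(os.path.join(month_dir, sample), "w") as handle:
            handle.write("<item/>")

    ordered = [os.path.basename(p) for p in files_for_item(month_dir, "item1")]
    print(ordered)
```

If the prefixes are always zero-padded to two digits, as in the question, a plain `sorted()` on the filenames gives the same order; converting to `int` just avoids relying on the padding.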
Regarding Python file operations, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1699552/