Python file operations

Suppose I have a folder structure like this:

  rootfolder
      | 
     / \ \
    01 02 03 ....
    |
  13_itemname.xml

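So under my root folder, each directory represents a month, e.g. 01, 02, 03. Under those directories I have items whose file names combine the creation time and the item name, e.g. 16_item1.xml, 24_item1.xml, etc. As you may have guessed, there are several items, and one XML file is created for each item every hour.

Now I want to do two things:

I need to generate the list of item names for a month, i.e. for 01 I have item1, item2 and item3 in it.
I need to filter by item, e.g. for item1: I want to read every file from 01_item1.xml to 24_item1.xml.

How can I achieve this in Python in a simple way?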

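Best answer

Here are two ways to do what you ask (if I understood it correctly). One uses a regular expression, one does not. Pick whichever you prefer ;)

The "setdefault" line may look like magic. For an explanation, see the docs. I leave it as an "exercise for the reader" to work out how it works ;)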
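As a small illustration of what that line does, here is a minimal sketch of `dict.setdefault` used to group file names by item name. It is separate from the answer's code, and the file names in it are made up for the example.

```python
# Hypothetical file names, used only for this illustration.
files = ["01_item1.xml", "02_item1.xml", "01_item2.xml"]

items = {}
for filename in files:
    # "01_item1.xml" -> "item1": drop the numeric prefix and the ".xml" suffix
    name = filename.split("_", 1)[1][:-4]

    # setdefault returns items[name] if the key already exists; otherwise it
    # first stores the given default (an empty set) under that key and then
    # returns it. Either way, we get back a set we can add to.
    items.setdefault(name, set()).add(filename)

print(sorted(items))           # ['item1', 'item2']
print(sorted(items["item1"]))  # ['01_item1.xml', '02_item1.xml']
```

Without `setdefault`, the loop body would need an explicit `if name not in items: items[name] = set()` check before the `add` call.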

from os import listdir
from os.path import join

DATA_ROOT = "testdata"

def folder_items_no_regex(month_name):

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):
      # partition never raises: "sep" is empty when the name has no "_"
      date, sep, name = file.partition( "_" )

      # skip files that were not possible to split on "_"
      if not sep or not date or not name:
         continue

      # ignore non-.xml files
      if not name.endswith(".xml"):
         continue

      # cut off the ".xml" extension
      name = name[0:-4]

      # collect the filenames belonging to each item in a set
      items.setdefault( name, set() ).add( file )

   return items

def folder_items_regex(month_name):

   import re

   # The pattern:
   # 1. match the beginning of line "^"
   # 2. capture 1 or more digits ( \d+ )
   # 3. match the "_"
   # 4. capture any character (as few as possible ): (.*?)
   # 5. match ".xml"
   # 6. match the end of line "$"
   pattern = re.compile( r"^(\d+)_(.*?)\.xml$" )

   # dict holding the items (assuming ordering is irrelevant)
   items = {}

   # 1. Loop through all filenames in said folder
   for file in listdir( join( DATA_ROOT, month_name ) ):

      match = pattern.match( file )
      if not match:
         continue

      date, name = match.groups()

      # collect the filenames belonging to each item in a set
      items.setdefault( name, set() ).add( file )

   return items

if __name__ == "__main__":
   from pprint import pprint

   data = folder_items_no_regex( "02" )

   print( "--- The dict ---------------" )
   pprint( data )

   print( "--- The items --------------" )
   pprint( sorted( data.keys() ) )

   print( "--- The files for item1 ---- " )
   pprint( sorted( data["item1"] ) )

   data = folder_items_regex( "02" )

   print( "--- The dict ---------------" )
   pprint( data )

   print( "--- The items --------------" )
   pprint( sorted( data.keys() ) )

   print( "--- The files for item1 ---- " )
   pprint( sorted( data["item1"] ) )
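For the second task in the question (reading the files of one item in order), the following is a rough sketch rather than part of the original answer. The helper name `item_paths` and the temporary test folder are assumptions made for illustration. It sorts by the numeric prefix as an integer, which only matters if the prefixes are not always zero-padded.

```python
import os
import re
import tempfile


def item_paths(root, month_name, item_name):
    """Return full paths of '<number>_<item_name>.xml' files, sorted by number."""
    # re.escape guards against item names containing regex metacharacters
    pattern = re.compile(r"^(\d+)_" + re.escape(item_name) + r"\.xml$")
    folder = os.path.join(root, month_name)

    found = []
    for filename in os.listdir(folder):
        match = pattern.match(filename)
        if match:
            # keep the prefix as an int so that 2 sorts before 10
            found.append((int(match.group(1)), os.path.join(folder, filename)))

    return [path for _, path in sorted(found)]


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "02"))
    for filename in ("2_item1.xml", "10_item1.xml", "01_item2.xml", "notes.txt"):
        with open(os.path.join(root, "02", filename), "w") as handle:
            handle.write("<item/>")

    names = [os.path.basename(p) for p in item_paths(root, "02", "item1")]
    print(names)  # ['2_item1.xml', '10_item1.xml']
```

Each returned path can then be opened or handed to an XML parser in a loop. If the prefixes are always two digits, as in the question, a plain `sorted()` on the file names gives the same order.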

Regarding Python file operations, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/1699552/
