python - Bash alias to automatically detect arbitrarily named file sequences?

Tags: python bash shell sequence ls

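I'm looking for a bash alias that changes the output of ls. I constantly work with large file sequences that do not follow the same naming convention. The only thing they have in common is that the number is zero-padded to 4 digits (sorry, I'm not sure of the right term) and comes immediately before the extension.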

For example: filename_v028_0392.bgeo, test_x34.prerun.0012.simdata, filename_v001_0233.exr

I would like each sequence to be listed as a single entry, so that

filename_v003_0001.geo
filename_v003_0002.geo
filename_v003_0003.geo
filename_v003_0004.geo
filename_v003_0005.geo
filename_v003_0006.geo
filename_v003_0007.geo
filename_v003_0032.geo
filename_v003_0033.geo
filename_v003_0034.geo
filename_v003_0035.geo
filename_v003_0036.geo
testxxtest.0057.exr
testxxtest.0058.exr
testxxtest.0059.exr
testxxtest.0060.exr
testxxtest.0061.exr
testxxtest.0062.exr
testxxtest.0063.exr

would be displayed as something like

[seq]filename_v003_####.geo (1-7)
[seq]filename_v003_####.geo (32-36)
[seq]testxxtest.####.exr (57-63)

while files that are not part of a sequence are still listed unchanged.

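I'm really not sure where to start on this. I know a fair amount of Python, but I'm not sure it's really the best approach. Any help would be greatly appreciated!

Thanks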

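Best answer

I came up with a Python 2.7 script that solves your problem by solving the more general problem of collapsing multiple lines that differ only by a sequence number: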

import re

def do_compress(old_ints, ints):
    """
    whether the ints of the current entry is the continuation of the previous
    entry
    returns a list of the indexes to compress, or [] or False when the current
    line is not part of an indexed sequence
    """
    return len(old_ints) == len(ints) and \
        [i for o, n, i in zip(old_ints, ints, xrange(len(ints))) if n - o == 1]

def basic_format(file_start, file_stop):
    return "[seq]{} .. {}".format(file_start, file_stop)


def compress(files, do_compress=do_compress, seq_format=basic_format):
    p = None
    old_ints = ()
    old_indexes = ()

    seq_and_files_list = [] 
        # list of file names or dictionaries that represent sequences:
        #   {start, stop, start_f, stop_f}

    for f in files:
        ints = ()
        indexes = ()

        m = p is not None and p.match(f) # False, None, or a valid match
        if m:
            ints = [int(x) for x in m.groups()]
            indexes = do_compress(old_ints, ints)

        # state variations
        if not indexes: # end of sequence or no current sequence
            p = re.compile( \
                '(\d+)'.join(re.escape(x) for x in re.split('\d+',f)) + '$')
            m = p.match(f)
            old_ints = [int(x) for x in m.groups()]
            old_indexes = ()
            seq_and_files_list.append(f)

        elif indexes == old_indexes: # the sequence continues
            seq_and_files_list[-1]['stop'] = old_ints = ints
            seq_and_files_list[-1]['stop_f'] = f
            old_indexes = indexes

        elif old_indexes == (): # sequence started on previous filename
            start_f = seq_and_files_list.pop()
            s = {'start': old_ints, 'stop': ints, \
                'start_f': start_f, 'stop_f': f}
            seq_and_files_list.append(s)

            old_ints = ints
            old_indexes = indexes

        else: # end of sequence, but still matches previous pattern
            old_ints = ints
            old_indexes = ()
            seq_and_files_list.append(f)

    return [ isinstance(f, dict) and seq_format(f['start_f'], f['stop_f']) or f 
        for f in seq_and_files_list ]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 1:
        import os
        lst = sorted(os.listdir('.'))
    elif sys.argv[1] in ("-h", "--help"):
        print """USAGE: {} [FILE ...]
compress the listing of the current directory, or the content of the files by
collapsing identical lines, except for a sequence number
""".format(sys.argv[0])
        sys.exit(0)
    else:
        # read every line of every file named on the command line
        lst = [l.rstrip('\r\n') for f in sys.argv[1:] for l in open(f)]
    for x in compress(lst):
        print x

That is, on your data:

bernard $ ./ls_sequence_compression.py given_data
[seq]filename_v003_0001.geo .. filename_v003_0007.geo
[seq]filename_v003_0032.geo .. filename_v003_0036.geo
[seq]testxxtest.0057.exr .. testxxtest.0063.exr

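The script works from the differences between the integers that appear in two consecutive lines whose non-digit text matches. This makes it possible to handle non-uniform input, changes in the field used as the basis of the sequence...

Here is an example input: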
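The comparison described above can be sketched in isolation. This is a minimal Python 3 illustration, not the script itself: the helper names `split_numbers` and `incremented_fields` are hypothetical, and the sketch only mirrors the idea behind `do_compress` (same non-digit text, some integer field growing by exactly 1).

```python
import re

def split_numbers(name):
    """Return (template, ints): the non-digit pieces of name and its integers."""
    template = tuple(re.split(r"\d+", name))
    ints = [int(x) for x in re.findall(r"\d+", name)]
    return template, ints

def incremented_fields(prev, cur):
    """Indexes of the integer fields that grew by exactly 1 from prev to cur.

    Returns [] when the non-digit text differs or no field grew by 1.
    (Hypothetical helper, written only to illustrate the comparison.)
    """
    prev_template, prev_ints = split_numbers(prev)
    cur_template, cur_ints = split_numbers(cur)
    if prev_template != cur_template:
        return []
    return [i for i, (o, n) in enumerate(zip(prev_ints, cur_ints)) if n - o == 1]

print(incremented_fields("filename_v003_0001.geo", "filename_v003_0002.geo"))  # [1]
print(incremented_fields("filename_v003_0007.geo", "filename_v003_0032.geo"))  # []
print(incremented_fields("testxxtest.0063.exr", "filename_v003_0001.geo"))     # []
```

A sequence continues as long as the same list of indexes comes back for each new pair of lines; a gap such as 0007 to 0032 returns an empty list and so starts a new entry.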

01 - test8.txt
01 - test9.txt
01 - test10.txt
02 - test11.txt
02 - test12.txt
03 - test13.txt
04 - test13.txt
05 - test13.txt
06
07
08
09
10

gives:

[seq]01 - test8.txt .. 01 - test10.txt
[seq]02 - test11.txt .. 02 - test12.txt
[seq]03 - test13.txt .. 05 - test13.txt
[seq]06 .. 10

Any comments are welcome!

Ha... I almost forgot: with no arguments, this script outputs the collapsed contents of the current directory.
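The question asked for entries of the form `[seq]filename_v003_####.geo (1-7)`, while the script prints `start .. stop`. Since `compress` accepts a `seq_format` callable that receives the first and last file name, one possible formatter is sketched below. The name `hash_format` is hypothetical, and the sketch assumes both names have the same layout and differ only in their digit runs; otherwise it falls back to the script's `..` format.

```python
import re

def hash_format(file_start, file_stop):
    """Format a sequence as '[seq]name_####.ext (first-last)'.

    Hypothetical formatter: every digit run that differs between the two
    names is replaced by one '#' per digit of the starting name.
    """
    # Splitting with a capturing group keeps the digit runs in the result.
    start_parts = re.split(r"(\d+)", file_start)
    stop_parts = re.split(r"(\d+)", file_stop)

    out, ranges = [], []
    if len(start_parts) == len(stop_parts):
        for a, b in zip(start_parts, stop_parts):
            if a != b and a.isdigit() and b.isdigit():
                out.append("#" * len(a))
                ranges.append("{}-{}".format(int(a), int(b)))
            else:
                out.append(a)

    if not ranges:
        # Layouts differ or nothing numeric changed: keep the basic format.
        return "[seq]{} .. {}".format(file_start, file_stop)
    return "[seq]{} ({})".format("".join(out), ", ".join(ranges))

print(hash_format("filename_v003_0001.geo", "filename_v003_0007.geo"))
# [seq]filename_v003_####.geo (1-7)
print(hash_format("testxxtest.0057.exr", "testxxtest.0063.exr"))
# [seq]testxxtest.####.exr (57-63)
```

It would be passed as `compress(lst, seq_format=hash_format)`. The function uses only `re.split` and `str.format`, so it should also run unchanged under the Python 2.7 the script targets.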

Source: the Stack Overflow question "Bash alias to automatically detect arbitrarily named file sequences?": https://stackoverflow.com/questions/12874986/
