python - Splitting a Python list starting at given indices

I have seen this: Split list into sublist based on index ranges

But my problem is slightly different. I have a list

List = ['2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report']

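I need to split it into sublists based on the dates. It is basically an event log, but because of poor database design the system concatenates the separate update messages for an event into one big list of strings. I have: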

date_regex = r"\d+-\d+-\d+"
Event_indices = [i for i, word in enumerate(List)
                 if re.match(date_regex, word)]

For my example this gives:

[0,3]

Now I need to split the list into separate lists based on those indices. So for my example, ideally I would get:

[List[0], [List[1], List[2]]], [List[3], [List[4],  List[5], List[6]] ]

So the format is:

[event_date, [list of other text]], [event_date, [list of other text]]

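There are also some edge cases where there is no date string, or where text comes before the first date. They look like this: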

Special_case = ['blah', 'blah', 'stuff']
Special_case_2 = ['blah', 'blah', '2015-01-01', 'blah', 'blah']

result_special_case = ['', [Special_case[0], Special_case[1],Special_case[2] ]]
result_special_case_2 = [ ['', [ Special_case_2[0], Special_case_2[1] ] ], 
                          [Special_case_2[2], [ Special_case_2[3],Special_case_2[4] ] ] ]

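Best answer

You don't need two passes at all, because itertools.groupby can segment the input into dates and their associated events in a single pass. By not computing indices and then slicing the list with them, you can also process a generator that yields one value at a time, which avoids memory problems when the input is large. To demonstrate, I took your original List and extended it a bit to show that the edge cases are handled correctly: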

import re

from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed
def group_by_date(it):
    # Make iterator that groups dates with dates and non-dates with non-dates
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))

Output (line breaks added for readability):

[['', ['undated', 'garbage', 'then', 'twodates']],
 ['2015-12-31', []],
 ['2016-01-01', ['stuff happened', 'details']],
 ['2016-01-02', ['more stuff happened', 'details', 'report']],
 ['2016-01-03', []]]
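
For comparison, if the whole list is already in memory, the index-based approach from the question can be sketched roughly as follows. This is only an illustration, not part of the original answer: the names split_at_dates and DATE_RE are made up here, and it assumes that anything before the first date should be reported under an empty date string, as in the question's special cases.

```python
import re

DATE_RE = re.compile(r"\d+-\d+-\d+")  # hypothetical name; same pattern as above


def split_at_dates(items):
    """Index-based sketch: return [[date, [events...]], ...] for a list."""
    # Positions of every string that starts with a date
    starts = [i for i, word in enumerate(items) if DATE_RE.match(word)]
    result = []

    # Everything before the first date (or the whole list, if there are
    # no dates at all) is reported under an empty date string
    first = starts[0] if starts else len(items)
    if first > 0:
        result.append(['', items[:first]])

    # Pair each date index with the next date index (or the end of the list)
    # and slice out the events in between
    for start, end in zip(starts, starts[1:] + [len(items)]):
        result.append([items[start], items[start + 1:end]])
    return result


print(split_at_dates(['blah', 'blah', 'stuff']))
print(split_at_dates(['blah', 'blah', '2015-01-01', 'blah', 'blah']))
```

Unlike the groupby version, this needs a real list (it uses len and slicing), so it does not work on a generator. It also always returns a list of [date, events] pairs, even when the input has no dates at all.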

Regarding python - splitting a Python list starting at given indices, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39782958/
