python - Split a Python list starting at given indices

Tags: python string list

I have seen this: Split list into sublist based on index ranges

But my problem is slightly different. I have a list:

List = ['2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report']

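I need to split it into sublists by date. Basically it is an event log, but because of poor database design, the system concatenates the separate update messages for an event into one big list of strings. I have: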

import re
Event_indices = [i for i, word in enumerate(List)
                 if re.match(r"\d+-\d+-\d+", word)]  # indices of date-like items

which for my example gives:

[0,3]

Now I need to split the list into separate lists based on those indices. So for my example, ideally I would get:

[List[0], [List[1], List[2]]], [List[3], [List[4],  List[5], List[6]] ]

So the format is:

[event_date, [list of other text]], [event_date, [list of other text]]

There are also some edge cases where a date string is missing, of the form:

Special_case = ['blah', 'blah', 'stuff']
Special_case_2 = ['blah', 'blah', '2015-01-01', 'blah', 'blah']

result_special_case = ['', [Special_case[0], Special_case[1],Special_case[2] ]]
result_special_case_2 = [ ['', [ Special_case_2[0], Special_case_2[1] ] ], 
                          [Special_case_2[2], [ Special_case_2[3],Special_case_2[4] ] ] ]
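For reference, the index-based approach described above can be sketched as follows. This is only an illustration of the question's idea, not the method from the answer below; the names `DATE_RE` and `split_by_date_indices` are made up for this sketch, and it always returns a list of `[date, events]` pairs, so the no-date case comes back wrapped in an outer list.

```python
import re

# Assumed date pattern, adapted from the regex in the question
DATE_RE = re.compile(r"\d+-\d+-\d+")

def split_by_date_indices(items):
    """Split items into [date, [events]] pairs using the indices of the dates."""
    indices = [i for i, word in enumerate(items) if DATE_RE.match(word)]
    result = []
    # Anything before the first date (or everything, if there is no date)
    # is reported under an empty date string.
    first = indices[0] if indices else len(items)
    if first > 0:
        result.append(['', items[:first]])
    # Each date owns the items up to the next date, or up to the end of the list.
    for start, end in zip(indices, indices[1:] + [len(items)]):
        result.append([items[start], items[start + 1:end]])
    return result

print(split_by_date_indices(['blah', 'blah', '2015-01-01', 'blah', 'blah']))
# [['', ['blah', 'blah']], ['2015-01-01', ['blah', 'blah']]]
```

Note that this needs the whole list in memory, since it computes all indices first and then slices.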

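Best answer

You don't need a two-pass grouping at all, because itertools.groupby can segment the input by dates and their associated events in a single pass. By avoiding computing indices and then slicing the list with them, you can process a generator that yields one value at a time, which avoids memory problems when the input is large. To demonstrate, I took your original List and extended it a bit to show that it handles edge cases correctly: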

import re

from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details', 
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed
def group_by_date(it):
    # Make iterator that groups dates with dates and non-dates with non-dates
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))

Output (newlines added for readability):

[['', ['undated', 'garbage', 'then', 'twodates']],
 ['2015-12-31', []],
 ['2016-01-01', ['stuff happened', 'details']],
 ['2016-01-02', ['more stuff happened', 'details', 'report']],
 ['2016-01-03', []]]
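To see why the generator above has to look ahead to the next group, it may help to inspect what groupby produces on its own. The sketch below uses a made-up `sample` list (not from the original post) and the same key function; each run is either all dates or all non-dates, so the events for the last date in a run always sit in the following run.

```python
import re
from itertools import groupby

datere = re.compile(r"\d+-\d+-\d+")

# Hypothetical sample input, chosen only to illustrate the runs
sample = ['blah', 'blah', '2015-01-01', '2015-01-02', 'note']

# Materialize each run so it can be printed; groupby itself is lazy
runs = [(isdate, list(g))
        for isdate, g in groupby(sample, key=lambda x: datere.match(x) is not None)]
print(runs)
# [(False, ['blah', 'blah']), (True, ['2015-01-01', '2015-01-02']), (False, ['note'])]
```

Here '2015-01-01' has no events of its own, while '2015-01-02' is paired with the run that follows it.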

Regarding "python - Split a Python list starting at given indices", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/39782958/
