I've looked at this: Split list into sublist based on index ranges
but my question is slightly different. I have a list
List = ['2016-01-01', 'stuff happened', 'details',
        '2016-01-02', 'more stuff happened', 'details', 'report']
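I need to split it into sublists based on the dates. Basically it is an event log, but because of poor database design the system concatenates an event's separate update messages into one big list of strings. I have: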
import re
date_regex_return_all = r"\d+\-\d+\-\d+"
Event_indices = [i for i, word in enumerate(List)
                 if re.match(date_regex_return_all, word)]
which for my example gives:
[0,3]
Now I need to split the list into separate lists based on those indices. So for my example, ideally I would get:
[List[0], [List[1], List[2]]], [List[3], [List[4], List[5], List[6]] ]
So the format is:
[event_date, [list of other text]], [event_date, [list of other text]]
There are also some edge cases where text has no date string in front of it, which should come out in this format:
Special_case = ['blah', 'blah', 'stuff']
Special_case_2 = ['blah', 'blah', '2015-01-01', 'blah', 'blah']
result_special_case = ['', [Special_case[0], Special_case[1], Special_case[2]]]
result_special_case_2 = [['', [Special_case_2[0], Special_case_2[1]]],
                         [Special_case_2[2], [Special_case_2[3], Special_case_2[4]]]]
Best answer
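You don't need a two-pass grouping at all, because itertools.groupby can segment the list into dates and their associated events in a single pass. By not computing indices and then using them to slice the list, you can process a generator that supplies one value at a time, which avoids memory problems when the input is large. To demonstrate, I took your original List and extended it a bit to show that it handles the edge cases correctly.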
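As a quick illustration of the grouping step on its own, here is a minimal sketch of what groupby yields when the key is "does this item look like a date". The names sample and runs are hypothetical and used only for this illustration; the regex is the same pattern as in the question.

```python
import re
from itertools import groupby

datere = re.compile(r"\d+\-\d+\-\d+")

# Hypothetical sample, shorter than the list used in the full solution
sample = ['intro', '2016-01-01', 'stuff happened', 'details',
          '2016-01-02', 'report']

# groupby emits one (key, group) pair per run of consecutive items that
# share the same key, so date runs and non-date runs alternate
runs = [(isdate, list(group))
        for isdate, group in groupby(sample,
                                     key=lambda s: datere.match(s) is not None)]

for isdate, items in runs:
    print(isdate, items)
```

Each True run holds one or more consecutive dates and each False run holds the events that follow them. The full solution that follows pairs the last date of a True run with the next False run.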
import re
from itertools import groupby

List = ['undated', 'garbage', 'then', 'twodates', '2015-12-31',
        '2016-01-01', 'stuff happened', 'details',
        '2016-01-02', 'more stuff happened', 'details', 'report',
        '2016-01-03']

datere = re.compile(r"\d+\-\d+\-\d+")  # Precompile regex for speed

def group_by_date(it):
    # Make iterator that groups dates with dates and non-dates with non-dates
    grouped = groupby(it, key=lambda x: datere.match(x) is not None)
    for isdate, g in grouped:
        if not isdate:
            # We had a leading set of undated events, output as undated
            yield ['', list(g)]
        else:
            # At least one date found; iterate with one loop delay
            # so final date can have events included (all others have no events)
            lastdate = next(g)
            for date in g:
                yield [lastdate, []]
                lastdate = date

            # Final date pulls next group (which must be events or the end of the input)
            try:
                # Get next group of events
                events = list(next(grouped)[1])
            except StopIteration:
                # There were no events for final date
                yield [lastdate, []]
            else:
                # There were events associated with final date
                yield [lastdate, events]

print(list(group_by_date(List)))
Output (line breaks added for readability):
[['', ['undated', 'garbage', 'then', 'twodates']],
['2015-12-31', []],
['2016-01-01', ['stuff happened', 'details']],
['2016-01-02', ['more stuff happened', 'details', 'report']],
['2016-01-03', []]]
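For comparison, the two-pass, index-based approach that the question started from could be sketched roughly as below. This is only an illustration, not part of the original answer: split_by_indices and DATE_RE are hypothetical names, and the function assumes the input is a real list (it needs len() and slicing), which is exactly the restriction the groupby version avoids.

```python
import re

DATE_RE = re.compile(r"\d+-\d+-\d+")

def split_by_indices(items):
    """Two-pass variant: find the date positions, then slice between them."""
    # Pass 1: positions of all items that look like a date
    indices = [i for i, word in enumerate(items) if DATE_RE.match(word)]

    result = []
    # Anything before the first date (or everything, if there is no date)
    # is reported under an empty date string
    first = indices[0] if indices else len(items)
    if first > 0:
        result.append(['', items[:first]])

    # Pass 2: each date owns the slice up to the next date or the end
    for start, end in zip(indices, indices[1:] + [len(items)]):
        result.append([items[start], items[start + 1:end]])
    return result

print(split_by_indices(['blah', 'blah', '2015-01-01', 'blah', 'blah']))
```

Note that this sketch always returns a list of [date, events] pairs, so the all-undated case comes back as a one-element list rather than a bare pair.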
Regarding python - splitting a Python list at given start indices, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/39782958/