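I'm trying to write a function that identifies groups of dates and measures the size of each group.
The function will take a list of elements already sorted in date order (the elements are individual rows from a CSV file that contain a date). The list can be anywhere from 0 to n elements long. I'd like to write the list out as it came in, with the size of the date group appended to each row.
For example, the list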
Bill 01/01/2011
Bill 02/01/2011
Bill 03/01/2011
Bill 05/01/2011
Bill 07/01/2011
should be output (ideally printed to a file) as
Bill 01/01/2011 3
Bill 02/01/2011 3
Bill 03/01/2011 3
Bill 05/01/2011 1
Bill 07/01/2011 1
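I already have a function called isBeside(string1, string2), which returns the delta between the two dates.
My attempt so far is below (an iterative mess; I'm sure Python can do this more elegantly).
Note that coll[i][1] contains the date element of the CSV row.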
def printSet(coll):
    setSize = len(coll)
    if setSize == 0:
        pass  # don't need to do anything
    elif setSize == 1:
        for i in coll:
            print(i, 1)
    elif setSize > 1:
        # new buffer list which will hold sequential dates,
        # until a non-sequential one is found
        printBuffer = []
        printBuffer.append(coll[0])  # add the first item
        print('Adding ' + str(coll[0]))
        for i in range(0, len(coll)-1):
            print('Comparing ', coll[i][1], coll[i+1][1], isBeside(coll[i][1], coll[i+1][1]))
            if isBeside(coll[i][1], coll[i+1][1]) == 1:
                printBuffer.append(coll[i+1])
                print('Adding ' + str(coll[i+1]))
            else:
                for j in printBuffer:
                    print(j, len(printBuffer))
                printBuffer = []
                printBuffer.append(coll[i])
    return
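The question does not show isBeside itself. As a rough sketch only, assuming the date strings are in DD/MM/YYYY form (as the sample rows suggest) and that the "delta" is the signed difference in whole days, a stand-in might look like the following. The name is_beside and the fmt parameter are hypothetical and not taken from the original code.

```python
from datetime import datetime


def is_beside(string1, string2, fmt="%d/%m/%Y"):
    """Hypothetical stand-in for isBeside: days from string1 to string2."""
    # Assumption: both arguments are date strings such as "01/01/2011".
    first = datetime.strptime(string1, fmt).date()
    second = datetime.strptime(string2, fmt).date()
    return (second - first).days


print(is_beside("01/01/2011", "02/01/2011"))  # consecutive days
print(is_beside("03/01/2011", "05/01/2011"))  # one day skipped
```

With a helper like this, a result of exactly 1 means the second date directly follows the first, which is the condition the loop above tests for.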
Best answer
Something like this?
from datetime import date, timedelta

coll = [['Bill', date(2011,1,1)],
        ['Bill', date(2011,1,2)],
        ['Bill', date(2011,1,3)],
        ['Bill', date(2011,1,5)],
        ['Bill', date(2011,1,7)]]

res = []             # all finished groups
group = [coll[0]]    # the group currently being built
i = 1
while i < len(coll):
    row = coll[i]
    last_in_group = group[-1]
    # use your isBeside() function here...
    if row[1] - last_in_group[1] == timedelta(days=1):
        # consecutive, append to current group..
        group.append(row)
    else:
        # not consecutive, start new group.
        res.append(group)
        group = [row]
    i += 1
res.append(group)    # don't forget the last group

for group in res:
    for row in group:
        for item in row:
            print(item, end=' ')
        print(len(group))
It prints:
Bill 2011-01-01 3
Bill 2011-01-02 3
Bill 2011-01-03 3
Bill 2011-01-05 1
Bill 2011-01-07 1
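The question also mentions that the list may be empty and that the output should ideally go to a file. One possible way to cover both, sketched here under the same assumption that each row holds a datetime.date at index 1, is to wrap the grouping in functions. The names group_consecutive and write_with_group_size, and the file name output.txt, are made up for illustration.

```python
import sys
from datetime import date, timedelta


def group_consecutive(rows):
    """Split date-sorted rows into runs of consecutive days."""
    groups = []
    for row in rows:
        # Extend the current run only if this row is exactly one day
        # after the last row of that run; otherwise start a new run.
        if groups and row[1] - groups[-1][-1][1] == timedelta(days=1):
            groups[-1].append(row)
        else:
            groups.append([row])
    return groups  # an empty input gives an empty list


def write_with_group_size(rows, out=None):
    """Write each row followed by the size of the run it belongs to."""
    if out is None:
        out = sys.stdout
    for group in group_consecutive(rows):
        for row in group:
            fields = [str(item) for item in row] + [str(len(group))]
            out.write(" ".join(fields) + "\n")


coll = [['Bill', date(2011, 1, 1)],
        ['Bill', date(2011, 1, 2)],
        ['Bill', date(2011, 1, 3)],
        ['Bill', date(2011, 1, 5)],
        ['Bill', date(2011, 1, 7)]]

write_with_group_size(coll)

# Writing to a file instead (output.txt is a placeholder name):
# with open("output.txt", "w") as f:
#     write_with_group_size(coll, f)
```

Because any file-like object can be passed as out, the same function serves both printing and writing to disk, and an empty list simply produces no output.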
Regarding python - date comparison / grouping consecutive dates, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/10127666/