python - Counting aged tickets that are open at a given point in time

Tags: python python-3.x pandas

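Background:
We have a ticketing system in which every ticket has fields such as an opened date, a closed date, a category, and a type. Each ticket is represented by one row in my data, with a key that identifies the ticket.

A single record might look like this: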

Number, Type, Category, Opened, Closed
TICKET100, Database, Software, 2/1/2020 11:30 AM, 4/22/2020 4:40 PM

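Goal:
My goal is to create a function that takes an input dataframe (pandas), an age parameter of some kind, and a list of attributes/dimensions. The function should return a dataframe that shows, by date and for each attribute/dimension combination, how many tickets had been open for longer than that age.

Example input: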
ticket_age(input_dataframe, age=5, dimensions=['Type','Category'])

Example snippet of the desired output:
Date, Type, Category, Count
3/1/2020, Database, Software, 1
3/2/2020, Database, Software, 1
...
4/22/2020, Database, Software, 0

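An important note about the output: if a date and dimension combination has no tickets that meet the condition, it should still produce a row with a count of 0.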
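
One possible way to guarantee a row for every date and dimension combination is to count what exists and then reindex against the full set of combinations. This is only a sketch under assumptions: the table `hits` (one row per date and ticket that met the age condition), the date window, and the name `full_index` are made up for illustration and do not come from the post.

```python
import pandas as pd

# Hypothetical input: one row per (date, ticket) pair that met the age condition.
hits = pd.DataFrame({
    "Date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
    "Type": ["Database", "Database", "Network"],
    "Category": ["Software", "Software", "Hardware"],
})

# Count the combinations that actually occur.
counts = hits.groupby(["Date", "Type", "Category"]).size()

# Build every date x (Type, Category) combination, including the empty ones.
dates = pd.date_range("2020-03-01", "2020-03-03", freq="D")
dims = hits[["Type", "Category"]].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(d, t, c) for d in dates for t, c in dims.itertuples(index=False)],
    names=["Date", "Type", "Category"],
)

# Reindexing inserts the missing combinations with a count of 0.
result = counts.reindex(full_index, fill_value=0).reset_index(name="Count")
print(result)
```

Only dimension combinations that appear in the data are expanded here, so a Type paired with a Category it never occurs with does not get rows.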

Where I am stuck:
I can't figure out how to accept a list of dimensions of unknown size and iterate over all of them.
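
For reference, `DataFrame.groupby` accepts a list of column names of any length, so the dimensions may not need to be hard-coded at all. A small sketch with made-up data; `count_by` is a hypothetical helper name, not something from the post:

```python
import pandas as pd

def count_by(frame, dimensions):
    """Count rows per unique combination of the given columns."""
    return frame.groupby(list(dimensions)).size().reset_index(name="Count")

# Made-up tickets, just to show that the same call works for 1 or 2 dimensions.
tickets = pd.DataFrame({
    "Type": ["Database", "Database", "Network"],
    "Category": ["Software", "Software", "Hardware"],
})

print(count_by(tickets, ["Type"]))
print(count_by(tickets, ["Type", "Category"]))
```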

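What have I tried? When I hard-code the loop over a single dimension, I can successfully produce the number of tickets that meet the age criterion.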

How I calculate type_list, first_date and total_days:
import datetime

# Create function to find the minimum date
def date_minimum(input_dataframe, date_to_check):
    return input_dataframe[date_to_check].min().date()

# Create function to find the maximum date
def date_maximum(input_dataframe, date_to_check):
    return input_dataframe[date_to_check].max().date()

# Set up min and max dates
min_date = date_minimum(df_aged_input, 'Opened')
max_date = date_maximum(df_aged_input, 'Closed')
# Get the first relevant date for the dataframe loop
first_date = min_date + datetime.timedelta(days=aged_window)

# Generate a list of unique ticket types
type_list = df_aged_input['Type'].unique().tolist()

My loop:
aged_output_list = []
for aged_type in type_list:
    # Filter by the type
    df_aged_input = df_tkt_relevant[df_tkt_relevant['Type'] == aged_type].copy()
    for date_iterate in range(totalDays.days):
        # Generate the aged date iterator
        aged_date = first_date + datetime.timedelta(days=date_iterate)
        # Count the number of records in the data frame that match the input conditions
        aged_frame = df_aged_input[
            (~(df_aged_input['Closed'].dt.date < aged_date))
            & (df_aged_input['Opened'].dt.date < (aged_date - datetime.timedelta(days=aged_window + 1)))
        ].copy()
        aged_frame['aged_Date'] = aged_date - datetime.timedelta(days=1)
        aged_count = aged_frame.shape[0]
        # Write the iterated date, the type, and the aged count to the output list
        aged_output_list.append([aged_date, aged_type or 'Error: Missing Type', aged_count])

What should I do next? Is there another library that can do all of this for me?

Best answer

IIUC (if I understand correctly):

import pandas as pd
from pandas.tseries.offsets import DateOffset
from io import StringIO
import numpy as np

# sample data
s = """Number,Type,Category,Opened,Closed
TICKET100,Database,Software,2/1/2020 11:30 AM,2/22/2020 4:40 PM
TICKET101,Database,Software,2/10/2020 11:30 AM,2/23/2020 4:40 PM
TICKET102,Database,Software,2/11/2020 11:30 AM,2/22/2020 4:40 PM
TICKET103,something,else,2/10/2020 11:30 AM,2/23/2020 4:40 PM
TICKET104,something,else,2/12/2020 11:30 AM,2/22/2020 4:40 PM"""
df = pd.read_csv(StringIO(s))

# convert to datetime
df['Opened'] = pd.to_datetime(df['Opened'])
df['Closed'] = pd.to_datetime(df['Closed'])

# create a function
def myFunc(df, age, dimensions):
    # shift each open date forward and each close date back by `age` days
    starts = df['Opened'].dt.normalize() + DateOffset(days=age)
    ends = df['Closed'].dt.normalize() - DateOffset(days=age)
    frames = []
    # groupby your dimensions columns
    for key, group in df.groupby(dimensions):
        # create a daterange for each record in the group
        ranges = [pd.date_range(o, c, freq='D')
                  for o, c in zip(starts.loc[group.index], ends.loc[group.index])]
        # daily index from the group's first open date to its last close date
        idx = pd.date_range(group['Opened'].min(), group['Closed'].max(),
                            freq='D', normalize=True)
        # for each day, count the records whose daterange contains it
        counts = np.sum([idx.isin(r) for r in ranges], axis=0)
        frames.append(pd.DataFrame({'count': counts, 'date': idx,
                                    'dimensions': [key] * len(idx)}))
    # stack the per-group results into one dataframe
    return pd.concat(frames, ignore_index=True)

out = myFunc(df, 5, ['Type', 'Category'])

    count       date             dimensions
0       0 2020-02-01  (Database, Software)
1       0 2020-02-02  (Database, Software)
2       0 2020-02-03  (Database, Software)
3       0 2020-02-04  (Database, Software)
4       0 2020-02-05  (Database, Software)
5       1 2020-02-06  (Database, Software)
6       1 2020-02-07  (Database, Software)
7       1 2020-02-08  (Database, Software)
8       1 2020-02-09  (Database, Software)
9       1 2020-02-10  (Database, Software)
10      1 2020-02-11  (Database, Software)
11      1 2020-02-12  (Database, Software)
12      1 2020-02-13  (Database, Software)
13      1 2020-02-14  (Database, Software)
14      2 2020-02-15  (Database, Software)
15      3 2020-02-16  (Database, Software)
16      3 2020-02-17  (Database, Software)
17      1 2020-02-18  (Database, Software)
18      0 2020-02-19  (Database, Software)
19      0 2020-02-20  (Database, Software)
20      0 2020-02-21  (Database, Software)
21      0 2020-02-22  (Database, Software)
22      0 2020-02-23  (Database, Software)
23      0 2020-02-10     (something, else)
24      0 2020-02-11     (something, else)
25      0 2020-02-12     (something, else)
26      0 2020-02-13     (something, else)
27      0 2020-02-14     (something, else)
28      1 2020-02-15     (something, else)
29      1 2020-02-16     (something, else)
30      2 2020-02-17     (something, else)
31      1 2020-02-18     (something, else)
32      0 2020-02-19     (something, else)
33      0 2020-02-20     (something, else)
34      0 2020-02-21     (something, else)
35      0 2020-02-22     (something, else)
36      0 2020-02-23     (something, else)
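
If the result should look like the table requested in the question (one column per dimension, a shared calendar for all groups), a variation along the following lines might work. This is a sketch, not the method above: it assumes a ticket counts on day d when it was opened at least `age` days before d and has not been closed before d, and it treats a missing close date as still open. The function name `ticket_age` is taken from the question's example call; everything else is illustrative.

```python
import pandas as pd

def ticket_age(frame, age, dimensions):
    """Per day and dimension combination, count tickets open for at least `age` days."""
    opened = frame["Opened"].dt.normalize()
    closed = frame["Closed"].dt.normalize()
    # One shared calendar, so every group gets a row for every day (0 when empty).
    days = pd.date_range(opened.min(), closed.max(), freq="D")
    day_col = days.values[:, None]  # shape (n_days, 1) for broadcasting
    pieces = []
    for key, idx in frame.groupby(list(dimensions)).groups.items():
        opened_g, closed_g = opened.loc[idx], closed.loc[idx]
        # Assumption: "aged" means opened at least `age` days before the day.
        old_enough = (opened_g + pd.Timedelta(days=age)).values <= day_col
        # Assumption: a ticket without a close date is still open.
        still_open = closed_g.isna().values | (closed_g.values >= day_col)
        piece = pd.DataFrame({"Date": days,
                              "Count": (old_enough & still_open).sum(axis=1)})
        # Group keys may be scalars or tuples depending on the pandas version.
        key = key if isinstance(key, tuple) else (key,)
        for name, value in zip(dimensions, key):
            piece[name] = value
        pieces.append(piece)
    out = pd.concat(pieces, ignore_index=True)
    return out[["Date", *dimensions, "Count"]]

# Same sample tickets as above, written with ISO timestamps.
tickets = pd.DataFrame({
    "Number": ["TICKET100", "TICKET101", "TICKET102", "TICKET103", "TICKET104"],
    "Type": ["Database", "Database", "Database", "something", "something"],
    "Category": ["Software", "Software", "Software", "else", "else"],
    "Opened": pd.to_datetime(["2020-02-01 11:30", "2020-02-10 11:30",
                              "2020-02-11 11:30", "2020-02-10 11:30",
                              "2020-02-12 11:30"]),
    "Closed": pd.to_datetime(["2020-02-22 16:40", "2020-02-23 16:40",
                              "2020-02-22 16:40", "2020-02-23 16:40",
                              "2020-02-22 16:40"]),
})

out = ticket_age(tickets, 5, ["Type", "Category"])
print(out)
```

Unlike `myFunc`, this sketch does not subtract `age` from the close date, so its counts differ from the table above near each ticket's close date.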

This question about counting aged tickets open at a given point in time originally appeared on Stack Overflow: https://stackoverflow.com/questions/61397049/
