python - How to compare a unicode date in u'2006-07-23' format with 25-06-15 08:42:43.830000000 PM using python pandas?

Tags: python python-2.7 pandas

Basically, the unicode-format dates come from a date picker, and the 25-06-15 08:42:43.830000000 PM format comes from a column. My dataframe is:

query,status,received_date
a,closed,25-06-15 08:42:43.830000000 PM
b,pending,27-06-15 08:42:43.830000000 PM
ab,closed,28-06-15 08:42:43.830000000 PM
bb,pending,29-06-15 08:42:43.830000000 PM

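I will get two dates from the date picker in the following format: (u'2015-06-23',u'2015-06-29'). How can I compare these unicode dates with the received_date column?

I have to display the rows that fall between these two dates (which come from the date picker).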

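Best answer

I think you first need to convert the picker dates with to_datetime, then convert the received_date column and extract its date. Finally, filter with boolean indexing using a mask: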

#datetimes changed for better testing
print df
  query   status                   received_date
0     a   closed  20-06-15 08:42:43.830000000 PM
1     b  pending  27-06-15 08:42:43.830000000 PM
2    ab   closed  28-06-15 08:42:43.830000000 PM
3    bb  pending  30-06-15 08:42:43.830000000 PM

dates = (u'2015-06-23',u'2015-06-29')
dates = pd.to_datetime(dates).date
print dates
[datetime.date(2015, 6, 23) datetime.date(2015, 6, 29)]

df['received_date'] = pd.to_datetime(df['received_date']).dt.date
print df
  query   status received_date
0     a   closed    2015-06-20
1     b  pending    2015-06-27
2    ab   closed    2015-06-28
3    bb  pending    2015-06-30

print (df['received_date'] > dates[0]) & (df['received_date'] < dates[1])
0    False
1     True
2     True
3    False
Name: received_date, dtype: bool

df = df[(df['received_date'] > dates[0]) & (df['received_date'] < dates[1])]
print df
  query   status received_date
1     b  pending    2015-06-27
2    ab   closed    2015-06-28
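The transcript above relies on pandas inferring the day-first layout of received_date. As a minimal sketch, assuming a recent pandas on Python 3 and assuming the column really is day-month-year with a 12-hour clock, the layout can be spelled out with an explicit format string. The format string and the variable names here are assumptions for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "query": ["a", "b", "ab", "bb"],
    "status": ["closed", "pending", "closed", "pending"],
    "received_date": [
        "20-06-15 08:42:43.830000000 PM",
        "27-06-15 08:42:43.830000000 PM",
        "28-06-15 08:42:43.830000000 PM",
        "30-06-15 08:42:43.830000000 PM",
    ],
})

# Assumed layout: day-month-two-digit-year, 12-hour clock with AM/PM.
# pandas lets %f consume up to nine fractional digits (nanoseconds).
fmt = "%d-%m-%y %I:%M:%S.%f %p"
received = pd.to_datetime(df["received_date"], format=fmt)

# The date-picker strings are ISO dates, so no format is needed for them.
start, end = pd.to_datetime(["2015-06-23", "2015-06-29"])

# Drop the time of day, then apply the same strict bounds as the answer.
days = received.dt.normalize()
result = df[(days > start) & (days < end)]
print(result)
```

Keeping the comparison in datetime64 rather than converting to Python date objects avoids an object-dtype column, which is usually slower to compare.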

However, a modified version of PhilChang's solution is faster:

dates = (u'2015-06-23',u'2015-06-29')
df['received_date'] = pd.to_datetime(df['received_date'])
df = df.set_index('received_date')
# slice the DatetimeIndex directly with the date strings
print df[dates[0]:dates[1]]
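One difference from the mask approach is worth checking: slicing a DatetimeIndex with date strings includes both endpoints, and a day-level end string covers that whole day, whereas the mask in the first solution uses strict inequalities. A small sketch, assuming a recent pandas on Python 3 and using already-parsed timestamps (the frame and the names indexed and window are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "query": ["a", "b", "ab", "bb"],
    "received_date": pd.to_datetime([
        "2015-06-20 20:42:43.83",
        "2015-06-27 20:42:43.83",
        "2015-06-28 20:42:43.83",
        "2015-06-30 20:42:43.83",
    ]),
})

# Sorting the index keeps string slicing well defined even when the
# slice bounds are not themselves present in the index.
indexed = df.set_index("received_date").sort_index()

# The end bound "2015-06-28" covers the whole of that day, so the row
# stamped 20:42 on the 28th is part of the result.
window = indexed.loc["2015-06-23":"2015-06-28"]
print(window)
```

If the end date from the picker should be excluded, the end bound has to be adjusted, or the mask-based filter used instead.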

Timings (len(df) == 40k):

In [569]: %timeit a(df)
1 loops, best of 3: 12.2 s per loop

In [570]: %timeit b(df1)
10 loops, best of 3: 92.3 ms per loop

In [571]: %timeit c(df2)
100 loops, best of 3: 6.57 ms per loop

Test code:

#length is 40k
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()

def a(df):
    dates = (u'2015-06-23',u'2015-06-29')
    df = df.set_index('received_date')
    df.index = pd.DatetimeIndex(df.index)
    return df[dates[0]:dates[1]]


def b(df):
    dates = (u'2015-06-23',u'2015-06-29')
    dates = pd.to_datetime(dates).date
    df['received_date'] = pd.to_datetime(df['received_date']).dt.date
    df = df[(df['received_date'] > dates[0]) & (df['received_date'] < dates[1])]
    return df

def c(df):
    dates = (u'2015-06-23',u'2015-06-29')
    df['received_date'] = pd.to_datetime(df['received_date'])
    df = df.set_index('received_date')
    return df[dates[0]:dates[1]]

print a(df)
print b(df1)
print c(df2)
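Note that b and c above assign the converted column back into the frame they are given, so df1 and df2 are already converted after the first call and later %timeit loops do less parsing work. A rough sketch of a comparison that leaves the input untouched, assuming a recent pandas on Python 3 and an explicit format string (the function names, the format, and the smaller frame size are assumptions for illustration):

```python
import timeit
import pandas as pd

FMT = "%d-%m-%y %I:%M:%S.%f %p"  # assumed layout of received_date

base = pd.DataFrame({
    "query": ["a", "b", "ab", "bb"],
    "status": ["closed", "pending", "closed", "pending"],
    "received_date": [
        "20-06-15 08:42:43.830000000 PM",
        "27-06-15 08:42:43.830000000 PM",
        "28-06-15 08:42:43.830000000 PM",
        "30-06-15 08:42:43.830000000 PM",
    ],
})
big = pd.concat([base] * 1000).reset_index(drop=True)  # 4000 rows

def by_mask(frame):
    # Parse into a separate Series so the caller's frame keeps its strings.
    received = pd.to_datetime(frame["received_date"], format=FMT)
    start, end = pd.to_datetime(["2015-06-23", "2015-06-29"])
    days = received.dt.normalize()
    return frame[(days > start) & (days < end)]

def by_index(frame):
    # assign() returns a new frame instead of modifying the input.
    parsed = pd.to_datetime(frame["received_date"], format=FMT)
    out = frame.assign(received_date=parsed)
    out = out.set_index("received_date").sort_index()
    return out.loc["2015-06-23":"2015-06-29"]

for fn in (by_mask, by_index):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(fn.__name__, seconds / 3)
```

With this setup every timed call pays the full parsing cost, so the numbers are not directly comparable with the %timeit figures quoted above.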

Regarding python - How to compare a unicode date in u'2006-07-23' format with 25-06-15 08:42:43.830000000 PM using python pandas?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/36733895/
