python - How do I hide pandas to_datetime NaT when I write to CSV?

Tags: python date pandas

I'm a bit confused about why NaT is showing up in my CSV... usually missing values show up as "". Here is my date formatting:

df['submitted_on'] = pd.to_datetime(df['submitted_on'], errors='coerce').dt.to_period('d')
df['resolved_on'] = pd.to_datetime(df['resolved_on'], errors='coerce').dt.to_period('d')
df['closed_on'] = pd.to_datetime(df['closed_on'], errors='coerce').dt.to_period('d')
df['duplicate_on'] = pd.to_datetime(df['duplicate_on'], errors='coerce').dt.to_period('d')
df['junked_on'] = pd.to_datetime(df['junked_on'], errors='coerce').dt.to_period('d')
df['unproducible_on'] = pd.to_datetime(df['unproducible_on'], errors='coerce').dt.to_period('d')
df['verified_on'] = pd.to_datetime(df['verified_on'], errors='coerce').dt.to_period('d')

When I use df.head(), this is my result. Fine, good, everything looks right.

  identifier status submitted_on resolved_on closed_on duplicate_on junked_on  \
0        xx1      D   2004-07-28         NaT       NaT   2004-08-26       NaT   
1        xx2      N   2010-03-02         NaT       NaT          NaT       NaT   
2        xx3      U   2005-10-26         NaT       NaT          NaT       NaT   
3        xx4      V   2006-06-30  2006-09-15       NaT          NaT       NaT   
4        xx5      R   2012-09-21  2013-06-06       NaT          NaT       NaT   

  unproducible_on verified_on  
0             NaT         NaT  
1             NaT         NaT  
2      2005-11-01         NaT  
3             NaT  2006-11-20  
4             NaT         NaT  

But when I write to CSV, the NaT values show up:

"identifier","status","submitted_on","resolved_on","closed_on","duplicate_on","junked_on","unproducible_on","verified_on"
"xx1","D","2004-07-28","NaT","NaT","2004-08-26","NaT","NaT","NaT"
"xx2","N","2010-03-02","NaT","NaT","NaT","NaT","NaT","NaT"
"xx3","U","2005-10-26","NaT","NaT","NaT","NaT","2005-11-01","NaT"
"xx4","V","2006-06-30","2006-09-15","NaT","NaT","NaT","NaT","2006-11-20"
"xx5","R","2012-09-21","2013-06-06","NaT","NaT","NaT","NaT","NaT"
"xx6","D","2009-11-25","NaT","NaT","2010-02-26","NaT","NaT","NaT"
"xx7","D","2003-08-29","NaT","NaT","2003-08-29","NaT","NaT","NaT"
"xx8","R","2003-06-06","2003-06-24","NaT","NaT","NaT","NaT","NaT"
"xx9","R","2004-11-05","2004-11-15","NaT","NaT","NaT","NaT","NaT"
"xx10","R","2008-02-21","2008-09-25","NaT","NaT","NaT","NaT","NaT"
"xx11","R","2007-03-08","2007-03-21","NaT","NaT","NaT","NaT","NaT"
"xx12","R","2011-08-22","2012-06-21","NaT","NaT","NaT","NaT","NaT"
"xx13","J","2003-07-07","NaT","NaT","NaT","2003-07-10","NaT","NaT"
"xx14","A","2008-09-24","NaT","NaT","NaT","NaT","NaT","NaT"

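So I did what I thought would fix the problem: df.fillna('', inplace=True), and nada. Then I tried df.replace(pd.NaT, ''), with no effect, and then na_rep='' when writing to CSV, which did not produce the desired output either. What should I use to keep NaT from being written to the CSV?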

Sample data:

"identifier","status","submitted_on","resolved_on","closed_on","duplicate_on","junked_on","unproducible_on","verified_on"
"xx1","D","2004-07-28 07:00:00.0","null","null","2004-08-26 07:00:00.0","null","null","null"
"xx2","N","2010-03-02 03:00:16.0","null","null","null","null","null","null"
"xx3","U","2005-10-26 14:20:20.0","null","null","null","null","2005-11-01 13:02:22.0","null"
"xx4","V","2006-06-30 07:00:00.0","2006-09-15 07:00:00.0","null","null","null","null","2006-11-20 08:00:00.0"
"xx5","R","2012-09-21 06:30:58.0","2013-06-06 09:35:25.0","null","null","null","null","null"
"xx6","D","2009-11-25 02:16:03.0","null","null","2010-02-26 12:28:22.0","null","null","null"
"xx7","D","2003-08-29 07:00:00.0","null","null","2003-08-29 07:00:00.0","null","null","null"
"xx8","R","2003-06-06 12:00:00.0","2003-06-24 12:00:00.0","null","null","null","null","null"
"xx9","R","2004-11-05 08:00:00.0","2004-11-15 08:00:00.0","null","null","null","null","null"
"xx10","R","2008-02-21 05:13:39.0","2008-09-25 17:20:57.0","null","null","null","null","null"
"xx11","R","2007-03-08 17:47:44.0","2007-03-21 23:47:57.0","null","null","null","null","null"
"xx12","R","2011-08-22 19:50:25.0","2012-06-21 05:52:12.0","null","null","null","null","null"
"xx13","J","2003-07-07 12:00:00.0","null","null","null","2003-07-10 12:00:00.0","null","null"
"xx14","A","2008-09-24 11:36:34.0","null","null","null","null","null","null"

Best answer

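Your problem is that you are converting to periods. The NaT you are seeing is actually a period object.

One way around this is to convert to strings instead.

Use

.dt.strftime('%Y-%m-%d')

instead of

.dt.to_period('d')

Then the NaT you see is a string, and it can be replaced like this: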

.dt.strftime('%Y-%m-%d').replace('NaT', '')

df = pd.DataFrame(dict(date=pd.to_datetime(['2015-01-01', pd.NaT])))
df


df.date.dt.strftime('%Y-%m-%d')

0    2015-01-01
1           NaT
Name: date, dtype: object

df.date.dt.strftime('%Y-%m-%d').replace('NaT', '')

0    2015-01-01
1              
Name: date, dtype: object
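
The approach above can be sketched end to end as follows. This is only a minimal sketch under some assumptions: the helper name `format_dates`, the `date_cols` list, and the explicit `fmt` string are illustrative and not from the original post, and the two sample rows are taken from the question's sample data. Instead of matching the literal string 'NaT', it masks on the parsed datetimes, because some pandas versions appear to return NaN rather than 'NaT' from strftime for missing values.

```python
import csv
import io

import pandas as pd


def format_dates(series, fmt="%Y-%m-%d %H:%M:%S.%f"):
    """Parse a column of date strings and return 'YYYY-MM-DD' strings,
    with an empty string wherever the value could not be parsed."""
    # Unparseable values such as "null" become NaT.
    parsed = pd.to_datetime(series, format=fmt, errors="coerce")
    formatted = parsed.dt.strftime("%Y-%m-%d")
    # Keep formatted values only where parsing succeeded; this works whether
    # strftime produced the string 'NaT' or NaN for the missing entries.
    return formatted.where(parsed.notna(), "")


# Two rows taken from the question's sample data (subset of columns).
df = pd.DataFrame(
    {
        "identifier": ["xx1", "xx4"],
        "submitted_on": ["2004-07-28 07:00:00.0", "2006-06-30 07:00:00.0"],
        "resolved_on": ["null", "2006-09-15 07:00:00.0"],
    }
)

date_cols = ["submitted_on", "resolved_on"]  # hypothetical column list
out = df.copy()
for col in date_cols:
    out[col] = format_dates(out[col])

# Write to an in-memory buffer, quoting every field like the question's output.
buffer = io.StringIO()
out.to_csv(buffer, index=False, quoting=csv.QUOTE_ALL)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the date columns now hold plain strings, to_csv writes the missing entries as empty fields without needing na_rep. The trade-off is that the columns are no longer datetime or period values, so this conversion is best done as the last step before writing.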

Regarding "python - How do I hide pandas to_datetime NaT when I write to CSV?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39028197/
