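I have a dataset that, unfortunately, contains sporadic DateTime values instead of int or str.
For example, how can I iterate over the dataset and replace a value such as 2019-05-03 00:00:00 with 5-3?
I tried a few for loops without success. Does pandas have a shortcut for this?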
,age,menopause,tumor-size,inv-nodes,node-caps,deg-malig,breast,breast-quad,irradiat,Class
0,40-49,premeno,15-19,0-2,yes,3,right,left_up,no,recurrence-events
1,50-59,ge40,15-19,0-2,no,1,right,central,no,no-recurrence-events
2,50-59,ge40,35-39,0-2,no,2,left,left_low,no,recurrence-events
3,40-49,premeno,35-39,0-2,yes,3,right,left_low,yes,no-recurrence-events
4,40-49,premeno,30-34,2019-05-03 00:00:00,yes,2,left,right_up,no,recurrence-events
5,50-59,premeno,25-29,2019-05-03 00:00:00,no,2,right,left_up,yes,no-recurrence-events
6,50-59,ge40,40-44,0-2,no,3,left,left_up,no,no-recurrence-events
7,40-49,premeno,2014-10-01 00:00:00,0-2,no,2,left,left_up,no,no-recurrence-events
8,40-49,premeno,0-4,0-2,no,2,right,right_low,no,no-recurrence-events
9,40-49,ge40,40-44,15-17,yes,2,right,left_up,yes,no-recurrence-events
10,50-59,premeno,25-29,0-2,no,2,left,left_low,no,no-recurrence-events
11,60-69,ge40,15-19,0-2,no,2,right,left_up,no,no-recurrence-events
12,50-59,ge40,30-34,0-2,no,1,right,central,no,no-recurrence-events
13,50-59,ge40,25-29,0-2,no,2,right,left_up,no,no-recurrence-events
14,40-49,premeno,25-29,0-2,no,2,left,left_low,yes,recurrence-events
15,30-39,premeno,20-24,0-2,no,3,left,central,no,no-recurrence-events
16,50-59,premeno,2014-10-01 00:00:00,2019-05-03 00:00:00,no,1,right,left_up,no,no-recurrence-events
17,60-69,ge40,15-19,0-2,no,2,right,left_up,no,no-recurrence-events
18,50-59,premeno,40-44,0-2,no,2,left,left_up,no,no-recurrence-events
19,50-59,ge40,20-24,0-2,no,3,left,left_up,no,no-recurrence-events
20,50-59,lt40,20-24,0-2,?,1,left,left_low,no,recurrence-events
21,60-69,ge40,40-44,2019-05-03 00:00:00,no,2,right,left_up,yes,no-recurrence-events
22,50-59,ge40,15-19,0-2,no,2,right,left_low,no,no-recurrence-events
23,40-49,premeno,2014-10-01 00:00:00,0-2,no,1,right,left_up,no,no-recurrence-events
24,30-39,premeno,15-19,2019-08-06 00:00:00,yes,3,left,left_low,yes,recurrence-events
25,50-59,ge40,20-24,2019-05-03 00:00:00,yes,2,right,left_up,no,no-recurrence-events
Best answer
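You can use a custom function that uses a regex to find the datetime strings and replace them with a non-zero-padded "%m-%d" form. (If the values are real datetime objects, you could also call strftime with '%-m-%-d'; that flag works on Linux and macOS but not on Windows.)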
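```python
import re
from datetime import datetime

def to_month_day(s):
    # Cells already parsed as datetime objects (pd.Timestamp is a datetime subclass)
    if isinstance(s, datetime):
        return f"{s.month}-{s.day}"
    # Datetime values stored as text, e.g. "2019-05-03 00:00:00"
    m = re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", str(s))
    if m:
        # Month and day without leading zeros: "05", "03" -> "5-3"
        return m[0][5:7].lstrip('0') + '-' + m[0][8:10].lstrip('0')
    return s  # leave ordinary values such as "0-2" unchanged

# e.g.
df['inv-nodes'].apply(to_month_day)
# 0       0-2
# 1       0-2
# 2       0-2
# 3       0-2
# 4       5-3
# 5       5-3
# 6       0-2
# 7       0-2
# 8       0-2
# 9     15-17
# 10      0-2
# 11      0-2
# 12      0-2
# 13      0-2
# 14      0-2
# 15      0-2
# 16      5-3
# 17      0-2
# 18      0-2
# 19      0-2
# 20      0-2
# 21      5-3
# 22      0-2
# 23      0-2
# 24      8-6
# 25      5-3

# apply() returns a new Series; assign it back to keep the change
df['inv-nodes'] = df['inv-nodes'].apply(to_month_day)
```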
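Since the question also asks for a pandas shortcut, here is a minimal vectorized sketch using `Series.str.replace` with a regex and backreferences instead of a per-cell Python function. It assumes the column was read as text (for example with `pd.read_csv`) and uses a small hand-built sample of the `inv-nodes` column; the pattern name `DATETIME_PAT` is only illustrative.

```python
import pandas as pd

# Small sample of the 'inv-nodes' column from the question (assumed to be text).
df = pd.DataFrame({
    "inv-nodes": ["0-2", "2019-05-03 00:00:00", "15-17", "2019-08-06 00:00:00"]
})

# Whole-string datetime pattern. The optional "0?" drops one leading zero,
# so month "05" is captured as "5" and day "03" as "3".
DATETIME_PAT = r"^\d{4}-0?(\d{1,2})-0?(\d{1,2}) \d{2}:\d{2}:\d{2}$"

# Vectorized replacement: matching cells become "M-D",
# non-matching cells such as "0-2" or "15-17" are left unchanged.
df["inv-nodes"] = df["inv-nodes"].str.replace(DATETIME_PAT, r"\1-\2", regex=True)

print(df["inv-nodes"].tolist())
```

The same call can be repeated for every affected column (in the sample data, 'tumor-size' also contains such values). Note that the `.str` accessor turns non-string cells, such as genuine datetime objects, into NaN, so if the column holds real datetimes the `apply(to_month_day)` approach is the safer choice.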
Regarding "python - How to change datetime values to individually formatted values", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/62087931/