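python - Trouble setting index values to datetime in pandas

Tags: python pandas

I have indexed my dataframe on its date column. Now I want to convert that index with to_datetime. My code is as follows: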

import glob

import numpy as np
import pandas as pd

df = pd.concat((pd.read_csv(f, sep='|', header=None, index_col=None, low_memory=False) for f in glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas2**.txt')))

df.columns = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', '21', '22']
df.set_index(pd.to_datetime(df['date']), inplace=True)

df1 = df[['1', '6', '7', '10', '12', '13', '15', '16', '17']].copy()
df1.columns = ['cmte_id', 'trans_typ', 'entity_typ', 'state', 'employer', 'occupation', 'amount', 'fec_id', 'cand_id']
print(df1)

But my output looks as if a new date column had been appended:

                   cmte_id trans_typ entity_typ state employer  \
date                                                                           
1970-01-01 00:00:00.008152007  C00112250       24K        ORG    DC      NaN   
1970-01-01 00:00:00.009262007  C00119040       24K        CCM    FL      NaN   
1970-01-01 00:00:00.009262007  C00119040       24K        CCM    MD      NaN   
1970-01-01 00:00:00.00

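My original date column is the last 8 digits in the date index. Also, the first lines of the file passed to read_csv look like this (the date value in the first row is 08152007):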

C00112250|N|Q3|G|27931381854|24K|ORG|HILLARY CLINTON FOR PRESIDENT EXP. COMM.|WASHINGTON|DC|20013|||08152007|2000|C00431569|P00003392|71006.E7975|307490|||4101720071081637544

Best answer

OK, I see your problem. Change your read_csv line to:

df = pd.concat((pd.read_csv(f, sep='|', header=None, names=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', '21', '22'], index_col=None, dtype={'date':str}) for f in glob.glob('/home/jayaramdas/anaconda3/Thesis/FEC_data/itpas2_data/itpas2**.txt')))

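This sets your column names and forces the date column to be read as str. Previously it was parsed as int, which dropped the leading 0. You can then convert the type:

df.set_index(pd.to_datetime(df['date'], format='%m%d%Y'), inplace=True)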
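To see why the original index showed 1970 dates, here is a minimal sketch. It assumes the date was parsed as the integer 8152007, which is consistent with the output in the question. With no format given, pd.to_datetime treats integers as nanoseconds since the Unix epoch. The zfill step at the end is an alternative that is not part of the answer above: it restores the leading zero when re-reading the files is impractical, and it assumes the column has no missing values.

```python
import pandas as pd

# 08152007 read as an integer loses its leading zero.
as_int = pd.Series([8152007])

# Without a format, integers are interpreted as nanoseconds since 1970-01-01.
wrong = pd.to_datetime(as_int)
print(wrong[0])  # 1970-01-01 00:00:00.008152007

# Read as a string, the value keeps all 8 digits and parses as MMDDYYYY.
as_str = pd.Series(['08152007'])
right = pd.to_datetime(as_str, format='%m%d%Y')
print(right[0])  # 2007-08-15 00:00:00

# Alternative (an assumption, not from the answer): pad the integers back to 8 digits.
padded = pd.to_datetime(as_int.astype(str).str.zfill(8), format='%m%d%Y')
print(padded[0])  # 2007-08-15 00:00:00
```

Reading the column as str at load time, as the answer suggests, is the cleaner choice because the data is never misparsed in the first place.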

示例:

In [336]:
import pandas as pd
import io
t="""C00112250|N|Q3|G|27931381854|24K|ORG|HILLARY CLINTON FOR PRESIDENT EXP. COMM.|WASHINGTON|DC|20013|||08152007|2000|C00431569|P00003392|71006.E7975|307490|||4101720071081637544"""
df = pd.read_csv(io.StringIO(t), sep='|', header=None, names=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', 'date', '15', '16', '17', '18', '19', '20', '21', '22'], index_col=None, dtype={'date':str})
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 22 columns):
1       1 non-null object
2       1 non-null object
3       1 non-null object
4       1 non-null object
5       1 non-null int64
6       1 non-null object
7       1 non-null object
8       1 non-null object
9       1 non-null object
10      1 non-null object
11      1 non-null int64
12      0 non-null float64
13      0 non-null float64
date    1 non-null object
15      1 non-null int64
16      1 non-null object
17      1 non-null object
18      1 non-null object
19      1 non-null int64
20      0 non-null float64
21      0 non-null float64
22      1 non-null int64
dtypes: float64(4), int64(5), object(13)
memory usage: 184.0+ bytes

In [337]:    
df['date'] = pd.to_datetime(df['date'], format='%m%d%Y')
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 0 to 0
Data columns (total 22 columns):
1       1 non-null object
2       1 non-null object
3       1 non-null object
4       1 non-null object
5       1 non-null int64
6       1 non-null object
7       1 non-null object
8       1 non-null object
9       1 non-null object
10      1 non-null object
11      1 non-null int64
12      0 non-null float64
13      0 non-null float64
date    1 non-null datetime64[ns]
15      1 non-null int64
16      1 non-null object
17      1 non-null object
18      1 non-null object
19      1 non-null int64
20      0 non-null float64
21      0 non-null float64
22      1 non-null int64
dtypes: datetime64[ns](1), float64(4), int64(5), object(12)
memory usage: 184.0+ bytes

In [338]:
df['date']

Out[338]:
0   2007-08-15
Name: date, dtype: datetime64[ns]
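The session above converts the column. The sketch below carries that through to the index and to the renamed subset from the question, using the single sample row in place of the real files. The names list is built with a comprehension purely for brevity (an illustrative shortcut) and yields the same 22 labels as the answer.

```python
import io

import pandas as pd

row = "C00112250|N|Q3|G|27931381854|24K|ORG|HILLARY CLINTON FOR PRESIDENT EXP. COMM.|WASHINGTON|DC|20013|||08152007|2000|C00431569|P00003392|71006.E7975|307490|||4101720071081637544"

# Same 22 labels as in the answer: '1'..'22', with the 14th renamed to 'date'.
names = [str(i) for i in range(1, 23)]
names[13] = 'date'

# Read the date as str so the leading zero survives.
df = pd.read_csv(io.StringIO(row), sep='|', header=None, names=names, dtype={'date': str})

# Parse MMDDYYYY and use the result as the index.
df = df.set_index(pd.to_datetime(df['date'], format='%m%d%Y'))

# Select and rename the columns as in the question.
df1 = df[['1', '6', '7', '10', '12', '13', '15', '16', '17']].copy()
df1.columns = ['cmte_id', 'trans_typ', 'entity_typ', 'state', 'employer', 'occupation', 'amount', 'fec_id', 'cand_id']

print(df1.index)
```

The index is now a DatetimeIndex, so date-based selection and resampling work on df1 directly.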

This question, "python - Trouble setting index values to datetime in pandas", comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/35911095/
