我正在尝试使用内部连接来连接两个 pandas 数据框。
my_df = pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])
但是我收到以下错误:
KeyError: 'myDate' TypeError: an integer is required
我相信按日期加入是有效的,但是我无法使这个简单的加入起作用?
DF2 是使用以下内容创建的
df2 = idf.groupby(lambda x: (x.year,x.month,x.day)).mean()
有人可以建议一下吗?非常感谢。
df1
type object
id object
date object
value float64
type id date value
0 CAR PSTAT001 15/07/15 42
1 BIKE PSTAT001 16/07/15 42
2 BIKE PSTAT001 17/07/15 42
3 BIKE PSTAT004 18/07/15 42
4 BIKE PSTAT001 19/07/15 32
df2
myDate object
val1 float64
val2 float64
val3 float64
myDate val1 val2 val3
0 (2015,7,13) 1074 1871.666667 2800.777778
1 (2015,7,14) 347.958333 809.416667 1308.458333
2 (2015,7,15) 202.625 597.375 1008.666667
3 (2015,7,16) 494.958333 1192 1886.916667
DF1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3040 entries, 0 to 3039
Data columns (total 4 columns):
type 3040 non-null object
id 3040 non-null object
date 3040 non-null object
value 3040 non-null float64
dtypes: float64(1), object(3)
memory usage: 118.8+ KB
DF2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 16 entries, 0 to 15
Data columns (total 4 columns):
myDate 16 non-null object
val1 16 non-null float64
val2 16 non-null float64
val3 16 non-null float64
dtypes: float64(3), object(1)
memory usage: 640.0+ bytes
最佳答案
您的日期列不是日期时间数据类型,df1
看起来像 str
而另一个是 tuple
因此您需要先转换它们然后合并就会起作用:
In [75]:
df1['date'] = pd.to_datetime(df1['date'])
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 4 columns):
type 5 non-null object
id 5 non-null object
date 5 non-null datetime64[ns]
value 5 non-null int64
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 200.0+ bytes
In [76]:
import datetime as dt
df2['myDate'] = df2['myDate'].apply(lambda x: dt.datetime(x[0], x[1], x[2]))
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 4 columns):
myDate 4 non-null datetime64[ns]
val1 4 non-null float64
val2 4 non-null float64
val3 4 non-null float64
dtypes: datetime64[ns](1), float64(3)
memory usage: 160.0 bytes
In [78]:
my_df= pd.merge(df1, df2, how = 'inner', left_on = ['date'], right_on = ['myDate'])
my_df
Out[78]:
type id date value myDate val1 val2 \
0 CAR PSTAT001 2015-07-15 42 2015-07-15 202.625000 597.375
1 BIKE PSTAT001 2015-07-16 42 2015-07-16 494.958333 1192.000
val3
0 1008.666667
1 1886.916667
关于python - 如何在Python中合并两个pandas DataFrame?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/31384177/