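I have a dataframe with multiple rows per reference (because of the scores). I took those scores and created a new dataframe with one row per reference and the scores collected in a series. That all works fine.
However, when I try to add that series back to the old dataframe, the code runs, but every cell shows 'nan'.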
import pandas as pd
df = pd.read_csv('GAPData.csv', encoding='cp1252')
df1 = df[['GrantRefNumber','Academic_Reviews']].copy()
dfGroup1 = df1.groupby('GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))
dfGroup1 produces this...
GrantRefNumber
D/G001118 [5, 6, 6]
D/P041236 [5, 2, 6]
D/P753396 [2, 2, 6, 5]
D/P043434 [2, 5, 4, 5]
D/P034285 [4, 4, 6, 3]
Then I run:
df['Academic Reviews'] = dfGroup1.groupby(
'GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))
df.drop_duplicates('GrantRefNumber', inplace=True)
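All of this runs without errors, but every cell in the new column 'Academic Reviews' is left as 'nan' instead of the series.
Any suggestions as to what I am doing wrong?
Best answer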
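Your indexes are not aligned, so you get all NaN values in the column: df has the default integer index, while dfGroup1 is indexed by GrantRefNumber.
I think you need set_index on the original df and then join: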
df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic Reviews')).reset_index()
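To make the misalignment concrete, here is a minimal sketch on made-up data (the keys "A" and "B" and the column names `direct` and `mapped` are illustrative assumptions, not from the question). It shows that direct assignment aligns on index labels and so yields NaN, and that `Series.map` on the key column is one possible alternative to set_index plus join:

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df = pd.DataFrame({
    "GrantRefNumber": ["A", "A", "B"],
    "Academic_Reviews": [5, 6, 2],
})
grouped = df.groupby("GrantRefNumber")["Academic_Reviews"].apply(list)

# Direct assignment aligns on index labels: df has 0, 1, 2 while
# grouped has "A", "B". No labels match, so every cell becomes NaN.
df["direct"] = grouped
all_nan = df["direct"].isna().all()

# Mapping the key column looks up each row's GrantRefNumber
# in the index of grouped, so the lists land on the right rows.
df["mapped"] = df["GrantRefNumber"].map(grouped)
print(df)
```

Whether map or join reads better is a matter of taste; join is convenient when several grouped columns need to be attached at once.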
Sample:
df = pd.DataFrame({'GrantRefNumber':['D/G001118'] * 3 + ['D/P041236'] * 3 + ['D/P753396'] * 4,
'Academic_Reviews':[5, 6, 6, 5, 2, 6, 2, 2, 6, 5],
'another_data':[7, 8, 1, 8, 2, 8, 2, 0, 1, 5]})
print (df)
Academic_Reviews GrantRefNumber another_data
0 5 D/G001118 7
1 6 D/G001118 8
2 6 D/G001118 1
3 5 D/P041236 8
4 2 D/P041236 2
5 6 D/P041236 8
6 2 D/P753396 2
7 2 D/P753396 0
8 6 D/P753396 1
9 5 D/P753396 5
dfGroup1 = df.groupby('GrantRefNumber')['Academic_Reviews'].apply(list)
print (dfGroup1)
GrantRefNumber
D/G001118 [5, 6, 6]
D/P041236 [5, 2, 6]
D/P753396 [2, 2, 6, 5]
Name: Academic_Reviews, dtype: object
df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic_Reviews_list')).reset_index()
print (df)
GrantRefNumber Academic_Reviews another_data Academic_Reviews_list
0 D/G001118 5 7 [5, 6, 6]
1 D/G001118 6 8 [5, 6, 6]
2 D/G001118 6 1 [5, 6, 6]
3 D/P041236 5 8 [5, 2, 6]
4 D/P041236 2 2 [5, 2, 6]
5 D/P041236 6 8 [5, 2, 6]
6 D/P753396 2 2 [2, 2, 6, 5]
7 D/P753396 2 0 [2, 2, 6, 5]
8 D/P753396 6 1 [2, 2, 6, 5]
9 D/P753396 5 5 [2, 2, 6, 5]
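The question also calls drop_duplicates to end up with one row per GrantRefNumber. A minimal sketch of that last step after the join, assuming the first row of each group is the one to keep and that the per-row score column is no longer needed (the names `review_lists`, `joined` and `one_per_grant` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "GrantRefNumber": ["D/G001118"] * 3 + ["D/P041236"] * 3,
    "Academic_Reviews": [5, 6, 6, 5, 2, 6],
    "another_data": [7, 8, 1, 8, 2, 8],
})
review_lists = df.groupby("GrantRefNumber")["Academic_Reviews"].apply(list)

# Attach the per-grant list to every row, aligned on GrantRefNumber.
joined = (
    df.set_index("GrantRefNumber")
      .join(review_lists.rename("Academic_Reviews_list"))
      .reset_index()
)

# Keep the first row per grant (assumption) and drop the per-row score.
one_per_grant = (
    joined.drop_duplicates("GrantRefNumber")
          .drop(columns="Academic_Reviews")
          .reset_index(drop=True)
)
print(one_per_grant)
```

Note that the other columns (here another_data) keep only the value from the first row of each group, which may or may not be what is wanted.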
Regarding Python Pandas - joining a Dataframe with 'series' onto another Dataframe, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44072156/