Python Pandas - Joining a DataFrame with a 'series' column to another DataFrame

Tags: python pandas

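I have a dataframe with multiple rows per reference (one per score). I extracted those scores and built a new object with one row per reference, holding that reference's scores as a list. This all works fine.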

However, when I try to add that grouped result back to the old dataframe, the code runs, but every cell shows 'nan'.

import pandas as pd

df = pd.read_csv('GAPData.csv', encoding='cp1252')
df1 = df[['GrantRefNumber','Academic_Reviews']].copy()
dfGroup1 = df1.groupby('GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))

dfGroup1 produces this...

GrantRefNumber
D/G001118       [5, 6, 6]
D/P041236       [5, 2, 6]
D/P753396    [2, 2, 6, 5]
D/P043434    [2, 5, 4, 5]
D/P034285    [4, 4, 6, 3]

Then I run:

df['Academic Reviews'] = dfGroup1.groupby(
    'GrantRefNumber').apply(lambda x: list(x.Academic_Reviews))
df.drop_duplicates('GrantRefNumber', inplace=True)

All of this runs without error, but every cell under the new column 'Academic Reviews' contains 'nan' instead of the list of scores.

Any suggestions as to what I am doing wrong?

Best answer

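The problem is that the indexes are not aligned: df has the default integer index, while dfGroup1 is indexed by GrantRefNumber. Assignment matches rows by index label, finds no matches, and so fills the whole column with NaN.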
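The misalignment can be reproduced on a tiny frame. The sketch below uses made-up grant numbers ('A', 'B') purely for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data; the grant numbers 'A' and 'B' are invented.
df = pd.DataFrame({'GrantRefNumber': ['A', 'A', 'B'],
                   'Academic_Reviews': [5, 6, 2]})

# The grouped Series is indexed by GrantRefNumber ('A', 'B').
grouped = df.groupby('GrantRefNumber')['Academic_Reviews'].apply(list)

# Column assignment aligns on index labels: df has 0, 1, 2 while
# grouped has 'A', 'B', so no label matches and every cell is NaN.
df['Academic Reviews'] = grouped
print(df['Academic Reviews'].isna().all())  # True
```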

I think you need to call set_index on the original df so that it is also indexed by GrantRefNumber, and then join:

df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic Reviews')).reset_index()

Sample:

df = pd.DataFrame({'GrantRefNumber':['D/G001118'] * 3 + ['D/P041236'] * 3 + ['D/P753396'] * 4,
                   'Academic_Reviews':[5, 6, 6, 5, 2, 6, 2, 2, 6, 5],
                   'another_data':[7, 8, 1, 8, 2, 8, 2, 0, 1, 5]})
print (df)
   Academic_Reviews GrantRefNumber  another_data
0                 5      D/G001118             7
1                 6      D/G001118             8
2                 6      D/G001118             1
3                 5      D/P041236             8
4                 2      D/P041236             2
5                 6      D/P041236             8
6                 2      D/P753396             2
7                 2      D/P753396             0
8                 6      D/P753396             1
9                 5      D/P753396             5

dfGroup1 = df.groupby('GrantRefNumber')['Academic_Reviews'].apply(list)
print (dfGroup1)
GrantRefNumber
D/G001118       [5, 6, 6]
D/P041236       [5, 2, 6]
D/P753396    [2, 2, 6, 5]
Name: Academic_Reviews, dtype: object

df = df.set_index('GrantRefNumber')
df = df.join(dfGroup1.rename('Academic_Reviews_list')).reset_index()
print (df)
  GrantRefNumber  Academic_Reviews  another_data Academic_Reviews_list
0      D/G001118                 5             7             [5, 6, 6]
1      D/G001118                 6             8             [5, 6, 6]
2      D/G001118                 6             1             [5, 6, 6]
3      D/P041236                 5             8             [5, 2, 6]
4      D/P041236                 2             2             [5, 2, 6]
5      D/P041236                 6             8             [5, 2, 6]
6      D/P753396                 2             2          [2, 2, 6, 5]
7      D/P753396                 2             0          [2, 2, 6, 5]
8      D/P753396                 6             1          [2, 2, 6, 5]
9      D/P753396                 5             5          [2, 2, 6, 5]
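If the goal is one row per grant, as the drop_duplicates call in the question suggests, a possible alternative is to look the lists up with Series.map and then drop the repeated rows. This is only a sketch on a shortened version of the sample data; the names Academic_Reviews_list and one_per_grant are illustrative, not from the original answer.

```python
import pandas as pd

# Shortened version of the sample data above.
df = pd.DataFrame({'GrantRefNumber': ['D/G001118'] * 3 + ['D/P041236'] * 3,
                   'Academic_Reviews': [5, 6, 6, 5, 2, 6],
                   'another_data': [7, 8, 1, 8, 2, 8]})

dfGroup1 = df.groupby('GrantRefNumber')['Academic_Reviews'].apply(list)

# map looks up each GrantRefNumber value in dfGroup1's index,
# so no set_index / reset_index round trip is needed.
df['Academic_Reviews_list'] = df['GrantRefNumber'].map(dfGroup1)

# Keep only the first row for each grant (hypothetical variable name).
one_per_grant = df.drop_duplicates('GrantRefNumber').reset_index(drop=True)
print(one_per_grant)
```

Note that the remaining Academic_Reviews and another_data values are simply those of the first row in each group, so they may need to be dropped or aggregated separately.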

This question, "Python Pandas - Joining a DataFrame with a 'series' column to another DataFrame", comes from Stack Overflow: https://stackoverflow.com/questions/44072156/
