This is my first question after three days of teaching myself Python, so please be lenient.
I concatenated four DataFrames:
frames = [dfLocationID, dfDimensions, dfCategories, dfTags]
result = pd.concat(frames,
ignore_index=True,
sort=False)
to get this:
Location ID Dimensions Categories Tags
0 1000.0 NaN NaN NaN
1 NaN 3,000 sq ft NaN NaN
2 NaN NaN * In the Zone NaN
3 NaN NaN Apartment NaN
4 NaN NaN Loft NaN
5 NaN NaN NaN Bohemian
6 NaN NaN NaN Colorful
7 NaN NaN NaN Eclectic Quirky
8 NaN NaN NaN Kitchen
9 NaN NaN NaN Living Room
10 NaN NaN NaN Piano
11 NaN NaN NaN Wood Floor
What I want to achieve is this:
Location ID Dimensions Item Data
0 1000.0 3,000 sq ft Categories * In the Zone
1 1000.0 3,000 sq ft Categories Apartment
2 1000.0 3,000 sq ft Categories Loft
3 1000.0 3,000 sq ft Tags Bohemian
4 1000.0 3,000 sq ft Tags Colorful
5 1000.0 3,000 sq ft Tags Eclectic Quirky
6 1000.0 3,000 sq ft Tags Kitchen
7 1000.0 3,000 sq ft Tags Living Room
8 1000.0 3,000 sq ft Tags Piano
9 1000.0 3,000 sq ft Tags Wood Floor
I then tried this:
dfTemp = ((dfLocationID.join(dfDimensions, how='outer')).join(dfCategories, how='outer')).join(dfTags, how='outer')
to get this:
Location ID Dimensions Categories Tags
0 1000.0 3,000 sq ft * In the Zone Bohemian
1 NaN NaN Apartment Colorful
2 NaN NaN Loft Eclectic Quirky
3 NaN NaN NaN Kitchen
4 NaN NaN NaN Living Room
5 NaN NaN NaN Piano
6 NaN NaN NaN Wood Floor
Now I am trying to turn the last two columns into rows:
dfFinal = dfTemp.melt(id_vars=["Location ID", "Dimensions"],
var_name="Item",
value_name="Data")
But I get this:
Location ID Dimensions Item Data
0 1000.0 3,000 sq ft Categories * In the Zone
1 NaN NaN Categories Apartment
2 NaN NaN Categories Loft
3 NaN NaN Categories NaN
4 NaN NaN Categories NaN
5 NaN NaN Categories NaN
6 NaN NaN Categories NaN
7 1000.0 3,000 sq ft Tags Bohemian
8 NaN NaN Tags Colorful
9 NaN NaN Tags Eclectic Quirky
10 NaN NaN Tags Kitchen
11 NaN NaN Tags Living Room
12 NaN NaN Tags Piano
13 NaN NaN Tags Wood Floor
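Any ideas on how to clean up the data? Also, I will have to iterate over different Location IDs, and the number of values in Categories and Tags will not be constant.
Thanks.
Best answer
First, I would convert the NaN values to None, since they are easier to work with: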
df = df.where((pd.notnull(df)), None)
Next, you want the entire first and second columns to hold the same value (I am not sure where you make that assumption):
df['Location ID'] = df['Location ID'].iloc[0]
df['Dimensions'] = df['Dimensions'].iloc[1]
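You can then run the melt function as is, assigning the result back to df. After that, just filter out every row where the "Item" or "Data" column is None: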
df = df[~(df["Item"].isnull() | df["Data"].isnull())]
The output is then what you wanted:
Location ID Dimensions Item Data
2 1000.0 3000 sq ft Categories * In the Zone
3 1000.0 3000 sq ft Categories Apartment
4 1000.0 3000 sq ft Categories Loft
17 1000.0 3000 sq ft Tags Bohemian
18 1000.0 3000 sq ft Tags Colorful
19 1000.0 3000 sq ft Tags Eclectic Quirky
20 1000.0 3000 sq ft Tags Kitchen
21 1000.0 3000 sq ft Tags Living Room
22 1000.0 3000 sq ft Tags Piano
23 1000.0 3000 sq ft Tags Wood Floor
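The steps above can be sketched end to end as follows. This is a minimal sketch, not the answer's exact code: the four input frames are rebuilt by hand from the printed output in the question (their real construction is not shown there), and the hard-coded `.iloc[0]` / `.iloc[1]` positions are replaced by "first non-null value in the column", which is an assumption about what is intended. `dropna` is used for the final filter because it treats both NaN and None as missing.

```python
import pandas as pd

# Assumed inputs: rebuilt from the output printed in the question.
dfLocationID = pd.DataFrame({"Location ID": [1000.0]})
dfDimensions = pd.DataFrame({"Dimensions": ["3,000 sq ft"]})
dfCategories = pd.DataFrame({"Categories": ["* In the Zone", "Apartment", "Loft"]})
dfTags = pd.DataFrame({"Tags": ["Bohemian", "Colorful", "Eclectic Quirky",
                                "Kitchen", "Living Room", "Piano", "Wood Floor"]})

# Same concatenation as in the question: 12 rows, mostly missing values.
result = pd.concat([dfLocationID, dfDimensions, dfCategories, dfTags],
                   ignore_index=True, sort=False)

# Broadcast the single known value of each id column to every row.
for col in ["Location ID", "Dimensions"]:
    result[col] = result[col].dropna().iloc[0]

# Turn the Categories and Tags columns into (Item, Data) rows.
melted = result.melt(id_vars=["Location ID", "Dimensions"],
                     var_name="Item", value_name="Data")

# Drop the padding rows that carry no Data value, then renumber.
final = melted.dropna(subset=["Data"]).reset_index(drop=True)
print(final)
```

Taking the first non-null value avoids depending on which row each value happens to land in after the concatenation.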
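If you need to do this for several locations, wrap the procedure in a function transform and use groupby:
# Note: groupby drops rows whose key is NaN by default,
# so both key columns must already be filled in before grouping.
pieces = []
for name, group in df.groupby(['Location ID', 'Dimensions']):
    pieces.append(transform(group))
# pd.concat returns a new DataFrame, so its result has to be assigned.
df_new = pd.concat(pieces, axis=0)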
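One possible shape for `transform` and the per-location loop is sketched below. Several things here are assumptions rather than part of the answer: the input is taken to be the concatenated blocks of several locations stacked in order, each block starting with its Location ID row; the grouping key is built by forward-filling Location ID (the name `location_key` is hypothetical); and the sample values for the second location are made up for illustration.

```python
import pandas as pd

ID_COLS = ["Location ID", "Dimensions"]

def transform(group):
    """Reshape one location's block of rows into long (Item, Data) form."""
    group = group.copy()
    # Broadcast the block's single known value of each id column.
    for col in ID_COLS:
        group[col] = group[col].dropna().iloc[0]
    melted = group.melt(id_vars=ID_COLS, var_name="Item", value_name="Data")
    # Keep only the rows that actually carry a value.
    return melted.dropna(subset=["Data"])

# Hypothetical stacked input: two locations, one block after the other.
df = pd.DataFrame({
    "Location ID": [1000.0, None, None, None, 2000.0, None, None],
    "Dimensions": [None, "3,000 sq ft", None, None, None, "500 sq ft", None],
    "Categories": [None, None, "Loft", None, None, None, "Studio"],
    "Tags": [None, None, None, "Piano", None, None, None],
})

# Forward-fill the id so every row knows which location it belongs to.
location_key = df["Location ID"].ffill().rename("location_key")

# Transform each block, then concatenate once at the end.
pieces = [transform(group) for _, group in df.groupby(location_key)]
df_new = pd.concat(pieces, ignore_index=True)
print(df_new)
```

Collecting the pieces in a list and calling `pd.concat` once avoids rebuilding the accumulated frame on every iteration, and it copes with a different number of Categories and Tags per location.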
Regarding python - Pandas melt: Columns to Rows, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/58141356/