python - Pandas Melt: Columns to Rows

Tags: python pandas dataframe

This is my first question after three days of teaching myself Python, so please be gentle.

I concatenated four DataFrames:

frames = [dfLocationID, dfDimensions, dfCategories, dfTags]  
result = pd.concat(frames,
                   ignore_index=True,
                   sort=False)

to get this:

        Location ID   Dimensions     Categories              Tags
    0        1000.0          NaN            NaN               NaN
    1           NaN  3,000 sq ft            NaN               NaN
    2           NaN          NaN  * In the Zone               NaN
    3           NaN          NaN      Apartment               NaN
    4           NaN          NaN           Loft               NaN
    5           NaN          NaN            NaN          Bohemian
    6           NaN          NaN            NaN          Colorful
    7           NaN          NaN            NaN   Eclectic Quirky
    8           NaN          NaN            NaN           Kitchen
    9           NaN          NaN            NaN       Living Room
    10          NaN          NaN            NaN             Piano
    11          NaN          NaN            NaN        Wood Floor

I want to achieve this:

        Location ID   Dimensions        Item              Data
    0        1000.0  3,000 sq ft  Categories     * In the Zone
    1        1000.0  3,000 sq ft  Categories         Apartment
    2        1000.0  3,000 sq ft  Categories              Loft
    3        1000.0  3,000 sq ft        Tags          Bohemian
    4        1000.0  3,000 sq ft        Tags          Colorful
    5        1000.0  3,000 sq ft        Tags   Eclectic Quirky
    6        1000.0  3,000 sq ft        Tags           Kitchen
    7        1000.0  3,000 sq ft        Tags       Living Room
    8        1000.0  3,000 sq ft        Tags             Piano
    9        1000.0  3,000 sq ft        Tags        Wood Floor

Then I tried this:

dfTemp = ((dfLocationID.join(dfDimensions, how='outer')).join(dfCategories, how='outer')).join(dfTags, how='outer')

to get this:

       Location ID   Dimensions     Categories              Tags
    0       1000.0  3,000 sq ft  * In the Zone          Bohemian
    1          NaN          NaN      Apartment          Colorful
    2          NaN          NaN           Loft   Eclectic Quirky
    3          NaN          NaN            NaN           Kitchen
    4          NaN          NaN            NaN       Living Room
    5          NaN          NaN            NaN             Piano
    6          NaN          NaN            NaN        Wood Floor

Now I am trying to turn the last two columns into rows:

dfFinal = dfTemp.melt(id_vars=["Location ID", "Dimensions"],
                          var_name="Item",
                          value_name="Data")

But I get this:

        Location ID   Dimensions        Item              Data
    0        1000.0  3,000 sq ft  Categories     * In the Zone
    1           NaN          NaN  Categories         Apartment
    2           NaN          NaN  Categories              Loft
    3           NaN          NaN  Categories               NaN
    4           NaN          NaN  Categories               NaN
    5           NaN          NaN  Categories               NaN
    6           NaN          NaN  Categories               NaN
    7        1000.0  3,000 sq ft        Tags          Bohemian
    8           NaN          NaN        Tags          Colorful
    9           NaN          NaN        Tags   Eclectic Quirky
    10          NaN          NaN        Tags           Kitchen
    11          NaN          NaN        Tags       Living Room
    12          NaN          NaN        Tags             Piano
    13          NaN          NaN        Tags        Wood Floor

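Any ideas on how to clean up the data? Also, I have to iterate over different Location IDs, and the number of values in Categories and Tags will not be constant.

Thanks.

Best answer

First, I would convert NaN to None, as they are easier to handle: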

df = df.where((pd.notnull(df)), None)

Next, you want the first and second columns to hold the same value in every row (I don't know where this assumption comes from):

df['Location ID'] = df['Location ID'].iloc[0]
df['Dimensions'] = df['Dimensions'].iloc[1]

Then you can run your melt call as is. After that, you only need to filter out every row where the "Item" or "Data" column is None:

df = df[~(df["Item"].isnull() | df["Data"].isnull())]

The output is then what you wanted:

        Location ID  Dimensions        Item              Data
    2        1000.0  3000 sq ft  Categories     * In the Zone
    3        1000.0  3000 sq ft  Categories         Apartment
    4        1000.0  3000 sq ft  Categories              Loft
    17       1000.0  3000 sq ft        Tags          Bohemian
    18       1000.0  3000 sq ft        Tags          Colorful
    19       1000.0  3000 sq ft        Tags   Eclectic Quirky
    20       1000.0  3000 sq ft        Tags           Kitchen
    21       1000.0  3000 sq ft        Tags       Living Room
    22       1000.0  3000 sq ft        Tags             Piano
    23       1000.0  3000 sq ft        Tags        Wood Floor
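For reference, the steps above can be put together into one runnable sketch. The sample frames below are rebuilt from the data shown in the question. Using `dropna().iloc[0]` to pick the single non-null value (rather than a fixed row position) and `dropna(subset=...)` for the final filter are my own substitutions, not part of the answer as written.

```python
import pandas as pd

# Sample frames rebuilt from the question's data
dfLocationID = pd.DataFrame({"Location ID": [1000.0]})
dfDimensions = pd.DataFrame({"Dimensions": ["3,000 sq ft"]})
dfCategories = pd.DataFrame({"Categories": ["* In the Zone", "Apartment", "Loft"]})
dfTags = pd.DataFrame({"Tags": ["Bohemian", "Colorful", "Eclectic Quirky",
                                "Kitchen", "Living Room", "Piano", "Wood Floor"]})

# Same concat as in the question: one staggered block per source frame
df = pd.concat([dfLocationID, dfDimensions, dfCategories, dfTags],
               ignore_index=True, sort=False)

id_cols = ["Location ID", "Dimensions"]
for col in id_cols:
    # Assumption: each id column holds exactly one non-null value;
    # broadcast it to every row
    df[col] = df[col].dropna().iloc[0]

# Turn the Categories and Tags columns into (Item, Data) rows
melted = df.melt(id_vars=id_cols, var_name="Item", value_name="Data")

# Drop the padding rows that melt carried over from the staggered layout
final = melted.dropna(subset=["Data"]).reset_index(drop=True)
print(final)
```

With this sample data the result has ten rows: three for Categories followed by seven for Tags. `reset_index(drop=True)` is optional and only renumbers the rows from 0.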

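If you need to do this for several locations, wrap the process in a function transform and use groupby:

# transform() is the function that wraps the steps above (fill, melt, filter)
groups = []
for name, group in df.groupby(['Location ID', 'Dimensions']):
    groups.append(transform(group))
# pd.concat returns a new DataFrame, so its result has to be assigned
df_new = pd.concat(groups, axis=0)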
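One thing to watch for: `groupby` drops rows whose key is NaN by default, so the id columns have to be filled in before grouping. Below is a minimal sketch of the multi-location case under some assumptions of mine: the `make_location` helper and the second location's values are made up for illustration, each location's block is assumed to start with the row carrying its Location ID, and grouping is done on Location ID alone.

```python
import pandas as pd

ID_COLS = ["Location ID", "Dimensions"]

def transform(group: pd.DataFrame) -> pd.DataFrame:
    """Fill the id columns, melt, and drop the padding rows for one location."""
    group = group.copy()
    for col in ID_COLS:
        # Assumption: one non-null value per id column within a location
        group[col] = group[col].dropna().iloc[0]
    melted = group.melt(id_vars=ID_COLS, var_name="Item", value_name="Data")
    return melted.dropna(subset=["Data"])

def make_location(loc_id, dims, cats, tags):
    """Hypothetical helper: builds the staggered concat result for one location."""
    frames = [pd.DataFrame({"Location ID": [loc_id]}),
              pd.DataFrame({"Dimensions": [dims]}),
              pd.DataFrame({"Categories": cats}),
              pd.DataFrame({"Tags": tags})]
    return pd.concat(frames, ignore_index=True, sort=False)

# First location uses values from the question; the second is invented sample data
raw = pd.concat([
    make_location(1000.0, "3,000 sq ft", ["Apartment", "Loft"], ["Piano", "Wood Floor"]),
    make_location(2000.0, "1,200 sq ft", ["Studio"], ["Modern", "Bright"]),
], ignore_index=True)

# Propagate each Location ID down its block so no row has a NaN group key
raw["Location ID"] = raw["Location ID"].ffill()

df_new = pd.concat([transform(g) for _, g in raw.groupby("Location ID")],
                   ignore_index=True)
print(df_new)
```

Because `transform` works on one group at a time, the number of Categories and Tags values can differ from location to location.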

Regarding python - Pandas Melt: Columns to Rows, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58141356/
