I have a DataFrame with 3 columns and more than 1,000 rows:
df
day product order
2010-01-01 150ml Mask 9
2010-01-02 230ml Lotion 27
2010-01-03 600ml Shampoo 33
I would like to subset it by product, as follows:
df_mask              df_lotion            df_shampoo
day         order    day         order    day         order
2010-01-01  9        2010-01-02  27       2010-01-03  33
2010-01-09  8        2010-01-05  30       2010-01-04  25
2010-01-11  13       2010-01-06  29       2010-01-06  46
This is how I did it:
# Create a product list
productName = df['product'].tolist()
# Return only the rows of df that belong to one product
def subtable(df, productName):
    return df[df['product'] == productName]
# Subsetting
df_mask = subtable(df, '150ml Mask')
df_lotion = subtable(df, '230ml Lotion')
df_shampoo = subtable(df, '600ml Shampoo')
Is there a way to get all the subsets at once with a for loop? The DataFrame contains many different products.
Best answer
You can use groupby for this; it does exactly what you need:
# show example data
print(df)
day product order
0 2010-01-01 "150ml Mask" 9
1 2010-01-02 "230ml Lotion" 27
2 2010-01-03 "600ml Shampoo" 33
3 2010-01-04 "250ml Mask" 12
4 2010-01-05 "330ml Lotion" 24
5 2010-01-06 "400ml Shampoo" 13
# split product column and keep only product name
df["product"] = df["product"].str.split(expand=True)[1]
# groupby product
products = df.groupby("product")
# print product and corresponding product df
for product, product_df in products:
    print(product)
    print(product_df)
Lotion
day product order
1 2010-01-02 Lotion 27
4 2010-01-05 Lotion 24
Mask
day product order
0 2010-01-01 Mask 9
3 2010-01-04 Mask 12
Shampoo
day product order
2 2010-01-03 Shampoo 33
5 2010-01-06 Shampoo 13
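One caveat about the split step above: taking column 1 of a plain whitespace split keeps only the second word, so a multi-word product name would be cut short. A minimal sketch of one way around that, assuming every product string starts with a size token followed by the name (the "Hand Cream" rows are invented for illustration and are not in the original data):

```python
import pandas as pd

# Hypothetical sample data; "Hand Cream" is an invented multi-word name
df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03", "2010-01-04"],
    "product": ["150ml Mask", "230ml Hand Cream", "250ml Mask", "100ml Hand Cream"],
    "order": [9, 27, 12, 24],
})

# Split at most once: the size lands in column 0 and the whole
# remaining name (including any spaces) lands in column 1
df["product"] = df["product"].str.split(n=1, expand=True)[1]

# Iterate over the groups exactly as before
for product, product_df in df.groupby("product"):
    print(product)
    print(product_df)
```

If the different sizes should stay separate products, as in the question's subtable calls, skipping the split and grouping on the original product column would probably be the simpler choice.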
To access each subgroup individually, you can use get_group, which corresponds to your subtable function:
mask_df = products.get_group("Mask")
print(mask_df)
day product order
0 2010-01-01 Mask 9
3 2010-01-04 Mask 12
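Note that get_group raises a KeyError when the requested name is not one of the groups, whereas the subtable function from the question simply returns an empty DataFrame. A small sketch of a guarded lookup follows; the helper name get_product and the "Toner" key are hypothetical and only serve the illustration:

```python
import pandas as pd

# Small stand-in for the example data, with product names already cleaned
df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-04"],
    "product": ["Mask", "Lotion", "Mask"],
    "order": [9, 27, 12],
})
products = df.groupby("product")

def get_product(grouped, name):
    """Return the sub-DataFrame for `name`, or None if no such group exists."""
    # grouped.groups maps each group key to the index labels of its rows
    if name in grouped.groups:
        return grouped.get_group(name)
    return None

mask_df = get_product(products, "Mask")
missing = get_product(products, "Toner")  # "Toner" is not in the data
```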
Finally, to get all the sub-DataFrames in a single dictionary, you can loop over products and drop the product column itself:
df_dict = {product: product_df.drop("product", axis=1)
for product, product_df in products}
print(df_dict["Mask"])
day order
0 2010-01-01 9
3 2010-01-04 12
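Since the question asked specifically for a for loop, here is a rough equivalent written that way, assuming the same column names as in the example (day, product, order). It is a sketch for comparison rather than a replacement for groupby; the variable name subsets is chosen here for illustration:

```python
import pandas as pd

# Stand-in for the example data, with product names already cleaned
df = pd.DataFrame({
    "day": ["2010-01-01", "2010-01-02", "2010-01-03",
            "2010-01-04", "2010-01-05", "2010-01-06"],
    "product": ["Mask", "Lotion", "Shampoo", "Mask", "Lotion", "Shampoo"],
    "order": [9, 27, 33, 12, 24, 13],
})

# Explicit loop: one boolean mask per distinct product name
subsets = {}
for name in df["product"].unique():
    subsets[name] = df.loc[df["product"] == name, ["day", "order"]]

# The groupby-based dictionary, for comparison
df_dict = {product: product_df.drop("product", axis=1)
           for product, product_df in df.groupby("product")}

print(subsets["Mask"])
```

Both dictionaries hold the same sub-DataFrames. The loop scans the whole column once per product, so groupby is likely the better fit when there are many distinct products.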
Regarding "python - How to subset and list a DataFrame using a for loop in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42712404/