python - DataFrame challenge: mapping ID to value in a different row. Preferably using Polars

Consider this example:

import polars as pl

df = pl.DataFrame({
    'ID': ['0', '1', '2', '3', '4', '5','6', '7', '8', '9', '10'],
    'Name' : ['A','','','','B','','C','','','D', ''], 
    'Element' : ['', '4', '4', '0', '', '4', '', '0', '9', '', '6']
})

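'Name' is linked to 'ID', and that ID is used as a value in the 'Element' column. How do I map the correct 'Name' to each element? I also want to group the elements by 'Name' ('Name_list'), count them, and sort by that count ('E_count').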

The resulting df should be:

Name_list Element E_count
-------------------------
'B'       '4'     3
'A'       '0'     2
'C'       '6'     1
'D'       '9'     1

Any feedback is much appreciated, even a Pandas solution.

Best answer

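Here is a Polars solution. We keep only the rows that define a Name, count how often each value occurs in the Element column, and then use a join to link the ID column to those Element counts.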

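import polars as pl

# Uses current Polars API names (group_by, pl.len, descending);
# older releases used groupby, pl.count and reverse instead.
(
    df.select(["Name", "ID"])
    .filter(pl.col("Name") != "")  # keep only the rows that define a name
    .join(
        # count how often each value occurs in "Element"
        df.group_by("Element").agg(pl.len().alias("E_count")),
        left_on="ID",
        right_on="Element",
        how="left",
    )
    # sort by count; break ties by name so the row order is deterministic
    .sort(["E_count", "Name"], descending=[True, False])
    .rename({"Name": "Name_list", "ID": "Element"})
)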

Note: this differs from the solution listed in the question. Name D is associated with ID 9 (not 10).

shape: (4, 3)
┌───────────┬─────────┬─────────┐
│ Name_list ┆ Element ┆ E_count │
│ ---       ┆ ---     ┆ ---     │
│ str       ┆ str     ┆ u32     │
╞═══════════╪═════════╪═════════╡
│ B         ┆ 4       ┆ 3       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ A         ┆ 0       ┆ 2       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ C         ┆ 6       ┆ 1       │
├╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ D         ┆ 9       ┆ 1       │
└───────────┴─────────┴─────────┘
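To make the join logic concrete independently of any particular Polars version, here is a plain-Python sketch of the same four steps (filter, count, left join, sort) on the question's data. It is only an illustration of the idea, not part of the original answer; the lists are copied from the question's DataFrame, and the tie-break on name is an added assumption to keep the order deterministic.

```python
from collections import Counter

# Same data as the question's DataFrame, as plain Python lists
ids = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
names = ['A', '', '', '', 'B', '', 'C', '', '', 'D', '']
elements = ['', '4', '4', '0', '', '4', '', '0', '9', '', '6']

# Step 1 (the filter): ID -> Name for the rows that actually define a name
id_to_name = {i: n for i, n in zip(ids, names) if n != ""}

# Step 2 (the group-by count): how often each value appears in "Element".
# Empty strings are skipped; in Polars they form a group that never matches an ID.
element_counts = Counter(e for e in elements if e != "")

# Step 3 (the left join): look up each named ID in the counts.
# Assumption: fall back to 0 where Polars' left join would give null.
rows = [(name, i, element_counts.get(i, 0)) for i, name in id_to_name.items()]

# Step 4 (the sort): count descending, then name ascending to break ties
rows.sort(key=lambda r: (-r[2], r[0]))

for name, element, count in rows:
    print(name, element, count)
```

This prints B 4 3, A 0 2, C 6 1 and D 9 1, matching the Polars output above.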

You can also use the polars.Series.value_counts method, which reads a bit more cleanly:

import polars as pl

(
    df.select(["Name", "ID"])
    .filter(pl.col("Name") != "")
    .join(
        # returns the columns "Element" and "count"
        # (older Polars releases named the count column "counts")
        df.get_column("Element").value_counts(),
        left_on="ID",
        right_on="Element",
        how="left",
    )
    .sort(["count", "Name"], descending=[True, False])
    .rename({"Name": "Name_list", "ID": "Element", "count": "E_count"})
)
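Since the question also welcomes a Pandas solution, here is a minimal sketch of the same approach in pandas. It is a translation of the idea rather than part of the original answer: instead of a join it uses Series.map to look up each named ID in the value counts, and it renames the count Series explicitly because its default name differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
    'Name': ['A', '', '', '', 'B', '', 'C', '', '', 'D', ''],
    'Element': ['', '4', '4', '0', '', '4', '', '0', '9', '', '6'],
})

# Count how often each value appears in "Element" (indexed by that value);
# an explicit name avoids depending on the version-specific default name
counts = df["Element"].value_counts().rename("E_count")

# Rows that define a name
named = df.loc[df["Name"] != "", ["Name", "ID"]]

result = (
    named.assign(E_count=named["ID"].map(counts))  # look up each ID's count
    .sort_values(["E_count", "Name"], ascending=[False, True])
    .rename(columns={"Name": "Name_list", "ID": "Element"})
    .reset_index(drop=True)
)
print(result)
```

If a named ID never appeared in Element, map would yield NaN for it (and the column would become float), which mirrors the null produced by the Polars left join.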

This question is also available on Stack Overflow: https://stackoverflow.com/questions/72234348/
