python - Pandas: get unique values from a column of lists

How can I get the unique values of a column of lists in pandas or numpy, so that the second column from

would result in 'action', 'crime', 'drama'?

The closest (but non-working) solution I could come up with is:

 genres = data['Genre'].unique()

But this predictably raises a TypeError, because lists are not hashable:

TypeError: unhashable type: 'list'

A set seemed like a good idea, but

genres = data.apply(set(), columns=['Genre'], axis=1)

also raises a TypeError: set() takes no keyword arguments

Best answer

You can use explode:

import pandas as pd

data = pd.DataFrame([
    {
        "title": "The Godfather: Part II",
        "genres": ["crime", "drama"],
        "director": "Francis Ford Coppola"
    },
    {
        "title": "The Dark Knight",
        "genres": ["action", "crime", "drama"],
        "director": "Christopher Nolan"
    }
])
# Changed from data.explode("genres")["genres"].unique() as suggested by rafaelc
# Series.explode() gives one row per list element; unique() then deduplicates
data["genres"].explode().unique()

Result:

array(['crime', 'drama', 'action'], dtype=object)
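
Note that unique() keeps values in order of first appearance, which is why 'action' comes last here rather than first as in the question's expected output. The sketch below shows one way to get the alphabetical order, plus a set-based alternative that does not need explode (Series.explode was added in pandas 0.25). The variable names are made up for illustration, and the sample frame is a trimmed copy of the one above.

```python
import pandas as pd

# Trimmed copy of the example data (illustrative only).
data = pd.DataFrame({
    "title": ["The Godfather: Part II", "The Dark Knight"],
    "genres": [["crime", "drama"], ["action", "crime", "drama"]],
})

# explode() yields one row per list element; unique() keeps first-seen order.
unique_genres = data["genres"].explode().unique()

# Sort if alphabetical order is wanted, as in the question's expected output.
sorted_genres = sorted(unique_genres)

# Alternative without explode: merge every list into a single Python set.
genre_set = set().union(*data["genres"])

print(sorted_genres)
print(sorted(genre_set))
```

Both variants assume every cell holds a list. If the column may contain missing values, dropping them first (for example with data["genres"].dropna()) avoids errors when sorting or merging.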

This question was originally asked on Stack Overflow: https://stackoverflow.com/questions/58528989/
