python - Drop duplicates in Pandas considering several attributes

Tags: python pandas machine-learning dataframe

I want to drop the instances that share both the same title and the same year.

        title  votes  ranking  year
0  Wonderland     19      7.9  1931
1  Wonderland    120      7.1  1997
2  Wonderland   3524      7.2  1999
3  Wonderland  18169      6.6  2003
4  Wonderland     17      8.7  2010
5  Wonderland      6      8.5  2012
6  Wonderland      8      7.4  2012

For example, in this case I would only drop row 5 or row 6.

Best answer

You can use drop_duplicates() with the subset= parameter. If your DataFrame is named df, you would do the following:

In [13]: df.drop_duplicates(subset=['title', 'year'])

which returns:

Out[13]:
        title  votes  ranking  year
0  Wonderland     19      7.9  1931
1  Wonderland    120      7.1  1997
2  Wonderland   3524      7.2  1999
3  Wonderland  18169      6.6  2003
4  Wonderland     17      8.7  2010
5  Wonderland      6      8.5  2012

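Note that drop_duplicates() keeps the first occurrence by default, so row 6 is the one dropped and you lose any unique information in its votes and ranking values.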
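If you want to control which of the duplicate rows survives, something along these lines may help. This is a minimal sketch, not part of the original answer: it rebuilds the question's data by hand, and the "keep the row with the most votes" rule is only an assumed preference, as are the variable names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "title": ["Wonderland"] * 7,
    "votes": [19, 120, 3524, 18169, 17, 6, 8],
    "ranking": [7.9, 7.1, 7.2, 6.6, 8.7, 8.5, 7.4],
    "year": [1931, 1997, 1999, 2003, 2010, 2012, 2012],
})
keys = ["title", "year"]

# Inspect every row that belongs to a duplicate group before dropping anything.
dupes = df[df.duplicated(subset=keys, keep=False)]

# Default keep="first": of the two 2012 rows, index 5 survives.
first = df.drop_duplicates(subset=keys)

# keep="last": index 6 survives instead.
last = df.drop_duplicates(subset=keys, keep="last")

# Assumed preference: keep the row with the most votes in each group.
# Sort so that row comes first, drop duplicates, then restore the original order.
most_voted = (
    df.sort_values("votes", ascending=False)
      .drop_duplicates(subset=keys)
      .sort_index()
)

print(dupes.index.tolist())       # [5, 6]
print(first.index.tolist())       # [0, 1, 2, 3, 4, 5]
print(last.index.tolist())        # [0, 1, 2, 3, 4, 6]
print(most_voted.index.tolist())  # [0, 1, 2, 3, 4, 6]
```

Sorting before dropping is just one way to choose the survivor; which row is "right" depends on what the votes and ranking values mean for your data.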

Regarding "python - Drop duplicates in Pandas considering several attributes", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/32342692/
