python-3.x - How to get the count of non-repeated elements in a column of lists in a Pandas DataFrame?

I have gone through many SO posts looking for a Pandas solution that matches my case, but I could not find one.

The problem is that I have a DataFrame like the following:

$ df
  email               hashes  
0 [email protected]    (iz3s65inn942j1bmedv., iz3s65inn942j1bmedv., 10$0mw1ewlhqlm0l)

nunique() and drop_duplicates() do not work in my case, because I need the count of non-repeated elements within each tuple itself. For the case above, the result would be:

$ df
  email               hashes
0 [email protected]    1

How can I achieve this result and get the count of non-repeated elements of the tuples in the hashes column?

Best answer

Use a custom lambda function with Counter to count only the values that occur exactly once:

from collections import Counter

# For each tuple, count the elements whose frequency is exactly 1
df['hashes'] = df['hashes'].apply(lambda x: sum(v == 1 for v in Counter(x).values()))
print(df)
              email  hashes
0  [email protected]       1
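
As a self-contained illustration of the Counter approach, here is a minimal sketch. The sample addresses and hash strings are placeholders, and the helper name count_non_repeated and the column name non_repeated are hypothetical; none of them come from the question's real data.

```python
from collections import Counter

import pandas as pd


def count_non_repeated(values):
    """Return how many elements of `values` occur exactly once."""
    counts = Counter(values)
    return sum(1 for n in counts.values() if n == 1)


# Hypothetical sample data; the addresses and hashes are placeholders.
df = pd.DataFrame(
    {
        "email": ["a@example.com", "b@example.com"],
        "hashes": [("h1", "h1", "h2"), ("h3", "h4", "h4", "h5")],
    }
)

# Write the result to a new column so the original tuples stay available.
df["non_repeated"] = df["hashes"].apply(count_non_repeated)
print(df[["email", "non_repeated"]])
```

Writing to a separate column, rather than overwriting hashes, keeps the tuples around in case another approach needs to be tried on the same frame.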

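A pandas-only alternative uses the DataFrame constructor and DataFrame.nunique. Note that nunique counts the distinct values in each row, so a value that repeats is still counted once: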

import pandas as pd
# Expand each tuple into columns, then count the distinct values per row
df['hashes'] = pd.DataFrame(df['hashes'].tolist(), index=df.index).nunique(axis=1)
print(df)
              email  hashes
0  [email protected]       1
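
The two ideas, distinct values per row and values that occur exactly once per row, can give different numbers when a tuple contains repeats. The sketch below compares them on placeholder data. The exactly-once variant using Series.duplicated(keep=False) is an assumption of this example, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the hashes are placeholders.
df = pd.DataFrame(
    {
        "email": ["a@example.com", "b@example.com"],
        "hashes": [("h1", "h1", "h2"), ("h3", "h4", "h5")],
    }
)

# Expand each tuple into its own columns, then count distinct values per row.
wide = pd.DataFrame(df["hashes"].tolist(), index=df.index)
distinct = wide.nunique(axis=1)

# Count values that occur exactly once: duplicated(keep=False) flags every
# occurrence of a repeated value, so negating it keeps only the singletons.
exactly_once = df["hashes"].apply(
    lambda t: int((~pd.Series(t).duplicated(keep=False)).sum())
)

print(pd.DataFrame({"distinct": distinct, "exactly_once": exactly_once}))
```

For the first row the distinct count includes the repeated h1 once, while the exactly-once count excludes it entirely, so which one is appropriate depends on what "non-repeated" is meant to cover.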

Regarding "python-3.x - How to get the count of non-repeated elements in a column of lists in a Pandas DataFrame?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66707404/
