Here is my code so far:
import pandas as pd
from io import StringIO
data = StringIO("""
"name1","hej","7aa","a"
"name1","du","71al","a"
"name1","aj","74a","a"
"name1","oj","7aj","a"
"name2","fin","7ag","a"
"name2","katt","7a","a"
""")
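# header=0 turns the first row ("name1","hej",...) into the header, which names= then replaces, so that row is dropped
df = pd.read_csv(data, header=0, names=["name","text2","text","as"])
# join each group's values into one comma-separated string and broadcast it back to every row of the group
df[['text2','text','as']] = df.groupby(['name']).transform(lambda x: ','.join(x))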
df = df[['name','text','text2','as']].drop_duplicates()
df
This gets me most of the way there:
df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj  a,a,a
3  name2        7ag,7a  fin,katt    a,a
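For comparison, the same per-group joining can be sketched with `groupby(...).agg` instead of `transform` followed by `drop_duplicates`, because `agg` already returns one row per group. This minimal sketch builds the frame directly from the rows that appear in the output above (the `"hej"` row is left out because `read_csv(header=0, ...)` consumes it as the header). Note that the result gets a fresh 0..n-1 index rather than 0 and 3.

```python
import pandas as pd

# Rows that survive read_csv(header=0, ...) in the question's snippet
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name2", "name2"],
    "text2": ["du", "aj", "oj", "fin", "katt"],
    "text": ["71al", "74a", "7aj", "7ag", "7a"],
    "as": ["a", "a", "a", "a", "a"],
})

# agg(",".join) joins each column's values per group and yields one row per name,
# so no drop_duplicates step is needed; reset_index turns "name" back into a column
joined = df.groupby("name")[["text", "text2", "as"]].agg(",".join).reset_index()
print(joined)
```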
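I just need one line that checks each of the columns ['text','text2','as'] and, if all of the comma-separated elements in a cell are the same, returns only the first one.
So the result I want is: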
df
    name          text     text2  as
0  name1  71al,74a,7aj  du,aj,oj   a
3  name2        7ag,7a  fin,katt   a
I've tried using apply with split(','), but couldn't get it to work.
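The check itself can be sketched on a single cell with `str.split`: split on the comma, and if the set of pieces has exactly one member, keep only that piece; otherwise leave the cell as it is. `collapse_if_uniform` is a hypothetical helper name used only for illustration.

```python
def collapse_if_uniform(cell: str, sep: str = ",") -> str:
    """Hypothetical helper: return the first element if all sep-separated
    elements of `cell` are identical, otherwise return `cell` unchanged."""
    parts = cell.split(sep)
    # set() keeps only distinct values, so one distinct value means "all the same"
    return parts[0] if len(set(parts)) == 1 else cell


print(collapse_if_uniform("a,a,a"))         # a
print(collapse_if_uniform("a,a"))           # a
print(collapse_if_uniform("71al,74a,7aj"))  # 71al,74a,7aj
```

Applied per column, e.g. `df['as'] = df['as'].map(collapse_if_uniform)`, this turns `a,a,a` into `a` and leaves cells with differing elements alone.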
Edit, added after the first comment: I did not describe my problem correctly.
If my df looks like this:
df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj  a,b,a
3  name2        7ag,7a  fin,katt    a,a
I need it to become:
df
    name          text     text2     as
0  name1  71al,74a,7aj  du,aj,oj  a,b,a
3  name2        7ag,7a  fin,katt      a
and not:
df
    name          text     text2   as
0  name1  71al,74a,7aj  du,aj,oj  a,b
3  name2        7ag,7a  fin,katt    a
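The clarification is the difference between removing duplicate elements and collapsing a cell only when all of its elements agree. A minimal sketch contrasting the two on the `as` values above; both function names are hypothetical, and `dict.fromkeys` is used for the de-duplication because it keeps first-seen order.

```python
def dedupe(cell: str) -> str:
    # NOT what the question wants: drops repeats even when the values differ
    return ",".join(dict.fromkeys(cell.split(",")))


def collapse_if_uniform(cell: str) -> str:
    # What the question wants: collapse only when every element is the same
    parts = cell.split(",")
    return parts[0] if len(set(parts)) == 1 else cell


for cell in ["a,b,a", "a,a"]:
    print(f"{cell}: dedupe -> {dedupe(cell)}, collapse -> {collapse_if_uniform(cell)}")
# a,b,a: dedupe -> a,b, collapse -> a,b,a
# a,a: dedupe -> a, collapse -> a
```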
Thanks
Here is my final solution:
data = StringIO("""
"name1","hej","7aa","a"
"name1","du","71al","b"
"name1","aj","74a","a"
"name1","oj","7aj","a"
"name2","fin","7ag","a"
"name2","katt","7a","a"
""")
df = pd.read_csv(data, header=0, names=["name","text2","text","as"])
df[['text2','text','as']] = df.groupby(['name']).transform(lambda x: ','.join(x))
df = df[['name','text','text2','as']].drop_duplicates()
for col in df.columns:
    # split each cell; if every element is identical, keep just that one element
    df[col] = df[col].str.split(',').map(lambda x: ','.join(set(x) if len(set(x)) == 1 else x))
df
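The loop over `df.columns` can also be written as a single assignment over just the three joined columns, leaving `name` untouched. A sketch, assuming a hypothetical `collapse_if_uniform` helper and a frame shaped like the example in the question (index 0 and 3 kept to match):

```python
import pandas as pd


def collapse_if_uniform(cell: str) -> str:
    # Hypothetical helper: keep one element only when all elements are identical
    parts = cell.split(",")
    return parts[0] if len(set(parts)) == 1 else cell


# Already-joined frame, shaped like the question's example
df = pd.DataFrame({
    "name": ["name1", "name2"],
    "text": ["71al,74a,7aj", "7ag,7a"],
    "text2": ["du,aj,oj", "fin,katt"],
    "as": ["a,b,a", "a,a"],
}, index=[0, 3])

cols = ["text", "text2", "as"]
# apply hands each column to the lambda as a Series; Series.map then works cell by cell
df[cols] = df[cols].apply(lambda s: s.map(collapse_if_uniform))
print(df)
```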
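I had to resort to iterating over the columns; I couldn't get the desired result with agg. Also, I'd really appreciate it if someone could explain len(set(x)) == 1 here (shouldn't it be at least 2 because of the commas?).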
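On the `len(set(x)) == 1` question: by the time the lambda runs, `.str.split(',')` has already turned each cell into a list and the commas are gone, so `x` holds only the elements. `set(x)` keeps one copy of each distinct element, so its length is 1 exactly when every element is the same. A small walk-through:

```python
uniform_parts = "a,a,a".split(",")
print(uniform_parts)            # ['a', 'a', 'a']  (no commas left)
print(len(set(uniform_parts)))  # 1 -> all elements identical

mixed_parts = "a,b,a".split(",")
print(sorted(set(mixed_parts)))  # ['a', 'b']
print(len(set(mixed_parts)))     # 2 -> not all identical

# With one distinct element, ','.join(set(x)) is just that element
print(",".join(set(uniform_parts)))  # a
```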
Best answer
Try:
df['as'] = df['as'].str.split(',').map(lambda x: ','.join(set(x) if len(set(x)) == 1 else x))
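The same rule can also be folded into the aggregation step, which avoids both `transform` + `drop_duplicates` and the follow-up loop, and is one way to do it with `agg`. A sketch using `groupby(...).agg` with a hypothetical `join_or_first` function that receives each group's values as a Series; the rows are chosen to reproduce the `a,b,a` example from the question.

```python
import pandas as pd


def join_or_first(values: pd.Series) -> str:
    # Hypothetical helper: keep a single value when the whole group agrees,
    # otherwise join every value with commas (original order preserved)
    return values.iloc[0] if values.nunique() == 1 else ",".join(values)


df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name2", "name2"],
    "text2": ["du", "aj", "oj", "fin", "katt"],
    "text": ["71al", "74a", "7aj", "7ag", "7a"],
    "as": ["a", "b", "a", "a", "a"],
})

# join_or_first is called once per group for each selected column
result = df.groupby("name")[["text", "text2", "as"]].agg(join_or_first).reset_index()
print(result)
```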
Regarding python-3.x - Reduce comma-separated strings in df if all strings are the same, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58705689/