My dataset looks like this:
Col1 Col2 Col3
A 10 x1
B 100 x2
C 1000 x3
and this is my output:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
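Thanks to help from this site, this can be done with:
import itertools
import pandas as pd

# One flattened row per combination of 1..len(df) source rows
arr = list(itertools.chain.from_iterable(
    [[j for i in el for j in i] for el in itertools.combinations(df.values.tolist(), i)]
    for i in range(1, len(df)+1)
))
pd.DataFrame(arr)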
But if the dataset looks like this:
Col1 Col2 Col3 Structure
A 10 x1 1
B 100 x2 1
C 1000 x3 2
the output needs to look like this:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Answer
A 10 x1 Empty Empty Empty Empty Empty Empty No
B 100 x2 Empty Empty Empty Empty Empty Empty No
C 1000 x3 Empty Empty Empty Empty Empty Empty Yes
A 10 x1 B 100 x2 Empty Empty Empty Yes
B 100 x2 C 1000 x3 Empty Empty Empty No
A 10 x1 B 100 x2 C 1000 x3 No
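This basically says that A and B together are "Yes" because they are in the same structure, and C on its own is "Yes" because it is alone in its structure. All other rows (for example A alone, B alone, or A, B and C together) are "No" because they do not form exactly one complete structure. How can I get the table I want above?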
The code,
arr = list(itertools.chain.from_iterable(
    [[j for i in el for j in i] for el in itertools.combinations(df.values.tolist(), i)]
    for i in range(1, len(df)+1)
))
pd.DataFrame(arr)
gives me this output:
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
A 10 x1 Empty Empty Empty Empty Empty Empty
B 100 x2 Empty Empty Empty Empty Empty Empty
C 1000 x3 Empty Empty Empty Empty Empty Empty
A 10 x1 B 100 x2 Empty Empty Empty
B 100 x2 C 1000 x3 Empty Empty Empty
A 10 x1 B 100 x2 C 1000 x3
How do I add the "Answer" column to this output to get the final table?
Best answer
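Because of the structure of your DataFrame, we know that when we apply itertools.combinations, the Structure column will first appear at column index 3 and then at every fourth column after that (3, 7, 11):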
0 1 2 3 4 5 6 7 8 9 10 11
0 A 10 x1 1 None NaN None NaN None NaN None NaN
1 B 100 x2 1 None NaN None NaN None NaN None NaN
2 C 1000 x3 2 None NaN None NaN None NaN None NaN
3 A 10 x1 1 B 100.0 x2 1.0 None NaN None NaN
4 A 10 x1 1 C 1000.0 x3 2.0 None NaN None NaN
5 B 100 x2 1 C 1000.0 x3 2.0 None NaN None NaN
6 A 10 x1 1 B 100.0 x2 1.0 C 1000.0 x3 2.0
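The answer does not show how the frame above was built. A minimal sketch of one way to get it, assuming the sample data from the question and assuming `out` is simply the question's flattening code applied to a `df` that still includes the Structure column:

```python
import itertools
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

rows = df.values.tolist()
# One flattened row per combination of 1..len(df) source rows.
arr = [
    [value for row in combo for value in row]
    for size in range(1, len(df) + 1)
    for combo in itertools.combinations(rows, size)
]
# Rows shorter than the longest one are padded with None/NaN.
out = pd.DataFrame(arr)
print(out.shape)
```

Each source row contributes four values, so the Structure values land at positions 3, 7 and 11 of the flattened rows.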
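We can use this layout to select only the Structure columns, check whether each combination contains all the members of a single group, and then drop those columns. Here out is the combined DataFrame shown above:
import numpy as np

# Number of rows in each Structure group, e.g. {1: 2, 2: 1}
checker = df.groupby('Structure').size().to_dict()

def helper(row):
    # Keep only the Structure values actually present in this combination
    u = row[~row.isnull()].values
    # True only if all values are equal and the whole group is present
    return (len(np.unique(u)) == 1) & (checker[u[0]] == len(u))

s = out[out.columns[3::4]].apply(helper, axis=1).replace({False: 'No', True: 'Yes'})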
0 No
1 No
2 Yes
3 Yes
4 No
5 No
6 No
dtype: object
Drop the Structure columns and assign the result to the DataFrame:
out.drop(columns=out.columns[3::4]).assign(final=s)
0 1 2 4 5 6 8 9 10 final
0 A 10 x1 None NaN None None NaN None No
1 B 100 x2 None NaN None None NaN None No
2 C 1000 x3 None NaN None None NaN None Yes
3 A 10 x1 B 100.0 x2 None NaN None Yes
4 A 10 x1 C 1000.0 x3 None NaN None No
5 B 100 x2 C 1000.0 x3 None NaN None No
6 A 10 x1 B 100.0 x2 C 1000.0 x3 No
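As an alternative, the label can be computed while the combinations are generated, which avoids relying on the every-fourth-column layout. This is only a sketch under the same sample data; the names `group_sizes`, `value_cols`, `records`, `answers` and `result` are illustrative and not part of the original answer:

```python
import itertools
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": ["A", "B", "C"],
    "Col2": [10, 100, 1000],
    "Col3": ["x1", "x2", "x3"],
    "Structure": [1, 1, 2],
})

# Number of rows in each Structure group, here {1: 2, 2: 1}.
group_sizes = df["Structure"].value_counts().to_dict()
value_cols = ["Col1", "Col2", "Col3"]

records, answers = [], []
for size in range(1, len(df) + 1):
    for idx in itertools.combinations(df.index, size):
        sub = df.loc[list(idx)]
        # Flatten the selected rows, leaving the Structure column out.
        records.append(sub[value_cols].values.ravel().tolist())
        structures = set(sub["Structure"])
        # "Yes" only when the combination is exactly one whole group.
        complete = (
            len(structures) == 1
            and group_sizes[sub["Structure"].iloc[0]] == size
        )
        answers.append("Yes" if complete else "No")

result = pd.DataFrame(records).assign(Answer=answers)
print(result["Answer"].tolist())
```

Because the Structure values are read from `df` directly, this version keeps working if more value columns are added, at the cost of one `df.loc` lookup per combination.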
Regarding python - Pandas csv itertools combinations, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51709412/