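I have a dataset with customer IDs and indicator columns named "WEEK1", "WEEK2", etc. The value is 1 if the customer registered in that particular week and 0 otherwise, as shown below: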
ID WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
1 0 0 1 0 1
2 0 0 0 0 1
3 1 0 1 0 1
4 0 0 0 0 0
5 1 1 1 1 1
6 1 0 0 0 0
7 0 1 1 1 0
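What I want to do is find the first week in which each customer registered, keep that week's indicator = 1, and change all other week indicators for that customer ID to 0, i.e. the output: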
ID WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
1 0 0 1 0 0 ## WEEK5 is changed to 0 here
2 0 0 0 0 1 ## nothing changed
3 1 0 0 0 0 ## WEEK3 and WEEK5 are changed to 0
4 0 0 0 0 0
5 1 0 0 0 0
6 1 0 0 0 0
7 0 1 0 0 0
So for each customer ID, we find the first WEEK with a value of 1 and set all later WEEK values to 0.
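The rule can be pinned down with a small pure-Python reference for a single row. This is only a sketch: `keep_first_one` is a hypothetical helper name, and it assumes a row is a list of 0/1 flags already in week order.

```python
def keep_first_one(flags):
    """Return a copy of a row of week flags with every value after the first 1 set to 0."""
    out = []
    seen = False
    for f in flags:
        # Before the first 1 the value is kept as-is; after it, everything becomes 0
        out.append(0 if seen else f)
        if f == 1:
            seen = True
    return out


print(keep_first_one([1, 0, 1, 0, 1]))  # [1, 0, 0, 0, 0]
```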
So far I have tried this with if-else, writing out each condition one by one, as shown below:
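def keep_first_week(row):
    # df['WEEK1'] == 1 is a whole Series, which `if` cannot evaluate,
    # so the chain has to be applied one row at a time
    row = row.copy()  # work on a copy instead of the row pandas passes in
    if row['WEEK1'] == 1:
        row['WEEK2'] = 0
        row['WEEK3'] = 0
        row['WEEK4'] = 0
        row['WEEK5'] = 0
    elif row['WEEK2'] == 1:
        row['WEEK3'] = 0
        row['WEEK4'] = 0
        row['WEEK5'] = 0
    # ... and so on
    return row

df = df.apply(keep_first_week, axis=1)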
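Using if-else works for me when there are only 5 WEEK columns, but now I am getting data with 52 WEEK columns, and I can't find any alternative to if-else.
So anything that imposes this hierarchy on these 5 columns and also scales to a variable number of columns (52, 104, etc.) would be very helpful.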
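One way to scale the if-else idea to any number of weeks without writing a branch per column is a row-wise function that finds the first 1 and zeros everything after it. This is a sketch, not the accepted answer's method: `keep_first_week` is a hypothetical name, the WEEK columns are assumed to already be in chronological order in the frame, and `apply` with `axis=1` runs Python code per row, so it will be much slower than a vectorized solution on large data.

```python
import pandas as pd


def keep_first_week(row):
    out = row.copy()
    # 0-based positions of the weeks in this row that contain a 1
    hits = (out.to_numpy() == 1).nonzero()[0]
    if len(hits):
        # Zero out every week after the first registration week
        out.iloc[hits[0] + 1:] = 0
    return out


# Customers 1, 3 and 7 from the question
df = pd.DataFrame(
    [[1, 0, 0, 1, 0, 1],
     [3, 1, 0, 1, 0, 1],
     [7, 0, 1, 1, 1, 0]],
    columns=['ID', 'WEEK1', 'WEEK2', 'WEEK3', 'WEEK4', 'WEEK5'],
)

# Pick up WEEK1 ... WEEKn by name instead of listing them by hand
week_cols = df.filter(regex=r'^WEEK\d+$').columns
df[week_cols] = df[week_cols].apply(keep_first_week, axis=1)
print(df)
```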
Best answer
Use:
#if first column is not index
df = df.set_index('ID')
df = df.where(df.shift(axis=1).eq(1).cumsum(axis=1).eq(0), 0)
print (df)
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 0 0 1 0 0
2 0 0 0 0 1
3 1 0 0 0 0
4 0 0 0 0 0
5 1 0 0 0 0
6 1 0 0 0 0
7 0 1 0 0 0
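A self-contained version of the answer that rebuilds the sample table from the question, so it can be run end to end; the names `mask` and `out` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 2, 3, 4, 5, 6, 7],
    'WEEK1': [0, 0, 1, 0, 1, 1, 0],
    'WEEK2': [0, 0, 0, 0, 1, 0, 1],
    'WEEK3': [1, 0, 1, 0, 1, 0, 1],
    'WEEK4': [0, 0, 0, 0, 1, 0, 1],
    'WEEK5': [1, 1, 1, 0, 1, 0, 0],
}).set_index('ID')

# True only while no 1 has appeared in any earlier week of the same row
mask = df.shift(axis=1).eq(1).cumsum(axis=1).eq(0)
# Keep values where the mask is True, replace the rest with 0
out = df.where(mask, 0)
print(out)
```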
Explanation:
First, DataFrame.shift moves the values one column to the right:
print (df.shift(axis=1))
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 NaN 0.0 0.0 1.0 0.0
2 NaN 0.0 0.0 0.0 0.0
3 NaN 1.0 0.0 1.0 0.0
4 NaN 0.0 0.0 0.0 0.0
5 NaN 1.0 1.0 1.0 1.0
6 NaN 1.0 0.0 0.0 0.0
7 NaN 0.0 1.0 1.0 1.0
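Compare with 1 using DataFrame.eq. This step is needed if the columns can contain values other than 1 and 0. For pure 0/1 data it can be skipped, but only if the NaN that shift puts in the first column is filled first (for example with shift(axis=1, fill_value=0)); otherwise that NaN makes the first week fail the final comparison with 0: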
print (df.shift(axis=1).eq(1))
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 False False False True False
2 False False False False False
3 False True False True False
4 False False False False False
5 False True True True True
6 False True False False False
7 False False True True True
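Why the comparison matters even for 0/1 data: shift fills the first column with NaN, and cumsum keeps that NaN in place. A minimal check on one hypothetical row, assuming a pandas version whose DataFrame.shift accepts `fill_value`; `bad` and `good` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'WEEK1': [1], 'WEEK2': [0], 'WEEK3': [1]})

# Without eq(1): WEEK1 becomes NaN after shift, cumsum keeps the NaN,
# NaN == 0 is False, so the customer's real first week is zeroed out too
bad = df.where(df.shift(axis=1).cumsum(axis=1).eq(0), 0)

# Filling the shifted-in column with 0 avoids the NaN, so for pure 0/1 data
# the running total already counts the 1s in earlier weeks
good = df.where(df.shift(axis=1, fill_value=0).cumsum(axis=1).eq(0), 0)

print(bad)
print(good)
```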
Then DataFrame.cumsum computes the running total along each row, i.e. how many 1s appeared in earlier weeks:
print (df.shift(axis=1).eq(1).cumsum(axis=1))
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 0 0 0 1 1
2 0 0 0 0 0
3 0 1 1 2 2
4 0 0 0 0 0
5 0 1 2 3 4
6 0 1 1 1 1
7 0 0 1 2 3
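The same steps on a single row in plain Python, using customer ID 3 from the question, may make the running total easier to follow. This sketch fills the shifted-in position with 0 instead of NaN and skips the comparison with 1, which is only valid for 0/1 data.

```python
from itertools import accumulate

week_flags = [1, 0, 1, 0, 1]                    # customer ID 3 from the question
shifted = [0] + week_flags[:-1]                 # like shift(axis=1), with 0 instead of NaN
ones_before = list(accumulate(shifted))         # like .cumsum(axis=1)
keep = [n == 0 for n in ones_before]            # like .eq(0)
result = [v if k else 0 for v, k in zip(week_flags, keep)]  # like .where(keep, 0)

print(ones_before)  # [0, 1, 1, 2, 2]
print(result)       # [1, 0, 0, 0, 0]
```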
Compare with 0, so True marks the weeks that have no 1 in any earlier week:
print (df.shift(axis=1).eq(1).cumsum(axis=1).eq(0))
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 True True True False False
2 True True True True True
3 True False False False False
4 True True True True True
5 True False False False False
6 True False False False False
7 True True False False False
Finally, DataFrame.where keeps the values where the mask is True and sets the others (where the mask is False) to 0:
print (df.where(df.shift(axis=1).eq(1).cumsum(axis=1).eq(0), 0))
WEEK1 WEEK2 WEEK3 WEEK4 WEEK5
ID
1 0 0 1 0 0
2 0 0 0 0 1
3 1 0 0 0 0
4 0 0 0 0 0
5 1 0 0 0 0
6 1 0 0 0 0
7 0 1 0 0 0
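An alternative sketch, not part of the answer above: for data that holds only 0 and 1, taking cumsum twice along each row gives exactly 1 at the first registration week, 0 before it and at least 2 after it, so .eq(1) marks the week to keep. It assumes ID is the index, as in the answer; `first_only` and `answer` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':    [1, 2, 3, 4, 5, 6, 7],
    'WEEK1': [0, 0, 1, 0, 1, 1, 0],
    'WEEK2': [0, 0, 0, 0, 1, 0, 1],
    'WEEK3': [1, 0, 1, 0, 1, 0, 1],
    'WEEK4': [0, 0, 0, 0, 1, 0, 1],
    'WEEK5': [1, 1, 1, 0, 1, 0, 0],
}).set_index('ID')

# First cumsum: registrations so far; it is >= 1 from the first 1 onward,
# so the second cumsum hits 1 exactly once (at the first 1) and then keeps growing
first_only = df.cumsum(axis=1).cumsum(axis=1).eq(1).astype(int)

# Compare with the shift/cumsum/where approach from the answer
answer = df.where(df.shift(axis=1).eq(1).cumsum(axis=1).eq(0), 0)
print((first_only.to_numpy() == answer.to_numpy()).all())  # True
```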
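Regarding "python - Imposing a hierarchy on multiple columns, changing a random number of column values based on other columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57006117/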