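I am trying to assign values in new columns based on the values in the rows of other columns. Please refer to the dataset given below.
ID1 - based on the diff column: whenever the value is not equal to 1, a new ID should be assigned by adding one to the previous row's ID.
ID2 - assign an ID that increments whenever Region changes within an ID1.
ID3 - assign a running ID within each combination of ID1 and ID2.
All three IDs should restart from 1 whenever the Indv column changes to a new value.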
import pandas as pd

# Initialise the data as a dict of lists.
data = {'Indv': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
        'Region': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'A', 'A', 'C'],
        'diff': [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1],
        }

# Create the DataFrame.
df = pd.DataFrame(data)
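# Create ID1, starting at 1 for every row.
df['ID1'] = 1

# Code only for ID1: compare each row (j) with the previous row (i).
for i in range(len(df) - 1):
    j = i + 1
    if df.loc[i, 'Indv'] != df.loc[j, 'Indv']:
        # New individual: restart the ID at 1.
        df.loc[j, 'ID1'] = 1
    elif df.loc[j, 'diff'] == 1:
        # Same run: carry the previous ID forward.
        df.loc[j, 'ID1'] = df.loc[i, 'ID1']
    else:
        # diff is not 1: start a new ID.
        df.loc[j, 'ID1'] = df.loc[i, 'ID1'] + 1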
Dataset with the expected results - the ID1, ID2 and ID3 columns need to be generated.
Indv, Region, diff, ID1, ID2, ID3
1, A, 1, 1, 1, 1
1, A, 1, 1, 1, 2
1, A, 10, 2, 1, 1
1, A, 1, 2, 1, 2
1, B, 1, 2, 2, 1
1, B, 1, 2, 2, 2
1, B, 1, 2, 2, 3
1, C, 10, 3, 1, 1
1, C, 1, 3, 1, 2
1, C, 1, 3, 1, 3
1, D, 1, 3, 2, 1
2, A, -11, 1, 1, 1
2, A, 1, 1, 1, 2
2, C, 1, 1, 2, 1
Best answer
Here is my solution:
- Create the DataFrame
data={'Indv':[1,1,1,1,1,1,1,1,1,1,1,2,2,2],
'Region1':['A','A','A','A','B','B','B','C','C','C','D','A','A','C'],
'diff':[ 1,1,10,1,1,1,1,10,1,1,1,-11,1,1]
}
df = pd.DataFrame(data)
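- Declare the functions used to find id1 and id2:
def createId1(group):
    # Count the values that differ from 1 seen so far in the group.
    cumsum = group.ne(1).cumsum()
    # If the group starts with diff == 1 the count starts at 0, so shift it to start at 1.
    if cumsum.iloc[0] == 0:
        return cumsum + 1
    return cumsum

def createId2(group):
    # Increment the ID each time the value differs from the previous row.
    return group.ne(group.shift(1)).cumsum()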
- Create the ID columns
df["id1"] = df.groupby(["Indv"])["diff"].transform(lambda group: createId1(group))
df["id2"] = df.groupby(["Indv", "id1"])["Region1"].transform(lambda group: createId2(group))
df["id3"] = df.groupby(["Indv", "id1", "id2"]).cumcount()+1
Output:
print(df.to_string())
    Indv  Region1  diff  id1  id2  id3
0      1        A     1    1    1    1
1      1        A     1    1    1    2
2      1        A    10    2    1    1
3      1        A     1    2    1    2
4      1        B     1    2    2    1
5      1        B     1    2    2    2
6      1        B     1    2    2    3
7      1        C    10    3    1    1
8      1        C     1    3    1    2
9      1        C     1    3    1    3
10     1        D     1    3    2    1
11     2        A   -11    1    1    1
12     2        A     1    1    1    2
13     2        C     1    1    2    1
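To check the result against the expected table from the question, the steps can be wrapped in a function and compared column by column. This is only a sketch: `add_ids`, `create_id1`, `create_id2` and the `expected` dict are names introduced here for illustration, and the expected values are copied from the output shown above.

```python
import pandas as pd


def create_id1(group):
    # Running count of diff values that are not 1 within one Indv.
    breaks = group.ne(1).cumsum()
    # Start at 1 when the first diff of the group is itself 1.
    return breaks + 1 if breaks.iloc[0] == 0 else breaks


def create_id2(group):
    # Increment whenever Region1 differs from the previous row.
    return group.ne(group.shift(1)).cumsum()


def add_ids(df):
    # Hypothetical helper: returns a copy with id1, id2 and id3 added.
    out = df.copy()
    out["id1"] = out.groupby("Indv")["diff"].transform(create_id1)
    out["id2"] = out.groupby(["Indv", "id1"])["Region1"].transform(create_id2)
    out["id3"] = out.groupby(["Indv", "id1", "id2"]).cumcount() + 1
    return out


data = {
    "Indv": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Region1": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "A", "A", "C"],
    "diff": [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1],
}

# Expected IDs, copied from the output table above.
expected = {
    "id1": [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1],
    "id2": [1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2],
    "id3": [1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1],
}

result = add_ids(pd.DataFrame(data))
for col, values in expected.items():
    assert result[col].astype(int).tolist() == values, col
```

Passing the named functions straight to `transform` behaves the same as wrapping them in a lambda; the wrapper function just keeps the original DataFrame untouched.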
Documentation:
DataFrame.groupby: groups rows based on a mapper (here, one or several column names).
GroupBy.transform: applies a function to each group and returns a result aligned with the original index (GroupBy.apply would have worked too).
Series.ne: returns a boolean Series of element-wise "not equal" comparisons against a value.
Series.shift: shifts the values of a Series by a given number of periods, keeping the index unchanged.
Series.cumsum: returns the cumulative sum of a Series. Applied to a boolean Series, it returns the running count of True values encountered.
GroupBy.cumcount: numbers each item in a group, starting at 0.
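As a small illustration of how these pieces combine, here is a sketch on a toy Series (the values and the variable names `changed`, `run_id` and `position` are made up for this example):

```python
import pandas as pd

s = pd.Series(["A", "A", "B", "B", "A"])

# True wherever a value differs from the one before it.
# The first row is compared with a missing value, so it is always True.
changed = s.ne(s.shift(1))

# Summing the booleans gives one ID per run of equal values.
run_id = changed.cumsum()

# cumcount numbers rows inside each run from 0, so add 1 to start at 1.
position = s.groupby(run_id).cumcount() + 1

print(pd.DataFrame({"value": s, "run_id": run_id, "position": position}))
```

Here `run_id` comes out as 1, 1, 2, 2, 3 and `position` as 1, 2, 1, 2, 1, which is the same pattern as the id2 and id3 columns in the answer.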
Regarding "python - Create columns with new IDs based on multiple IF conditions", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57586002/