I have a pandas Series that looks like this:
(['StartGame', 'TutorialEnded', 'FBConnect',
'StartGame', 'Sale', 'FBConnect', 'InviteSent',
'StartGame', 'Finish_1', 'Sale', 'Bought',
'Finish_22', 'FBConnect', 'Finish_2',
'TutorialEnded', 'Finish_18', ...])
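I want to plot the distance between values that contain the string Finish and occurrences of the value Sale, to see whether the two are correlated, and to check in the same way how occurrences of other words relate to Sale. In other words, can I use the occurrence of any value in the Series to predict that a sale will happen nearby? Even a scatter line with a different color assigned to each value would help me get a feel for the data, but I don't know how to do it.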
Best answer
Setup
import re
import numpy as np
import pandas as pd

df = pd.DataFrame(['StartGame', 'TutorialEnded', 'FBConnect',
                   'StartGame', 'Sale', 'FBConnect', 'InviteSent',
                   'StartGame', 'Finish_1', 'Sale', 'Bought',
                   'Finish_22', 'FBConnect', 'Finish_2',
                   'TutorialEnded', 'Finish_18'], columns=['Value'])
# Turn the default integer index into an explicit 'position' column.
df.index.name = 'position'
df.reset_index(inplace=True)
Helper functions
def isFinish(x):
    """Return True if the row's Value contains 'Finish', False otherwise."""
    return bool(re.search(r'Finish', x['Value']))

def isSale(x):
    """Return True if the row's Value contains 'Sale', False otherwise."""
    return bool(re.search(r'Sale', x['Value']))

df['Finish'] = df.apply(isFinish, axis=1)
df['Sale'] = df.apply(isSale, axis=1)
# Running count of Finish rows seen so far (0 until the first Finish).
df['FinishCount'] = df.Finish.cumsum()

def cumargmax(x):
    """Return the position of the latest Finish row, or NaN if there is none yet."""
    if x['FinishCount'] == 0:
        return np.nan
    else:
        # The running count first reaches its current maximum at the latest Finish row.
        # idxmax replaces the old label-returning argmax; .ix no longer exists in pandas.
        return df.FinishCount.loc[:x['position']].idxmax()

df['Distance'] = df.position - df.apply(cumargmax, axis=1)
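The row-wise `apply` calls above can probably be replaced by column-wise operations. The following is only a sketch of that idea, not the answer's own method: it swaps in `Series.str.contains`, `Series.where`, and `Series.ffill`, and it assumes the events are in chronological order with a default integer position. The names `events` and `last_finish` are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# Same sample events as in the setup (hypothetical variable name).
events = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']

df = pd.DataFrame({'Value': events})
df['position'] = np.arange(len(df))

# Boolean flags computed on the whole column at once.
df['Finish'] = df['Value'].str.contains('Finish')
df['Sale'] = df['Value'].str.contains('Sale')

# Keep the position only on Finish rows, then carry it forward so every
# row knows where the most recent Finish was (NaN before the first one).
last_finish = df['position'].where(df['Finish']).ffill()
df['Distance'] = df['position'] - last_finish

print(df[df['Sale']])
```

On the sample data this should give the same `Distance` column as the helper functions, while avoiding a Python-level function call per row.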
Demo
print(df)
    position          Value  Finish   Sale  FinishCount  Distance
0          0      StartGame   False  False            0       NaN
1          1  TutorialEnded   False  False            0       NaN
2          2      FBConnect   False  False            0       NaN
3          3      StartGame   False  False            0       NaN
4          4           Sale   False   True            0       NaN
5          5      FBConnect   False  False            0       NaN
6          6     InviteSent   False  False            0       NaN
7          7      StartGame   False  False            0       NaN
8          8       Finish_1    True  False            1       0.0
9          9           Sale   False   True            1       1.0
10        10         Bought   False  False            1       2.0
11        11      Finish_22    True  False            2       0.0
12        12      FBConnect   False  False            2       1.0
13        13       Finish_2    True  False            3       0.0
14        14  TutorialEnded   False  False            3       1.0
15        15      Finish_18    True  False            4       0.0
Or subset to the rows where a sale occurs
print(df[df.Sale])
   position  Value  Finish   Sale  FinishCount  Distance
4         4   Sale   False   True            0       NaN
9         9   Sale   False   True            1       1.0
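The question also asked for a scatter plot with a different color per value. One way that might look with matplotlib is sketched below. This is an illustration rather than part of the original answer: the helper names `add_kind` and `plot_timeline`, the `Kind` column, and the choice to strip the numeric suffix so that all `Finish_*` events share one color are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

events = ['StartGame', 'TutorialEnded', 'FBConnect',
          'StartGame', 'Sale', 'FBConnect', 'InviteSent',
          'StartGame', 'Finish_1', 'Sale', 'Bought',
          'Finish_22', 'FBConnect', 'Finish_2',
          'TutorialEnded', 'Finish_18']

def add_kind(values):
    """Build a frame with a position column and a coarse event kind."""
    df = pd.DataFrame({'Value': values})
    df['position'] = np.arange(len(df))
    # Assumption: Finish_1, Finish_22, ... are all treated as one kind, 'Finish'.
    df['Kind'] = df['Value'].str.replace(r'_\d+$', '', regex=True)
    return df

def plot_timeline(df, target='Sale'):
    """Scatter each event kind on its own row and mark the target events."""
    fig, ax = plt.subplots(figsize=(9, 3))
    # One scatter call per kind, so each kind gets its own color from the cycle.
    for kind, group in df.groupby('Kind'):
        ax.scatter(group['position'], [kind] * len(group), label=kind)
    # Dashed vertical line at every position where the target event occurs.
    for pos in df.loc[df['Kind'] == target, 'position']:
        ax.axvline(pos, color='grey', linestyle='--', linewidth=1)
    ax.set_xlabel('position')
    ax.set_ylabel('event')
    return ax

if __name__ == '__main__':
    plot_timeline(add_kind(events))
    plt.tight_layout()
    plt.show()
```

Events that tend to sit just left of a dashed line are candidates for predicting a nearby sale. With only two sales in the sample, the picture is a way to get a feel for the data, not evidence of a correlation.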
Source: a similar question on Stack Overflow, "python - Mapping/plotting the distance between values in a 1-D array/Series": https://stackoverflow.com/questions/37121810/