python - Get joint probabilities from a pd DataFrame

I have the following dataframe:

{'state': {7192: 'healthy',
  7193: 'healthy',
  7194: 'healthy',
  7195: 'Non healthy',
  7196: 'Non healthy'},
 'type': {7192: 'W', 7193: 'A', 7194: 'W', 7195: 'W', 7196: 'A'}}

I want the joint probabilities associated with this df:

P(state = healthy, type = A), P(state = healthy, type = W), P(state = Non healthy, type = A), P(state = Non healthy, type = W)

I tried a groupby approach, but it didn't work. What is the most efficient way to do this?

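Edit: To clarify, I want to count the occurrences of each (state, type) pair and divide by the total number of rows. In the example above, this should give P(state = healthy, type = A) = 1/5, P(state = healthy, type = W) = 2/5, P(state = Non healthy, type = A) = 1/5, P(state = Non healthy, type = W) = 1/5.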
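The definition in the edit (count each (state, type) pair, then divide by the number of rows) can be sketched without pandas at all. This is only an illustration of the arithmetic, assuming the question's dict is bound to a name data; the names pairs and joint are made up for the example. Fractions are used so the results read exactly as 1/5 and 2/5.

```python
from collections import Counter
from fractions import Fraction

# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}

# Pair each row's state with its type; both inner dicts share the same keys
pairs = [(data["state"][i], data["type"][i]) for i in data["state"]]

# Count each (state, type) pair and divide by the total number of rows
counts = Counter(pairs)
joint = {pair: Fraction(n, len(pairs)) for pair, n in counts.items()}

for pair, p in sorted(joint.items()):
    print(pair, p)
# ('Non healthy', 'A') 1/5
# ('Non healthy', 'W') 1/5
# ('healthy', 'A') 1/5
# ('healthy', 'W') 2/5
```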

Thanks,

Best answer

It seems you can use DataFrame.value_counts(normalize=True) to get what you want. Note that DataFrame.value_counts is new in pandas 1.1.0. If you are on an older version, you can get the same result with a different approach.
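To make clear what normalize=True does, here is a small sketch (assuming pandas >= 1.1.0 and the question's dict bound to data): the normalized result is just the raw pair counts divided by their total, which is exactly the "count each pair, divide by the number of rows" definition from the question.

```python
import pandas as pd

# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}
df = pd.DataFrame(data)

# Raw number of rows for each (state, type) pair
counts = df.value_counts(["state", "type"])

# normalize=True divides each count by the total number of counted rows
probs = df.value_counts(["state", "type"], normalize=True)

print(counts)
print(counts / counts.sum())  # same values as `probs`
```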

First convert your dictionary (called data here) to a pd.DataFrame:

import pandas as pd

df = pd.DataFrame(data)

Pandas version >= 1.1.0

probs = df.value_counts(["state", "type"], normalize=True)

print(probs)
healthy      W       0.4
             A       0.2
Non healthy  W       0.2
             A       0.2

# Select an individual probability:
healthy_a_prob = probs[("healthy", "A")]

print(healthy_a_prob)
0.2
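One thing worth knowing when selecting individual probabilities: value_counts only returns pairs that actually occur, so looking up a combination that never appears raises KeyError. A short sketch, assuming pandas >= 1.1.0; the type "X" below is a made-up value that is not in the data.

```python
import pandas as pd

# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}
df = pd.DataFrame(data)
probs = df.value_counts(["state", "type"], normalize=True)

# The joint probabilities of all observed pairs add up to 1 (up to rounding)
print(probs.sum())

# ("healthy", "X") uses a hypothetical type that never occurs, so probs[...]
# would raise KeyError; Series.get returns the supplied default instead
missing_prob = probs.get(("healthy", "X"), 0.0)
print(missing_prob)  # 0.0
```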

If your pandas version is older than 1.1.0, replace the first line of the example above with:

probs = df.groupby("state")["type"].value_counts() / len(df)

# rest is the exact same
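If the same code has to run on both old and new pandas, one option is a small wrapper that picks the method at runtime. This is a sketch only: joint_probabilities is a hypothetical helper name, and the fallback groups by both columns at once with groupby(...).size(), which gives the same numbers as the groupby("state")["type"].value_counts() line above.

```python
import pandas as pd


def joint_probabilities(df, cols):
    """Hypothetical helper: joint probability of each value combination in cols.

    Uses DataFrame.value_counts when this pandas has it (>= 1.1.0);
    otherwise counts rows per group with groupby(...).size().
    """
    if hasattr(pd.DataFrame, "value_counts"):
        return df.value_counts(cols, normalize=True)
    # Older pandas: rows per group divided by the total number of rows
    return df.groupby(cols).size() / len(df)


# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}
df = pd.DataFrame(data)

probs = joint_probabilities(df, ["state", "type"])

# The fallback expression, evaluated directly, gives the same numbers
fallback = df.groupby(["state", "type"]).size() / len(df)
print(probs[("healthy", "W")], fallback[("healthy", "W")])  # 0.4 0.4
```

Two differences to keep in mind: groupby(...).size() orders the result by the group keys rather than by frequency, and if either column contains NaN, dividing by len(df) counts rows that groupby drops, while value_counts(normalize=True) divides only by the rows it counted.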

If you want a cross-tabulated probability table, I suggest pd.crosstab with normalize=True:

crosstab_ptable = pd.crosstab(df["state"], df["type"], normalize=True)

print(crosstab_ptable)
type           A    W
state
Non healthy  0.2  0.2
healthy      0.2  0.4
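Each cell of the normalized crosstab is one joint probability, indexed by state (row) and type (column). A short sketch, assuming pandas >= 1.1.0, that reads a single cell and checks that every cell matches the corresponding value_counts entry:

```python
import pandas as pd

# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}
df = pd.DataFrame(data)

crosstab_ptable = pd.crosstab(df["state"], df["type"], normalize=True)
probs = df.value_counts(["state", "type"], normalize=True)

# P(state = healthy, type = A) is the cell at row "healthy", column "A"
print(crosstab_ptable.loc["healthy", "A"])  # 0.2

# Walk every cell and print it next to the matching value_counts entry
for state in crosstab_ptable.index:
    for type_ in crosstab_ptable.columns:
        print(state, type_, crosstab_ptable.loc[state, type_], probs[(state, type_)])
```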

If you are also interested in the marginal probabilities, you can use the margins parameter:

crosstab_ptable = pd.crosstab(df["state"], df["type"], margins=True, normalize=True)

print(crosstab_ptable)
type           A    W  All
state
Non healthy  0.2  0.2  0.4
healthy      0.2  0.4  0.6
All          0.4  0.6  1.0
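The "All" column and row are the marginals: summing the joint probabilities over the other variable. A sketch (assuming pandas >= 1.1.0; p_state and p_type are illustrative names) that pulls them out of the table and compares them with value_counts(normalize=True) on each single column:

```python
import pandas as pd

# The question's dict, bound to the (assumed) name `data`
data = {
    "state": {7192: "healthy", 7193: "healthy", 7194: "healthy",
              7195: "Non healthy", 7196: "Non healthy"},
    "type": {7192: "W", 7193: "A", 7194: "W", 7195: "W", 7196: "A"},
}
df = pd.DataFrame(data)

crosstab_ptable = pd.crosstab(df["state"], df["type"], margins=True, normalize=True)

# Marginal P(state): the "All" column without the grand-total row
p_state = crosstab_ptable["All"].drop("All")
# Marginal P(type): the "All" row without the grand-total column
p_type = crosstab_ptable.loc["All"].drop("All")

# The same marginals, computed directly from each single column
print(p_state)
print(df["state"].value_counts(normalize=True))
print(p_type)
print(df["type"].value_counts(normalize=True))
```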

Regarding python - Get joint probabilities from a pd DataFrame, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/64652945/
