python - How to plot a correlation matrix/heatmap with categorical and numerical variables

I have 4 variables: 2 are nominal (dtype=object) and 2 are numeric (dtypes=int and float).

df.head(1)

OUT:
OS_type|Week_day|clicks|avg_app_speed
iOS|Monday|400|3.4

Now I want to feed the dataframe into a seaborn heatmap visualization.

import numpy as np
import seaborn as sns
ax = sns.heatmap(df)

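But I get an error message saying that I cannot use categorical variables, only numbers. How do I handle this correctly and then feed the result into the heatmap?

Best answer

A heatmap needs numeric input, and for a correlation heatmap every cell should hold an association measure on a comparable scale. For two numerical variables you can use Pearson's r (it ranges from -1 to 1, so take its absolute value if everything should lie between 0 and 1). For two categorical variables you can use the (bias-corrected) Cramér's V, and for a categorical and a numerical variable you can use the correlation ratio. The latter two both lie between 0 and 1.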
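The three measures named above can be combined into one square matrix. The following is a minimal sketch, not a reference implementation: the helper names `cramers_v`, `correlation_ratio` and `association_matrix` are hypothetical, the sample rows are made up for illustration (only the column names come from the question), and using the absolute value of Pearson's r is an assumption made so that all cells share the 0 to 1 scale.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Bias-corrected Cramér's V between two categorical series."""
    table = pd.crosstab(x, y).to_numpy(dtype=float)
    n = table.sum()
    # Expected counts under independence, then the chi-squared statistic.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    phi2 = chi2 / n
    r, k = table.shape
    # Bias correction of phi^2 and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio (eta) of a numeric series grouped by a categorical one."""
    groups = values.groupby(categories)
    grand_mean = values.mean()
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    ss_total = ((values - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total)) if ss_total > 0 else 0.0


def association_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pick a measure for each column pair based on the two dtypes."""
    cols = list(df.columns)
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            a_num = pd.api.types.is_numeric_dtype(df[a])
            b_num = pd.api.types.is_numeric_dtype(df[b])
            if a_num and b_num:
                val = abs(df[a].corr(df[b]))  # |Pearson's r| (assumption)
            elif not a_num and not b_num:
                val = cramers_v(df[a], df[b])
            elif a_num:
                val = correlation_ratio(df[b], df[a])
            else:
                val = correlation_ratio(df[a], df[b])
            out.loc[a, b] = out.loc[b, a] = val
    return out


# Made-up sample rows; only the column names are taken from the question.
df_demo = pd.DataFrame({
    "OS_type": ["iOS", "Android", "iOS", "Android", "iOS", "Android", "iOS", "Android"],
    "Week_day": ["Monday", "Monday", "Tuesday", "Tuesday",
                 "Wednesday", "Wednesday", "Monday", "Tuesday"],
    "clicks": [400, 250, 380, 270, 410, 230, 395, 260],
    "avg_app_speed": [3.4, 2.1, 3.2, 2.4, 3.5, 2.0, 3.3, 2.2],
})
assoc = association_matrix(df_demo)
print(assoc.round(2))
```

Because `assoc` is purely numeric, it can be passed to seaborn directly, for example `sns.heatmap(assoc, annot=True, vmin=0, vmax=1)`.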

As for creating a numeric representation of a categorical variable, there are several ways to do it:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv('some_source.csv')  # has categorical var 'categ_var'

# method 1: uses pandas
df['numerized1'] = df['categ_var'].astype('category').cat.codes

# method 2: uses pandas, codes ordered by descending frequency (0 = most frequent)
freq_order = df['categ_var'].value_counts().index  # computed once, not once per row
df['numerized2'] = df['categ_var'].map(freq_order.get_loc)

# method 3: uses sklearn, result is the same as method 1
lbl = LabelEncoder()
df['numerized3'] = lbl.fit_transform(df['categ_var'])

# method 4: uses pandas; xyz captures a list of the unique values 
df['numerized4'], xyz = pd.factorize(df['categ_var'])
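The encodings above do not all assign the same integers, which matters if the codes are later read as an ordering. Here is a small sketch on a made-up series (the values are illustrative only) that compares the pandas-based variants side by side:

```python
import pandas as pd

# Made-up example values, chosen so that the three encodings differ.
s = pd.Series(["Tuesday", "Monday", "Monday", "Friday", "Monday", "Tuesday"])

# cat.codes: categories are sorted, so Friday=0, Monday=1, Tuesday=2.
sorted_codes = s.astype("category").cat.codes

# Frequency rank: the most frequent value gets 0 (Monday=0, Tuesday=1, Friday=2).
freq_order = s.value_counts().index
freq_codes = s.map({value: rank for rank, value in enumerate(freq_order)})

# factorize: codes follow order of first appearance (Tuesday=0, Monday=1, Friday=2).
appearance_codes, uniques = pd.factorize(s)

print(sorted_codes.tolist())      # [2, 1, 1, 0, 1, 2]
print(freq_codes.tolist())        # [1, 0, 0, 2, 0, 1]
print(appearance_codes.tolist())  # [0, 1, 1, 2, 1, 0]

# The uniques returned by factorize map the codes back to the labels.
decoded = uniques[appearance_codes]
```

Note that any of these codes imposes an arbitrary order on a nominal variable, so they are a convenient representation rather than a quantity to correlate directly.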

For "python - How to plot a correlation matrix/heatmap with categorical and numerical variables", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/57280472/
