python - df.corr shows every correlation as 1 when reading from a compiled CSV, even though the data differ

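I have a compiled data frame of S&P 500 stock data and I'm trying to find its correlations using df.corr(). When I run the program, every pair of columns is reported as having a correlation of 1, and when I visualize the data as a heat map the whole chart is green, when there should be many different positive and negative correlations.

Using Python 3.6 and Spyder.

Here is the code I'm using: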

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def visualize_data():
    df = pd.read_csv('sp500_joined_closes.csv')
    pd.options.display.float_format = '{:.5f}'.format  # only changes how floats are printed
    #df['AAPL'].plot()
    #plt.show()
    df_corr = df.corr()  # pairwise correlation table of the data frame's columns
    print(df_corr.head())

    data1 = df_corr.values  # inner values of the correlation table as a NumPy array
    fig1 = plt.figure()  # create the figure
    ax1 = fig1.add_subplot(1, 1, 1)  # a single axis: 1 by 1 grid, plot 1

    heatmap1 = ax1.pcolor(data1, cmap=plt.cm.RdYlGn)  # color map: red (negative), yellow (neutral), green (positive)
    fig1.colorbar(heatmap1)
    ax1.set_xticks(np.arange(data1.shape[1]) + 0.5, minor=False)  # one x tick per column, centered in its cell
    ax1.set_yticks(np.arange(data1.shape[0]) + 0.5, minor=False)  # one y tick per row, centered in its cell
    ax1.invert_yaxis()  # put the first row at the top, like a table
    ax1.xaxis.tick_top()  # move the x axis ticks to the top (meant to look more like a table)

    column_labels = df_corr.columns
    row_labels = df_corr.index

    ax1.set_xticklabels(column_labels)
    ax1.set_yticklabels(row_labels)

    plt.xticks(rotation=90)
    heatmap1.set_clim(-1, 1)  # fix the color scale to the full correlation range
    plt.tight_layout()
    #plt.savefig("correlations.png", dpi = (300))
    plt.show()

visualize_data()

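Interestingly, I've looked everywhere for someone with a similar error and can't seem to find any answers. Could the stock tickers be being treated as absolute, so that something is getting skewed? Honestly, I'm not sure.

Even when I try to plot one company against all the data, as seen with #df['AAPL'].plot() and #plt.show(), the same thing happens: the data only registers a correlation value of 1.0000 with everything.

I originally thought this was a rounding error caused by significant digits, so I added pd.options.display.float_format = '{:.5f}'.format, but that didn't work; I still get the skewed correlations.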
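As a side note on the float_format attempt: that option only controls how floats are printed, so it would not be expected to change the correlations themselves. A minimal sketch with made-up numbers (the columns A and B are hypothetical, not taken from the CSV):

```python
import pandas as pd

# Made-up data, only for illustration; not from sp500_joined_closes.csv.
df = pd.DataFrame({"A": [1.0, 2.0, 4.0, 3.0], "B": [2.0, 1.0, 3.0, 5.0]})

before = df.corr()

# Change only the display format for floats.
pd.options.display.float_format = "{:.5f}".format
after = df.corr()

# The underlying values are identical; only the printed form differs.
same = before.equals(after)
print(same)
print(after)

# Restore the default display behaviour.
pd.reset_option("display.float_format")
```

So if the printed table shows 1.00000 everywhere, the stored values really are (very close to) 1, and the cause has to be in the data rather than in the formatting.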

Here is a screenshot of the issue and the subsequent heat map

Here is a screenshot of part of the data, confirming that it isn't all the same and that it hasn't become corrupted in some way

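Best answer

The problem was in fetching the data through the Google Finance API. There seems to have been an error downloading one of the dates for one of the S&P 500 companies, and when I compiled all the data (including the few tickers with missing dates), for some reason it could only produce a single row of data. That resulted in a correlation of 1, because all the data was exactly the same. I found the specific dates and added them by hand, and now the program runs as intended. Thanks.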
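For anyone hitting something similar, it may help to check how many rows actually survived the join before trusting the correlation table. The sketch below is only an illustration under assumptions: the tickers AAA, BBB and CCC and their prices are invented, and the exact shape of the broken file from the question is not known. It shows one degenerate case that is easy to verify: when only two rows are left, every pairwise Pearson correlation between non-constant columns comes out at +1 or -1, whatever the prices are.

```python
import numpy as np
import pandas as pd

# Hypothetical joined closes where only two dates survived the join.
df = pd.DataFrame(
    {"AAA": [10.0, 11.0], "BBB": [50.0, 49.0], "CCC": [7.0, 7.5]},
    index=pd.to_datetime(["2018-01-02", "2018-01-03"]),
)

# Sanity checks worth running on the real file as well.
print(df.shape)                      # (rows, columns) after the join
rows_per_ticker = df.notna().sum()   # non-missing observations per column
print(rows_per_ticker)

# With two observations, any two non-constant columns lie on a straight
# line, so each correlation is (approximately) +1 or -1.
corr = df.corr()
print(corr)
```

If df.shape reports far fewer rows than the number of trading days that were downloaded, the join or the download is the place to look, not df.corr().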

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/48834794/
