Python regression modeling with pandas-datareader

Tags: python pandas numpy matplotlib

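I am trying to build a function that pulls data for any stock and then plots a regression. However, I am having trouble with the source data. My question is: how do I take a time series in a pandas DataFrame and plot its linear trend over time? My code is below.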

This code produces a regression:

import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)
plt.scatter(x, y);
plt.show()
from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)

model.fit(x[:, np.newaxis], y)

xfit = np.linspace(0, 10, 1000)
yfit = model.predict(xfit[:, np.newaxis])

plt.scatter(x, y)
plt.plot(xfit, yfit);
plt.show()

This is my attempt to pass the data in through a DataFrame:

from datetime import datetime
import pandas_datareader.data as web

start = datetime(2017, 8, 1)
end = datetime(2018, 7, 30)
data_SP = web.DataReader('JPM', 'iex', start, end)

y = dates # not sure how to get here?
plt.scatter(data['close'], y);
plt.show()

from sklearn.linear_model import LinearRegression
model = LinearRegression(fit_intercept=True)

model.fit(data['close'][:, np.newaxis], y)

xfit = np.linspace(0, 10, 1000)
yfit = model.predict(xfit[:, np.newaxis])

plt.scatter(data['close'], y)
plt.plot(xfit, yfit);
plt.show()

Best answer

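The regression cannot take datetime objects, so the dates must be converted to a numeric type (here, the number of days since the start date):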

import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from datetime import datetime
import pandas as pd
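# Workaround for older pandas-datareader releases, which import is_list_like from pandas.core.common
pd.core.common.is_list_like = pd.api.types.is_list_like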
from sklearn.linear_model import LinearRegression
import pandas_datareader.data as web

start = datetime(2017, 8, 1)
end = datetime(2018, 7, 30)
data_SP = web.DataReader('JPM', 'iex', start, end)

dates = list(map(lambda x: datetime.strptime(x,"%Y-%m-%d"),list(data_SP.index)))
days_since = list(map(lambda x: (x-start).days,dates))

model = LinearRegression(fit_intercept=True)
model.fit(np.array(days_since)[:, np.newaxis],data_SP['close'])

yfit = model.predict(np.array(days_since)[:, np.newaxis])

plt.figure()
plt.scatter(dates, yfit)
plt.scatter(dates, data_SP['close'])
plt.xlabel('date')
plt.ylabel('close')
plt.show()
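
The same date-to-number conversion can be wrapped in a reusable function, which is what the question asks for. The following is a minimal sketch, not the answerer's code: the function name fit_linear_trend is hypothetical, np.polyfit is swapped in for LinearRegression to keep the example short, and synthetic prices stand in for the 'iex' download so the block runs without network access.

```python
import numpy as np
import pandas as pd


def fit_linear_trend(series):
    """Fit value = slope * days + intercept for a Series indexed by dates.

    The index may hold datetime objects or 'YYYY-MM-DD' strings.
    """
    index = pd.to_datetime(series.index)
    # Regression needs numbers, so express each date as days since the first one.
    days_since = np.asarray((index - index[0]).days, dtype=float)
    slope, intercept = np.polyfit(days_since, series.to_numpy(dtype=float), 1)
    trend = pd.Series(slope * days_since + intercept, index=index)
    return slope, intercept, trend


# Synthetic closing prices (an assumption standing in for data_SP['close']).
dates = pd.bdate_range("2017-08-01", "2018-07-30")
rng = np.random.RandomState(1)
close = pd.Series(100 + 0.05 * np.arange(len(dates)) + rng.randn(len(dates)),
                  index=dates)

slope, intercept, trend = fit_linear_trend(close)
print("slope per calendar day:", slope)
```

The returned trend Series shares the date index of the input, so it can be drawn over the raw prices with plt.plot(trend.index, trend) after plt.scatter(close.index, close).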

If you use percent change instead, you need to account for the pesky NaN that pct_change puts in the first row:

import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as np
from datetime import datetime
import pandas as pd
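# Workaround for older pandas-datareader releases, which import is_list_like from pandas.core.common
pd.core.common.is_list_like = pd.api.types.is_list_like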
from sklearn.linear_model import LinearRegression
import pandas_datareader.data as web

start = datetime(2017, 8, 1)
end = datetime(2018, 7, 30)
data_SP = web.DataReader('JPM', 'iex', start, end)

dates = list(map(lambda x: datetime.strptime(x,"%Y-%m-%d"),list(data_SP.index)))
days_since = list(map(lambda x: (x-start).days,dates))

model = LinearRegression(fit_intercept=True)
model.fit(np.array(days_since)[1:][:, np.newaxis],data_SP['close'].pct_change(1)[1:]) # <------------

yfit = model.predict(np.array(days_since)[:, np.newaxis])

plt.figure()
plt.scatter(dates, yfit)
plt.scatter(dates, data_SP['close'].pct_change(1))
plt.xlabel('date')
plt.ylabel('close')
plt.show()

[Figure: percent change]
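
An alternative to slicing with [1:] is to drop the NaN rows explicitly, which keeps the dates and the returns aligned through the shared index. This is a hedged sketch only: fit_pct_change_trend is a hypothetical name, np.polyfit again replaces LinearRegression, and the price series is made up for illustration.

```python
import numpy as np
import pandas as pd


def fit_pct_change_trend(close):
    """Fit a linear trend to the daily percent change of a price Series."""
    # pct_change() leaves NaN in the first row; dropna() removes it
    # together with its date, so x and y stay the same length.
    returns = close.pct_change().dropna()
    index = pd.to_datetime(returns.index)
    origin = pd.to_datetime(close.index[0])
    # Days are counted from the first price date, not the first return date.
    days_since = np.asarray((index - origin).days, dtype=float)
    slope, intercept = np.polyfit(days_since, returns.to_numpy(dtype=float), 1)
    return returns, slope, intercept


# Made-up prices that grow by 10% per day (an assumption for illustration).
prices = pd.Series([100.0, 110.0, 121.0, 133.1],
                   index=pd.date_range("2018-01-01", periods=4))
returns, slope, intercept = fit_pct_change_trend(prices)
print(len(prices), "prices ->", len(returns), "returns")
```

Because the NaN row is removed before fitting, the fitted values should be plotted against returns.index rather than against the full list of dates.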

For more on Python regression modeling with pandas-datareader, see the similar question on Stack Overflow: https://stackoverflow.com/questions/51575927/
