python - How to standardize time-series data with scikit-learn's StandardScaler?

Tags: python scikit-learn keras time-series data-processing

I'm using Keras, so my data has shape (batch_size, timesteps, input_dim). StandardScaler, however, only accepts 2D data.
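For reference, a quick check shows that StandardScaler rejects a 3D array outright, which is why the data has to be reshaped or fed to it in 2D slices. The array sizes below are arbitrary assumptions for the demo:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Example 3D array shaped (batch_size, timesteps, input_dim); sizes are arbitrary.
data = np.random.randn(4, 5, 3)

try:
    StandardScaler().fit(data)
    rejected = False
except ValueError as err:
    # scikit-learn validates input dimensionality and refuses arrays with ndim > 2.
    rejected = True
    print(err)
```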

One solution I thought of is to call partial_fit on each sample and then transform:

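from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Pass 1: accumulate per-feature statistics one sample (a timesteps x input_dim matrix) at a time
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])

# Pass 2: scale each sample in place (data must be a float array)
for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])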

Is this the correct/efficient way to do it?

Best answer

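You have two options. Using some example data (with numpy imported as np and StandardScaler imported from sklearn.preprocessing):

bsize, time, feats = 32, 100, 10  # example sizes, chosen arbitrarily
data = np.random.randn(bsize, time, feats)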

Version 1 does what you described:

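scaler = StandardScaler()
# Pass 1: update the running per-feature mean/variance with each (time, feats) sample
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])

# Pass 2: scale each sample in place using the accumulated statistics
for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])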
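If the reason for using partial_fit is that the full array may not fit in memory, a middle ground is to feed partial_fit several samples at a time, each chunk reshaped to 2D, which cuts down the number of Python-level calls. This is only a sketch: the array sizes and the chunk size are hypothetical, and here the whole array is in memory just for the demo.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

bsize, steps, feats = 64, 20, 5  # example sizes (assumed)
data = np.random.randn(bsize, steps, feats)
chunk = 16  # hypothetical number of samples per chunk

scaler = StandardScaler()
# Pass 1: accumulate per-feature mean/variance one chunk of samples at a time.
for start in range(0, bsize, chunk):
    block = data[start:start + chunk]
    scaler.partial_fit(block.reshape(-1, feats))

# Pass 2: transform each chunk as 2D rows, then restore its 3D shape.
scaled = np.empty_like(data)
for start in range(0, bsize, chunk):
    block = data[start:start + chunk]
    scaled[start:start + chunk] = scaler.transform(block.reshape(-1, feats)).reshape(block.shape)
```

Every timestep of every sample counts as one row, so after pass 1 the scaler has seen bsize * steps rows and each feature ends up with zero mean and unit variance across batch and time.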

Another option (version 2) is to flatten the array, fit and transform it, and then reshape it back:

scaler = StandardScaler()
# Merge the batch and time axes so each row is one timestep: (bsize*time, feats)
flat = data.reshape((bsize * time, feats))
data = scaler.fit_transform(flat).reshape((bsize, time, feats))
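Both versions compute one mean and one standard deviation per feature over all samples and timesteps, so they should agree up to floating-point error. A minimal check, assuming arbitrary example sizes, compares them with each other and with the plain NumPy formula (StandardScaler uses the population standard deviation, i.e. ddof=0):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

bsize, steps, feats = 32, 50, 4  # example sizes (assumed)
data = np.random.randn(bsize, steps, feats)

# Version 1: per-sample partial_fit, then per-sample transform (without modifying data).
v1_scaler = StandardScaler()
for sample in range(data.shape[0]):
    v1_scaler.partial_fit(data[sample])
v1 = np.stack([v1_scaler.transform(data[sample]) for sample in range(data.shape[0])])

# Version 2: flatten to 2D, fit_transform, reshape back.
v2 = StandardScaler().fit_transform(data.reshape(-1, feats)).reshape(data.shape)

# Plain NumPy: statistics over the batch and time axes, population std (ddof=0).
mean = data.mean(axis=(0, 1))
std = data.std(axis=(0, 1))
v3 = (data - mean) / std

print(np.allclose(v1, v2), np.allclose(v2, v3))
```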

On my machine:

Version 1 takes about 0.876 seconds

Version 2 takes about 0.117 seconds
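To reproduce a comparison like this, a simple harness with time.perf_counter can be used. The array size below is an assumption (the answer doesn't state the one it used), so absolute numbers will differ; version 2 is most likely faster because it makes one vectorized call instead of two Python-level calls per sample, each with its own input-validation overhead.

```python
import time

import numpy as np
from sklearn.preprocessing import StandardScaler


def version1(data):
    # Per-sample partial_fit, then per-sample in-place transform.
    scaler = StandardScaler()
    for sample in range(data.shape[0]):
        scaler.partial_fit(data[sample])
    for sample in range(data.shape[0]):
        data[sample] = scaler.transform(data[sample])
    return data


def version2(data):
    # Flatten batch and time axes, fit_transform once, reshape back.
    bsize, steps, feats = data.shape
    scaler = StandardScaler()
    return scaler.fit_transform(data.reshape(bsize * steps, feats)).reshape(bsize, steps, feats)


def best_time(fn, data, repeats=3):
    timings = []
    for _ in range(repeats):
        arr = data.copy()  # version1 scales in place, so each run gets a fresh copy
        start = time.perf_counter()
        fn(arr)
        timings.append(time.perf_counter() - start)
    return min(timings)


data = np.random.randn(500, 100, 8)  # example sizes (assumed)
t1 = best_time(version1, data)
t2 = best_time(version2, data)
print(f"version 1: {t1:.3f} s, version 2: {t2:.3f} s")
```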

A similar question on Stack Overflow: https://stackoverflow.com/questions/53075203/
