python - Normalization: how to avoid zero standard deviation

I have the following task:

Normalize the matrix by columns. From each value in column subtract average (in column) and divide it by standard deviation (in the column). Your output should not contain nan (caused by division by zero). Replace Nans with 1. Don't use if and while/for.

I am using numpy, so I wrote the following code:

import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    # Subtract each column's mean and divide by each column's standard deviation.
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    return res

matrix = np.array([[1, 4, 4200], [0, 10, 5000], [1, 2, 1000]])
assert np.allclose(
    normalize(matrix),
    np.array([[ 0.7071, -0.39223,  0.46291],
              [-1.4142,  1.37281,  0.92582],
              [ 0.7071, -0.98058, -1.38873]])
)

The result is correct.

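However, my question is: how do I avoid division by zero? If a column contains identical numbers, its standard deviation is 0 and the result contains nan values. How can I solve this? Any help would be appreciated!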
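For reference, the failure can be reproduced with a small sketch. The matrix below is an assumed example with a constant first column and is not part of the original task; `np.errstate` is used only to silence the warning that 0 / 0 would otherwise raise.

```python
import numpy as np

# Assumed example: the first column is constant, so its standard deviation is 0.
matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])

centered = matrix - np.mean(matrix, axis=0)
std = np.std(matrix, axis=0, dtype=np.float64)

# 0 / 0 yields nan and normally emits a RuntimeWarning; suppress it here.
with np.errstate(divide="ignore", invalid="ignore"):
    res = centered / std

# True wherever the division produced nan (here: the whole first column).
nan_mask = np.isnan(res)
print(std)
print(nan_mask)
```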

Best answer

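Your task requires that the output contain no nan and that any nan that occurs be replaced with 1. It does not say that intermediate results must be free of nan. A valid solution is therefore to call numpy.nan_to_num on res before returning it: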

import numpy as np

def normalize(matrix: np.ndarray) -> np.ndarray:
    res = (matrix - np.mean(matrix, axis=0)) / np.std(matrix, axis=0, dtype=np.float64)
    # Replace any nan produced by 0 / 0 (zero-std columns) with 1.0, in place.
    return np.nan_to_num(res, copy=False, nan=1.0)

matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
print(normalize(matrix))

Output:

[[ 1.         -0.39223227  0.46291005]
 [ 1.          1.37281295  0.9258201 ]
 [ 1.         -0.98058068 -1.38873015]]
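As an alternative, here is a minimal sketch that never performs the 0 / 0 division at all, so no nan and no warning appear even in intermediate results. It is not the answer's method: the name `normalize_safe` is hypothetical, and the approach relies on the `out` and `where` arguments of `np.divide`, with the output pre-filled with 1.

```python
import numpy as np

def normalize_safe(matrix: np.ndarray) -> np.ndarray:
    """Column-wise standardization; zero-std columns are filled with 1."""
    centered = matrix - np.mean(matrix, axis=0)
    std = np.std(matrix, axis=0, dtype=np.float64)
    # Pre-fill the output with 1.0. np.divide writes only where std != 0,
    # so zero-std columns keep the fill value and 0 / 0 is never evaluated.
    out = np.ones_like(centered, dtype=np.float64)
    return np.divide(centered, std, out=out, where=std != 0)

matrix = np.array([[2, 4, 4200], [2, 10, 5000], [2, 2, 1000]])
print(normalize_safe(matrix))
```

Like the nan_to_num version, this uses no if, for, or while; the boolean mask `std != 0` does the selection.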

Regarding python - Normalization: how to avoid zero standard deviation, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60283097/
