python - Iterating through a pandas DataFrame and numpy

I come from a Java background and am new to numpy and pandas. I want to translate the pseudocode below into Python.

theta[0...D] - numpy
input[1...D][0...N-1] - Pandas data frame

Pseudocode:

mean = theta[0]
for(row = 0 to N-1)
     for(col = 1 to D)
          mean += theta[col] * input[row][col]

Implementation:

class simulator:
    theta = np.array([])
    stddev = 0

    def __init__(self, v_coefficents, v_stddev):
        self.theta = v_coefficents
        self.stddev = v_stddev

    def sim( self, input ):
        mean = self.theta[0]
        D = input.shape[0]
        N = input.shape[1]

        for index, row in input.iterrows():
            mean = self.theta[0]
            for i in range(D):
                mean += self.theta[i+1] *row['y']

I am concerned about the iteration in the last line of code: mean += self.theta[i+1] *row['y'].

Best answer

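Since you are working with NumPy, I would suggest extracting the pandas DataFrame as an array. Then theta and the extracted version of input are both arrays and can be used together directly.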

So, to start off, we would get the array with -

input_arr = input.values
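
As a small illustration of this extraction step (the DataFrame contents and column names below are made up for the example), `.values` yields a 2-D NumPy array with one row per DataFrame row. Newer pandas versions also provide `to_numpy()` for the same purpose:

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row, 2-column frame, only for illustration
frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

input_arr = frame.values      # 2-D ndarray, shape (N, number_of_columns)
same_arr = frame.to_numpy()   # equivalent extraction in pandas >= 0.24

print(type(input_arr), input_arr.shape)
```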

Then, the translation of the pseudocode would be -

mean = theta[0]
for row in range(N):
    for col in range(1,D+1):    
        mean += theta[col] * input_arr[row,col]

Performing the sum-reduction with NumPy's vectorized operations and broadcasting, the output reduces to simply -

mean = theta[0] + (theta[1:D+1]*input_arr[:,1:D+1]).sum()
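
A quick way to convince yourself that the one-liner matches the nested loops is to run both on the same data. The sketch below uses randomly generated values and assumed sizes (N = 4, D = 3); none of these numbers come from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 3                          # assumed sizes, for illustration only
theta = rng.random(D + 1)            # theta[0..D]
input_arr = rng.random((N, D + 1))   # column 0 is unused, columns 1..D hold the data

# Loop version, as in the pseudocode translation
mean_loop = theta[0]
for row in range(N):
    for col in range(1, D + 1):
        mean_loop += theta[col] * input_arr[row, col]

# Vectorized version: theta[1:D+1] (shape (D,)) is broadcast across the N rows
mean_vec = theta[0] + (theta[1:D+1] * input_arr[:, 1:D+1]).sum()

print(np.isclose(mean_loop, mean_vec))
```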

This can be optimized further by using np.dot as a matrix multiplication, like so -

mean = theta[0] + np.dot(input_arr[:,1:D+1], theta[1:D+1]).sum()
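
To make the intermediate result visible, here is a tiny hand-checkable sketch with assumed values (N = 2, D = 2). `np.dot` of the (N, D) slice with the length-D coefficient vector gives one partial sum per row, and `.sum()` then collapses those into a single number:

```python
import numpy as np

D = 2
theta = np.array([1, 2, 3])          # theta[0..D], made-up values
input_arr = np.array([[0, 1, 2],     # column 0 is unused, columns 1..D hold the data
                      [0, 3, 4]])

# One dot product per row: [2*1 + 3*2, 2*3 + 3*4]
per_row = np.dot(input_arr[:, 1:D+1], theta[1:D+1])
print(per_row)                       # [ 8 18]

mean = theta[0] + per_row.sum()
print(mean)                          # 27
```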

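Please note that if you meant that input has a length of D-1 (that is, its columns are indexed 0 to D-1, with no extra leading column), then a few edits are needed:

The loop code would have: input_arr[row,col-1], instead of input_arr[row,col].
The vectorized code would have: input_arr instead of input_arr[:,1:D+1].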
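
The two edits above can be sketched together as follows, assuming the array has exactly D columns so that column col-1 pairs with theta[col]. The sizes and random data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 5, 3                        # assumed sizes, for illustration only
theta = rng.random(D + 1)          # theta[0..D]
input_arr = rng.random((N, D))     # exactly D columns, indexed 0..D-1

# Loop code with the shifted column index
mean_loop = theta[0]
for row in range(N):
    for col in range(1, D + 1):
        mean_loop += theta[col] * input_arr[row, col - 1]

# Vectorized code using the whole array instead of a column slice
mean_vec = theta[0] + np.dot(input_arr, theta[1:D+1]).sum()

print(np.isclose(mean_loop, mean_vec))
```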

Sample run based on the comments -

In [71]: df = {'y' : [1,2,3,4,5]}
    ...: data_frame = pd.DataFrame(df)
    ...: test_coefficients = np.array([1,2,3,4,5,6])
    ...: 

In [79]: input_arr = data_frame.values
    ...: theta = test_coefficients
    ...: 

In [80]: theta[0] + np.dot(input_arr[:,0], theta[1:])
Out[80]: 71
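
The question's implementation resets mean inside the row loop, which may mean that one mean per row was wanted rather than a single total. That reading is an assumption, not something the answer states. Under it, a minimal sketch of the class could skip the final `.sum()` and return the per-row vector. The `Simulator` name, the column names `x1` and `x2`, and the sample values are hypothetical:

```python
import numpy as np
import pandas as pd


class Simulator:
    """Hypothetical rewrite of the question's class, returning one mean per row."""

    def __init__(self, coefficients, stddev):
        self.theta = np.asarray(coefficients)  # theta[0] is the intercept
        self.stddev = stddev                   # stored as in the question, unused here

    def sim(self, frame):
        input_arr = frame.values               # shape (N, D), columns pair with theta[1:]
        # Matrix-vector product gives theta[1:] . row for every row at once
        return self.theta[0] + input_arr @ self.theta[1:]


frame = pd.DataFrame({"x1": [1, 3], "x2": [2, 4]})
means = Simulator([1, 2, 3], stddev=0).sim(frame)
print(means)                                   # [ 9 19]
```

Summing `means` and subtracting the extra copies of theta[0] would recover the single total used in the answer.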

Regarding python - Iterating through a pandas DataFrame and numpy, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41932321/
