linux - Averaging multiple files

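I am trying to write a shell script that averages several identically formatted files named file1, file2, file3, and so on.

In each file, the data is in a table with, for example, 4 columns and 5 rows. Assume file1, file2 and file3 are in the same directory. What I want is to create an average file with the same format as file1/file2/file3, in which every element is the average of the corresponding elements in the input tables. For example,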

{(Element in row 1, column 1 in file1)+
 (Element in row 1, column 1 in file2)+
 (Element in row 1, column 1 in file3)} >> 
(Element in row 1, column 1 in average file)

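Similarly, I need to do this for every element in the table, so that the average file has the same number of elements as file1, file2 and file3.

I tried to write a shell script, but it doesn't work. What I want is to loop over the files, grep the same element from each file, add them up, divide by the number of files, and finally write the result in a similar file format. This is what I tried to write: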

#!/bin/bash       
s=0
for i in {1..5..1} do
    for j in {1..4..1} do
        for f in m* do
            a=$(awk 'FNR == i {print $j}' $f)
            echo $a
            s=$s+$a
            echo $f
        done
        avg=$s/3
        echo $avg > output
    done
done

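Best answer

This is a rather inefficient way of going about it: for every single number you try to extract, you process an entire input file. With 5 rows, 4 columns and only three files, that already means 60 awk runs!

Also, mixing Bash and awk in this way is a massive antipattern. There is a good Q&A that explains why.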

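A few more remarks:

For brace expansion, the default step size is 1, so {1..4..1} is the same as {1..4}.
Awk does not know what i and j are. As far as it is concerned, they were never defined. If you really wanted to get your shell variables into awk, you could do
```
a=$(awk -v i="$i" -v j="$j" 'FNR == i { print $j }' "$f")
```
but the approach is not sensible anyway.
Shell arithmetic does not work like s=$s+$a or avg=$s/3; those just concatenate strings. To have the shell do calculations for you, you need an arithmetic expansion: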
```
s=$(( s + a ))
```
or, a little shorter,
```
(( s += a ))
```
and
```
avg=$(( s / 3 ))
```
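Notice that you don't need the $ sign in an arithmetic context.
echo $avg > output truncates output on every iteration, so only the last value survives. Even with >>, each number would end up on a separate line, which is probably not what you want.
Indentation matters! If not for the machine, then for human readers.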
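
The remarks about awk variables and shell arithmetic can be tried out in isolation. The following is a minimal sketch, not part of the original answer; the sample line `10 20 30 40` and the values 3, 4 and 5 are made up for illustration.

```shell
#!/bin/bash

# Passing shell variables into awk with -v (the input line is made up)
i=1
j=3
field=$(printf '10 20 30 40\n' | awk -v i="$i" -v j="$j" 'FNR == i { print $j }')
echo "field: $field"

# Plain assignment only concatenates strings
s=0
a=5
concat=$s+$a
echo "concatenated: $concat"

# Arithmetic expansion actually computes
sum=0
for a in 3 4 5; do
    sum=$(( sum + a ))
done
avg=$(( sum / 3 ))
echo "sum: $sum, avg: $avg"
```

The first echo shows the third field of the first line, the second shows the literal string `0+5`, and the last shows a real sum and an integer average.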

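Bash solution

This solves the problem using just Bash. It is hardcoded to three files, but flexible in the number of lines and the number of elements per line. There are no checks to make sure that all lines and files have the same number of elements.

Notice that Bash is not fast at this kind of thing and should only be used for small files, if at all. Also, it uses integer arithmetic, so the "average" of 3 and 4 becomes 3.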
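
To make the integer truncation concrete, here is a small sketch. Handing the division to awk is just one possible workaround and an assumption of this example, not something the Bash solution itself does.

```shell
#!/bin/bash

# Integer division in arithmetic expansion discards the fractional part
int_avg=$(( (3 + 4) / 2 ))
echo "integer average: $int_avg"

# One possible workaround: let awk do the floating-point division
float_avg=$(awk -v a=3 -v b=4 'BEGIN { printf "%.1f", (a + b) / 2 }')
echo "floating-point average: $float_avg"
```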

I have added comments to explain what happens.

#!/bin/bash

# Read a line from the first file into array arr1
while read -a arr1; do

    # Read a line from the second file at file descriptor 3 into array arr2
    read -a arr2 <&3

    # Read a line from the third file at file descriptor 4 into array arr3
    read -a arr3 <&4

    # Loop over elements
    for (( i = 0; i < ${#arr1[@]}; ++i )); do

        # Calculate average of element across files, assign to res array
        res[i]=$(( (arr1[i] + arr2[i] + arr3[i]) / 3 ))
    done

    # Print res array
    echo "${res[@]}"

# Read from files supplied as arguments
# Input for the second and third file is redirected to file descriptors 3 and 4
# to enable looping over multiple files concurrently
done < "$1" 3< "$2" 4< "$3"

This has to be called like this

./bashsolution file1 file2 file3

and the output can be redirected as desired.
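
The file descriptor redirection that lets the loop read several files in lockstep can be seen in a stripped-down form. This is only a sketch: it uses two temporary files with one number per line and plain `read` instead of `read -a`, and the names `left` and `right` are hypothetical.

```shell
#!/bin/bash

# Create two small sample files in a temporary directory
tmpdir=$(mktemp -d)
printf '1\n2\n3\n' > "$tmpdir/left"
printf '10\n20\n30\n' > "$tmpdir/right"

sums=""
# Standard input supplies lines from the first file,
# file descriptor 3 supplies the matching lines from the second one
while read -r l; do
    read -r r <&3
    sums="$sums$(( l + r )) "
done < "$tmpdir/left" 3< "$tmpdir/right"

echo "$sums"
rm -r "$tmpdir"
```

Each iteration consumes one line from each file, so the sums printed are 11, 22 and 33.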

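awk solution

This is a solution in pure awk. It is a bit more flexible in that it averages however many files are supplied as arguments; it should also be about an order of magnitude faster than the Bash solution.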

#!/usr/bin/awk -f

# Count number of files: increment on the first line of each new file
FNR == 1 { ++nfiles }

{
    # (Pseudo) 2D array summing up fields across files
    for (i = 1; i <= NF; ++i) {
        values[FNR, i] += $i
    }
}

END {
    # Loop over lines of array with sums
    for (i = 1; i <= FNR; ++i) {

        # Loop over fields of current line in array of sums
        for (j = 1; j <= NF; ++j) {

            # Build record with averages
            $j = values[i, j]/nfiles
        }
        print
    }
}

It has to be called like this

./awksolution file1 file2 file3

and, as mentioned, there is no limit on the number of files to average.
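
As a quick check of the awk approach, the same logic can be run inline on two small sample files. This is a sketch under assumptions: the file names `a.txt` and `b.txt` and their contents are made up, and the program is a condensed copy of the script above.

```shell
#!/bin/bash

# Create two 2x2 sample tables in a temporary directory
tmpdir=$(mktemp -d)
printf '1 2\n3 4\n' > "$tmpdir/a.txt"
printf '2 4\n4 4\n' > "$tmpdir/b.txt"

# Sum each field per (line, column), then divide by the number of files
result=$(awk '
    FNR == 1 { ++nfiles }
    { for (i = 1; i <= NF; ++i) values[FNR, i] += $i }
    END {
        for (i = 1; i <= FNR; ++i) {
            for (j = 1; j <= NF; ++j) $j = values[i, j] / nfiles
            print
        }
    }
' "$tmpdir/a.txt" "$tmpdir/b.txt")

echo "$result"
rm -r "$tmpdir"
```

For these inputs the output is the two lines `1.5 3` and `3.5 4`, which shows that, unlike the Bash version, awk keeps the fractional part.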

Regarding linux - averaging multiple files, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37061754/
