awk - How to use an associative array in awk to count occurrences of specific characters in a file

Tags: awk associative-array

I have a file like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count the number of email addresses per top-level domain, like this:

Domain Name No of Email
-----------------------
com         1
in          3
uk          1

Best answer

Here is a pure POSIX awk solution (with the external sort utility invoked from inside the awk program):

awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array and sort it (by top-level domain).
    for (k in a) print k, a[k] | "sort"
  }
' file
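Why `$NF` rather than a fixed field number: with `-F.`, every single `.` separates fields (POSIX awk treats a one-character FS other than a space literally, not as a regex), so the number of fields differs from line to line, while the last field is always the top-level domain. The sketch below feeds the five addresses from the question (decoded from the page's obfuscated links) through awk to show `NF` and `$NF` for each line. Note that `Rajesh.patel@hotmail.in` has three fields, so `$2` would give `patel@hotmail` there.

```shell
# Print the number of `.`-separated fields (NF) and the last field ($NF)
# for each sample address; $NF is the key that a[$NF]++ counts.
printf '%s\n' manish@yahoo.com Rajesh.patel@hotmail.in jkl@gmail.uk \
    New123@utu.ac.in qwe@gmail.co.in |
  awk -F. '{ print NF, $NF }'
# 2 com
# 3 in
# 2 uk
# 3 in
# 3 in
```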

If you have GNU awk 4.0 or higher, you can do without the external sort and can even control the sort order easily from inside the gawk program:

gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array sorted by top-level domain.
    # The order in which `for (k in a)` traverses the array is set via
    # the special PROCINFO["sorted_in"] variable, e.g.:
    #  - sort by top-level domain, ascending:  "@ind_str_asc"
    #  - sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
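The gawk comments mention `@val_num_desc` for ordering by count. Without gawk, a similar order can be had in POSIX awk by changing the external sort command instead. This is a sketch assuming a POSIX `sort`; `count_by_tld` is a hypothetical wrapper name introduced only for illustration, and breaking ties alphabetically by domain is a choice made here, not something the answer specifies.

```shell
#!/bin/sh
# count_by_tld (hypothetical name): reads the files given as arguments,
# or stdin if none, and prints one tab-separated line per top-level
# domain, most frequent first.
count_by_tld() {
  awk -F. -v OFS='\t' '
    { a[$NF]++ }
    END {
      print "Domain Name", "No of Email"
      print "----------------------------"
      # awk turns \t into a real tab inside this string, so sort splits
      # on tabs; -k2,2nr: count, numeric, descending;
      # -k1,1: domain, ascending, to break ties.
      cmd = "sort -t \"\t\" -k2,2nr -k1,1"
      for (k in a) print k, a[k] | cmd
      close(cmd)
    }
  ' "$@"
}

# Demo with the addresses from the question.
printf '%s\n' manish@yahoo.com Rajesh.patel@hotmail.in jkl@gmail.uk \
    New123@utu.ac.in qwe@gmail.co.in | count_by_tld
```

With the sample addresses, the rows after the header should come out as `in 3`, `com 1`, `uk 1` (tab-separated).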

A similar question about awk, "How to use an associative array in awk to count occurrences of specific characters in a file", can be found on Stack Overflow: https://stackoverflow.com/questions/22800531/
