awk - How to use an associative array in awk to count occurrences of specific characters in a file

I have a file like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count how many times each domain occurs, like this:

Domain Name No of Email
-----------------------
com         1
in          3
uk          1

Best answer

Here is a POSIX-compliant awk solution (it calls the external sort utility from inside the awk program):

awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Pipe the array entries to sort, which orders them by top-level domain.
    for (k in a) print k, a[k] | "sort"
  }
' file
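
The same counting step can be paired with sort options to order by count instead of by name. The following is a minimal sketch, not part of the original answer: the function name `count_tlds` is hypothetical, the header is left out so the output is easy to check, and the sample addresses are assumed stand-ins for the question's input.

```shell
# Hypothetical helper: count addresses per top-level domain.
# Reads the files given as arguments, or standard input if none are given.
count_tlds() {
  awk -F. -v OFS='\t' '
    # The last dot-separated field is the top-level domain.
    { a[$NF]++ }

    END {
      # Sort by count (field 2, numeric, descending), then by domain name.
      for (k in a) print k, a[k] | "sort -k2,2nr -k1,1"
    }
  ' "$@"
}

# Sample input (assumed stand-ins for the addresses in the question).
count_tlds <<'EOF'
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
EOF
```

With this input the sketch prints `in` with a count of 3, followed by `com` and `uk` with a count of 1 each. The second sort key breaks ties by name, so the order does not depend on how awk happens to walk the array.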

If you have GNU awk 4.0 or later, you can do without the external sort utility and can even easily control the sort field from inside the gawk program:

gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array in sorted order.
    # The special PROCINFO["sorted_in"] variable controls the order in
    # which `for (k in a)` visits the array elements; e.g.:
    #  - Sort by top-level domain, ascending:  "@ind_str_asc"
    #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
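
As a variation, the second setting named in the comments, `"@val_num_desc"`, orders the output by occurrence count. This is a minimal sketch under stated assumptions: the function name `count_tlds_by_count` is hypothetical, the header is omitted, the sample addresses are assumed stand-ins for the question's input, and `gawk` must be installed.

```shell
# Hypothetical helper: count addresses per top-level domain and list the
# most frequent domain first. Requires GNU awk 4.0 or later.
count_tlds_by_count() {
  gawk -F. -v OFS='\t' '
    # The last dot-separated field is the top-level domain.
    { a[$NF]++ }

    END {
      # Visit the array by value, compared numerically, largest first.
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (k in a) print k, a[k]
    }
  ' "$@"
}

# Run on sample input only if gawk is available.
if command -v gawk >/dev/null 2>&1; then
  count_tlds_by_count <<'EOF'
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
EOF
else
  echo "gawk not found; this variant needs GNU awk 4.0 or later" >&2
fi
```

With this input `in` is listed first because it has the highest count. The relative order of domains with equal counts is left to gawk here; if a fixed tie order matters, the external sort approach with a second key is the simpler choice.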

For more on this topic, see the similar question on Stack Overflow: https://stackoverflow.com/questions/22800531/
