I have a file like this:
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
I want to count the number of occurrences of each domain, like this:
Domain Name No of Email
-----------------------
com 1
in 3
uk 1
Best answer
Here is a pure POSIX awk solution (with sort invoked from inside the awk program):
awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }
  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array and sort it (by top-level domain).
    for (k in a) print k, a[k] | "sort"
  }
' file
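As a variation on the command above, the same counts can be listed by frequency instead of by domain by passing a key option to the sort that awk pipes into. This is only a minimal sketch: the function name `count_tlds_by_count`, the array name `count`, and the sample addresses are made up for illustration, and the header lines are left out so that the output contains data rows only.

```shell
# Hypothetical helper: count top-level domains read from standard input
# and list them with the highest count first.
count_tlds_by_count() {
  awk -F. '
    # Skip empty lines; $NF is the text after the last "." on the line.
    NF { count[$NF]++ }
    END {
      # Pipe "domain<TAB>count" rows into sort, ordering numerically
      # and in reverse on the second field (the count).
      for (tld in count) print tld "\t" count[tld] | "sort -k2,2nr"
    }
  '
}

# Made-up addresses chosen to give the same counts as in the question.
count_tlds_by_count <<'EOF'
a@example.com
b.c@example.in
d@example.uk
e@example.ac.in
f@example.co.in
EOF
```

With this input, the `in` row comes first because it has the highest count (3).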
If you have GNU awk 4.0 or higher, you can make do without the external sort and can even easily control the sort order from inside the gawk program:
gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }
  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array in sorted order.
    # The special PROCINFO["sorted_in"] variable controls the order in
    # which a `for (k in a)` loop traverses the array; e.g.:
    #  - Sort by top-level domain, ascending:  "@ind_str_asc"
    #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"]="@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
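The comments in the gawk command mention "@val_num_desc" as an alternative traversal order. A minimal sketch of that variant follows, assuming gawk 4.0 or later is installed as `gawk`. The function name `count_tlds_gawk`, the array name `count`, and the sample addresses are made up for illustration, and the header lines are omitted.

```shell
# Hypothetical helper: count top-level domains read from standard input
# and let gawk itself order the rows by descending count.
count_tlds_gawk() {
  gawk -F. -v OFS='\t' '
    # Skip empty lines; $NF is the text after the last "." on the line.
    NF { count[$NF]++ }
    END {
      # Traverse the array by numeric value, highest count first.
      PROCINFO["sorted_in"] = "@val_num_desc"
      for (tld in count) print tld, count[tld]
    }
  '
}

# Made-up addresses chosen to give the same counts as in the question.
count_tlds_gawk <<'EOF'
a@example.com
b.c@example.in
d@example.uk
e@example.ac.in
f@example.co.in
EOF
```

No external process is started here, so the ordering does not depend on the system's sort utility. How domains with equal counts are ordered relative to each other is left to gawk.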
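Regarding "awk - How to use associative arrays with the awk command to count occurrences of a specific character in a file", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/22800531/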