string - Line manipulation and sorting

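I'm decent at writing Linux scripts, but I could use some advice. I know the question is a bit vague, so I'd be grateful for any help you can offer!

The following question is for personal growth, since I'm writing some networking tools for fun and learning. No homework is involved (I'm a senior in college, and none of my classes require this stuff!).

I'm using tshark to get information about a packet capture. The output looks like this: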

rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs

===================================================================
Protocol Hierarchy Statistics
Filter: 

eth                                      frames:4309 bytes:3984321
  ip                                     frames:4119 bytes:3969006
    icmp                                 frames:1316 bytes:1308988
    udp                                  frames:1408 bytes:1350786
      data                               frames:1368 bytes:1346228
      dns                                frames:16 bytes:1176
      nbns                               frames:14 bytes:1300
      http                               frames:8 bytes:1596
      nbdgm                              frames:2 bytes:486
        smb                              frames:2 bytes:486
          mailslot                       frames:2 bytes:486
            browser                      frames:2 bytes:486
    tcp                                  frames:1395 bytes:1309232
      data                               frames:1300 bytes:1294800
      http                               frames:6 bytes:3763
        data-text-lines                  frames:2 bytes:324
        xml                              frames:2 bytes:3205
          tcp.segments                   frames:1 bytes:787
      nbss                               frames:34 bytes:5863
        smb                              frames:17 bytes:3047
          pipe                           frames:4 bytes:686
            lanman                       frames:4 bytes:686
        smb2                             frames:13 bytes:2444
      bittorrent                         frames:10 bytes:1709
        tcp.segments                     frames:2 bytes:433
          bittorrent                     frames:2 bytes:433
            bittorrent                   frames:1 bytes:258
        bittorrent                       frames:2 bytes:221
          bittorrent                     frames:2 bytes:221
  arp                                    frames:146 bytes:8760
  ipv6                                   frames:44 bytes:6555
    udp                                  frames:40 bytes:6211
      dns                                frames:18 bytes:1711
      dhcpv6                             frames:14 bytes:2114
      http                               frames:6 bytes:1014
      data                               frames:2 bytes:1372
    icmpv6                               frames:4 bytes:344
===================================================================

I would like it to look like this:

rachel@Ubuntu-1:~/PCAP$ tshark -r LargeTorrent.pcap -q -z io,phs

===================================================================
Protocol Hierarchy Statistics
Filter: 

Protocol                   Bytes
=====================================
eth                        3984321
  ip                       3969006
    icmp                   1308988
    udp                    1350786
      data                 1346228
      dns                  1176
      nbns                 1300
      http                 1596
      nbdgm                486
        smb                486
          mailslot         486
            browser        486
    tcp                    1309232
      data                 1294800
      http                 3763
        data-text-lines    324
        xml                3205
          tcp.segments     787
      nbss                 5863
        smb                3047
          pipe             686
            lanman         686
        smb2               2444
      bittorrent           1709
        tcp.segments       433
          bittorrent       433
            bittorrent     258
        bittorrent         221
          bittorrent       221
  arp                      8760
  ipv6                     6555
    udp                    6211
      dns                  1711
      dhcpv6               2114
      http                 1014
      data                 1372
    icmpv6                 344
===================================================================

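Edit: I'm adding the original question back so that the (very good) answers provided make sense.

Originally, I only wanted to print the statistics for the "leaves", since eth, ip, and so on are parents whose statistics aren't needed for my purposes. Also, instead of an ugly block of text that shows the hierarchy using nothing but whitespace, I wanted to drop all of the parents' statistics and show the parents as a breadcrumb trail leading to each child.

Example: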

eth                                      frames:4309 bytes:3984321
  ip                                     frames:4119 bytes:3969006
    icmp                                 frames:1316 bytes:1308988
    udp                                  frames:1408 bytes:1350786
      data                               frames:1368 bytes:1346228
      dns                                frames:16 bytes:1176

should become

eth:ip:icmp - 1308988 bytes
eth:ip:udp:data - 1346228 bytes
eth:ip:udp:dns - 1176 bytes

This preserves the hierarchy and avoids printing useless statistics.

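Anyway, the accepted answer from Etan solved this perfectly! For anyone at my level who isn't sure how to proceed once they have the answer, these steps should get you there:

Save the given script as filename.awk
Save the block of text to be processed as filename.txt
Run awk -f filename.awk filename.txt
Optionally redirect the output to a file ( awk -f filename.awk filename.txt >> output.txt ); note that >> appends to the file rather than overwriting it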
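
The steps above can be sketched as a short shell session. This is only an illustration: the awk program here is a one-line stand-in rather than the script from the answer, the sample text is two made-up lines, and the scratch directory is an assumption added so that nothing existing gets overwritten.

```shell
# Work in a scratch directory (an assumption for this demo, not part of the original steps).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Step 1: save the awk program to a file. A one-line stand-in is used here;
# in practice this would be the script from the answer.
printf '%s\n' '{ print NR ": " $0 }' > filename.awk

# Step 2: save the text to be processed.
printf '%s\n' 'eth' '  ip' > filename.txt

# Step 3: run the program against the text.
awk -f filename.awk filename.txt

# Step 4 (optional): send the output to a file. Because >> appends,
# running the command twice leaves two copies of the output in output.txt.
awk -f filename.awk filename.txt >> output.txt
awk -f filename.awk filename.txt >> output.txt
wc -l < output.txt
```

Since awk reads standard input when no file is named, the intermediate text file can probably be skipped by piping tshark straight in: tshark -r LargeTorrent.pcap -q -z io,phs | awk -f filename.awk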

Best answer

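I initially thought the output you wanted could be produced with this awk script. (It could probably be written more cleanly, but it seems to work well enough.)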

function entry() {
    # Don't want to print empty entries.
    if (ind[0]) {
        printf "%s", ind[0]
        for (i = 1; i <= ls; i++) {
            printf ":%s", ind[i]
        }
        split(b, a, /:/)
        printf " - %s %s\n", a[2], a[1]
    }
}

# Found our data marker. Note that and print the current line.
$1 == "Filter:" {d=1; print; next}
# Print lines until we see our data marker.
!d {print; next}
# Print empty lines.
!NF {print; next}
# Save our trailing line for later.
/===/ {suf=$0; next}

{
    # Save our previous indentation level.
    ls = s
    # Find our new indentation level (by where the first field starts).
    s = (match($0, /[^[:space:]]/)-1) / 2

    # If the current line is at or below the last indent level print the last line.
    if (s <= ls) {
        entry()
    }

    # Save the current line's byte count.
    b=$NF
    # Save the current line's field name.
    ind[s] = $1
}

END {
    # entry() walks ind[0..ls], so point ls at the last line's own level first;
    # otherwise a stale, deeper entry gets appended to the final breadcrumb.
    ls = s
    # Print a final line if we had one.
    entry()
    # Print the suffix line if we have one.
    if (suf) {
        print suf
    }
}

On the sample input, this produces the following output.

===================================================================
Protocol Hierarchy Statistics
Filter:

eth:ip:icmp - 1308988 bytes
eth:ip:udp:data - 1346228 bytes
eth:ip:udp:dns - 1176 bytes
eth:ip:udp:nbns - 1300 bytes
eth:ip:udp:http - 1596 bytes
eth:ip:udp:nbdgm:smb:mailslot:browser - 486 bytes
eth:ip:tcp:data - 1294800 bytes
eth:ip:tcp:http:data-text-lines - 324 bytes
eth:ip:tcp:http:xml:tcp.segments - 787 bytes
eth:ip:tcp:nbss:smb:pipe:lanman - 686 bytes
eth:ip:tcp:nbss:smb2 - 2444 bytes
eth:ip:tcp:bittorrent:tcp.segments:bittorrent:bittorrent - 258 bytes
eth:ip:tcp:bittorrent:bittorrent:bittorrent - 221 bytes
eth:arp - 8760 bytes
eth:ipv6:udp:dns - 1711 bytes
eth:ipv6:udp:dhcpv6 - 2114 bytes
eth:ipv6:udp:http - 1014 bytes
eth:ipv6:udp:data - 1372 bytes
eth:ipv6:icmpv6 - 344 bytes
===================================================================
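
The part of the awk script that everything else hangs on is turning leading whitespace into a nesting depth. Here is a minimal sketch of just that step, assuming two spaces per level as in the tshark output. The helper name indent_depth is made up for illustration, and the regex matches only spaces and tabs instead of [[:space:]], which should be enough for this input.

```shell
# indent_depth (hypothetical helper): for each non-blank input line,
# print its nesting depth followed by its first field.
indent_depth() {
    awk '{
        # match() returns the 1-based position of the first non-blank character,
        # or 0 when the line contains none.
        pos = match($0, /[^ \t]/)
        if (pos == 0) next
        # Two spaces of indentation per level, so depth = (pos - 1) / 2.
        printf "%d %s\n", (pos - 1) / 2, $1
    }'
}

# Feed it the first three lines of the hierarchy.
printf '%s\n' \
    'eth                                      frames:4309 bytes:3984321' \
    '  ip                                     frames:4119 bytes:3969006' \
    '    icmp                                 frames:1316 bytes:1308988' |
    indent_depth
```

A line is a leaf exactly when the line after it is not deeper, which is why the script above compares each new depth with the previous one before printing.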

However, the output you edited in to show what you want is probably easier to produce with sed.

/Filter:/a \
Protocol                   Bytes \
=====================================
s/frames:[^ ]*//
s/               b/b/
s/bytes:\([^ ]*\)/\1/

This produces the following output.

===================================================================
Protocol Hierarchy Statistics
Filter:
Protocol                   Bytes
=====================================

eth                        3984321
  ip                       3969006
    icmp                   1308988
    udp                    1350786
      data                 1346228
      dns                  1176
      nbns                 1300
      http                 1596
      nbdgm                486
        smb                486
          mailslot         486
            browser        486
    tcp                    1309232
      data                 1294800
      http                 3763
        data-text-lines    324
        xml                3205
          tcp.segments     787
      nbss                 5863
        smb                3047
          pipe             686
            lanman         686
        smb2               2444
      bittorrent           1709
        tcp.segments       433
          bittorrent       433
            bittorrent     258
        bittorrent         221
          bittorrent       221
  arp                      8760
  ipv6                     6555
    udp                    6211
      dns                  1711
      dhcpv6               2114
      http                 1014
      data                 1372
    icmpv6                 344
===================================================================
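
To see what each of the three sed substitutions contributes, it can help to apply them one at a time to a single line. The sketch below does that for the dns line. It assumes tshark pads the protocol name to 41 columns before frames: (consistent with the sample output), and it writes the run of spaces in the second substitution as the interval \{15\}, assuming that run is 15 spaces long, which is what the output shown above implies. The variable names are illustrative only.

```shell
# Build one input line: the name padded to 41 columns, then the two counters.
line=$(printf '%-41sframes:16 bytes:1176' '      dns')

# Step 1: drop the frames column (the space after it stays behind).
step1=$(printf '%s\n' "$line" | sed 's/frames:[^ ]*//')

# Step 2: pull the bytes column left by removing 15 of the spaces before it.
step2=$(printf '%s\n' "$step1" | sed 's/ \{15\}b/b/')

# Step 3: keep only the number that follows "bytes:".
step3=$(printf '%s\n' "$step2" | sed 's/bytes:\([^ ]*\)/\1/')

# Show each stage in brackets so trailing whitespace is visible.
printf '[%s]\n' "$step1" "$step2" "$step3"
```

The full script can be saved to a file and run the same way as the awk one, for example sed -f filename.sed filename.txt (the file name filename.sed is just a placeholder).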

Regarding string - Line manipulation and sorting, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29336829/
