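python - Can reading sessions from a pcap file with Scapy be made more memory efficient?

I am currently trying to write a quick Python program that reads a .pcap file and writes out data about the various sessions stored in it.

The information I write out includes srcip, dstip, srcport, dstport, and so on.

However, even for a fairly small pcap, this takes a lot of memory and ends up running for a long time. We are talking about 8 GB+ of memory for a pcap that is 212 MB in size.

As usual, I guess there may be a more efficient way of doing this that I am simply unaware of.

Here is a quick skeleton of my code, with no vital parts missing.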

import socket
import sys
from scapy.all import *


edges_file = "edges.csv"
pcap_file = "tcpdump.pcap"

try:
    print('[+] Reading and parsing pcap file: %s' % pcap_file)
    # rdpcap() loads every packet of the capture into memory at once
    a = rdpcap(pcap_file)

except Exception as e:
    print('Something went wrong while opening/reading the pcap file.'
          '\n\nThe error message is: %s' % e)
    sys.exit(1)

# Maps keys such as "TCP 10.0.0.1:1234 > 10.0.0.2:80" to packet lists
sessions = a.sessions()

print('[+] Writing to edges.csv')
with open(edges_file, 'w') as f1:
    f1.write('source,target,protocol,sourceport,destinationport,'
             'num_of_packets\n')
    for k, v in sessions.items():

        tot_packets = len(v)

        # UDP and TCP session keys share the same layout
        if "UDP" in k or "TCP" in k:
            proto, source, flurp, target = k.split()
            srcip, srcport = source.split(":")
            dstip, dstport = target.split(":")
            f1.write('%s,%s,%s,%s,%s,%s\n' % (srcip, dstip, proto, srcport,
                                              dstport, tot_packets))

        # ICMP and any other 'weird' packets are skipped for now

print('[+] Closed the edges file')

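As always, any assistance is appreciated.

Best answer

I know I'm late to the party, but hopefully this will be useful to future visitors.

rdpcap() dissects the entire pcap file and retains an in-memory representation of each and every packet, which explains why it eats up a lot of memory.

As far as I am aware (I am a novice Scapy user myself), the only two ways Scapy session reassembly can be invoked are:

1. By calling scapy.plist.PacketList.sessions(). This is what you are currently doing (rdpcap(pcap_file) returns a scapy.plist.PacketList).
2. By reading the pcap using sniff() in offline mode while also providing the function with a session decoder implementation. For example, for TCP reassembly you would do sniff(offline='stackoverflow.pcap', session=TCPSession). (This was added in Scapy 2.4.3.)

Option 1 is obviously a dead end (as it requires that we keep all packets of all sessions in memory at one time), so let's explore option 2...

Let's launch Scapy in interactive mode to access the documentation for sniff():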

$ scapy
>>> help(sniff)

Help on function sniff in module scapy.sendrecv:

sniff(*args, **kwargs)
    Sniff packets and return a list of packets.
    
    Args:
        count: number of packets to capture. 0 means infinity.
        store: whether to store sniffed packets or discard them
        prn: function to apply to each packet. If something is returned, it
             is displayed.
             --Ex: prn = lambda x: x.summary()
        session: a session = a flow decoder used to handle stream of packets.
                 e.g: IPSession (to defragment on-the-flow) or NetflowSession
        filter: BPF filter to apply.
        lfilter: Python function applied to each packet to determine if
                 further action may be done.
                 --Ex: lfilter = lambda x: x.haslayer(Padding)
        offline: PCAP file (or list of PCAP files) to read packets from,
                 instead of sniffing them
        timeout: stop sniffing after a given time (default: None).
        L2socket: use the provided L2socket (default: use conf.L2listen).
        opened_socket: provide an object (or a list of objects) ready to use
                      .recv() on.
        stop_filter: Python function applied to each packet to determine if
                     we have to stop the capture after this packet.
                     --Ex: stop_filter = lambda x: x.haslayer(TCP)
        iface: interface or list of interfaces (default: None for sniffing
               on all interfaces).
        monitor: use monitor mode. May not be available on all OS
        started_callback: called as soon as the sniffer starts sniffing
                          (default: None).
    
    The iface, offline and opened_socket parameters can be either an
    element, a list of elements, or a dict object mapping an element to a
    label (see examples below).

Note the store parameter. We can set it to False to make sniff() operate in a streamed fashion (read a single packet, process it, then release it from memory):

sniff(offline='stackoverflow.pcap', session=TCPSession, store=False)

I just tested this with a 193 MB pcap. With store=True (the default), this eats up about 1.7 GB of memory on my system (macOS), but only approximately 47 MB when store=False.
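
For the per-session packet counts the question asks for, one possible sketch combines store=False with a prn callback that keeps only a counter per flow instead of the packets themselves. The helper names (edge_rows, count_flows, write_edges) are made up for this illustration, and restricting the count to IPv4 TCP/UDP packets is an assumption, not something Scapy requires.

```python
import csv
from collections import Counter


def edge_rows(counts):
    """Turn {(proto, src, sport, dst, dport): n} into rows that follow
    the column order of the question's edges.csv."""
    return [(src, dst, proto, sport, dport, n)
            for (proto, src, sport, dst, dport), n in counts.items()]


def count_flows(pcap_path):
    """Stream a pcap and count packets per (proto, src, sport, dst, dport)."""
    # Imported lazily so edge_rows() can be used without Scapy installed.
    from scapy.all import IP, TCP, UDP, sniff

    counts = Counter()

    def on_packet(pkt):
        # Assumption of this sketch: only IPv4 TCP/UDP packets are counted.
        if IP not in pkt:
            return
        for layer in (TCP, UDP):
            if layer in pkt:
                key = (layer.__name__, pkt[IP].src, pkt[layer].sport,
                       pkt[IP].dst, pkt[layer].dport)
                counts[key] += 1
                return

    # store=False: sniff() does not keep the packet after on_packet returns.
    sniff(offline=pcap_path, store=False, prn=on_packet)
    return counts


def write_edges(counts, csv_path):
    """Write the per-flow counts as CSV."""
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["source", "target", "protocol", "sourceport",
                         "destinationport", "num_of_packets"])
        writer.writerows(edge_rows(counts))
```

Calling write_edges(count_flows("tcpdump.pcap"), "edges.csv") would then produce the file. With this approach memory grows with the number of distinct flows rather than with the number of packets.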

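Processing the reassembled TCP sessions (open question)

So we managed to reduce our memory footprint. Great! But how do we process the (supposedly) reassembled TCP sessions? The usage instructions indicate that we should use the prn parameter of sniff() to specify a callback function that will then be invoked with the reassembled TCP session (emphasis mine):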

sniff() also provides Sessions, that allows to dissect a flow of packets seamlessly. For instance, you may want your sniff(prn=...) function to automatically defragment IP packets, before executing the prn.

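The example is in the context of IP fragmentation, but I'd expect the TCP analogue to group all packets of a session and then invoke prn once for each session. Unfortunately, that's not how it works: I tried this on my example pcap, and the callback is invoked once for every packet, exactly as indicated in sniff()'s documentation shown above.

The usage instructions linked above also state the following about using session=TCPSession in sniff():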
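
A rough way to repeat this kind of check on another capture is to pass a callable that only counts how often sniff() invokes it. This is a sketch: CallCounter and callbacks_with_tcpsession are hypothetical names introduced here, and it assumes Scapy 2.4.3 or later, where TCPSession is available.

```python
class CallCounter:
    """Callable that records how many times it has been invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, pkt):
        # Returns None, so sniff() has nothing to display for this packet.
        self.calls += 1


def callbacks_with_tcpsession(pcap_path):
    """Return how many times prn is invoked when reading pcap_path
    with session=TCPSession."""
    # Imported lazily; requires a Scapy version that ships TCPSession.
    from scapy.all import sniff
    from scapy.sessions import TCPSession

    counter = CallCounter()
    sniff(offline=pcap_path, session=TCPSession, store=False, prn=counter)
    return counter.calls
```

Comparing the returned number with the packet count and the session count of the same capture shows whether the callback fires per packet or per session.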

TCPSession -> defragment certain TCP protocols*. Only HTTP 1.0 currently uses this functionality.

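Considering the output of the experiment above, I now interpret this as follows: whenever Scapy finds an HTTP (1.0) request/response that spans multiple TCP segments, it creates a single packet whose payload is the merged payload of those TCP segments (which in total is the full HTTP request/response). I'd appreciate it if anyone could help clarify the meaning of the above quote on TCPSession, or even better: clarify whether TCP reassembly is indeed possible this way and I'm just misunderstanding the API.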

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/35310104/
