Python: Converting multiple YAML documents to JSON

Tags: python json python-2.7 yaml

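I am trying to convert some YAML to JSON with Python, but I am struggling to get the JSON structured correctly. My YAML file contains multiple documents that look like this: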

title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 20018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...

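For each document, I want to extract detection, fields, falsepositives and level, and put them into the JSON document as separate arrays. My first attempt is pretty poor: it just lumps each group from every document together into lists: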

import json
import yaml

# yaml_file and json_file are assumed to be paths defined earlier
data = {}
data['indicator'] = {}
data['indicator']['detection'] = []
data['indicator']['fields'] = []
data['indicator']['falsepositives'] = []
data['indicator']['level'] = []
with open(yaml_file, 'r') as yaml_in, open(json_file, 'a') as definition:
    loadyaml = yaml.safe_load_all(yaml_in)
    for item in loadyaml:
        # items() works on Python 2 and 3; iteritems() is Python 2 only
        for header, subsections in item.items():
            if header == 'detection':
                data['indicator']['detection'].append(subsections)
            elif header == 'fields':
                data['indicator']['fields'].append(subsections)
            elif header == 'falsepositives':  # must match the YAML key exactly
                data['indicator']['falsepositives'].append(subsections)
            elif header == 'level':
                data['indicator']['level'].append(subsections)

    json.dump(data, definition, indent=4)

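What I would like is for each document to go into my JSON document as a separate indicator, with its detection, fields, falsepositives and level grouped together, but my Python skills are letting me down.

Any insight would be much appreciated!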

Best answer

You can get the output you want by iterating over .load_all(), with a much smaller program:

import sys
import json
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ='safe')
ind = dict()
data = dict(indicator=ind)
with open('input.yaml') as fp:
    # load_all() yields one Python object per YAML document
    for d in yaml.load_all(fp):
        for k in ('detection', 'fields', 'falsepositives', 'level'):
            # create the list on first use, then append this document's value
            ind.setdefault(k, []).append(d[k])

json.dump(data, sys.stdout, indent=2)

If you have a file input.yaml:

---
title: Windows Shell Spawning Suspicious Program
status: experimental
description: Detects a suspicious child process of a Windows shell
references:
    - https://mgreen27.github.io/posts/2018/04/02/DownloadCradle.html
author: Florian Roth
date: 20018/04/06
logsource:
    product: windows
    service: sysmon
detection:
    selection:
        EventID: 1
        ParentImage:
            - '*\mshta.exe'
            - '*\powershell.exe'
            - '*\cmd.exe'
            - '*\rundll32.exe'
            - '*\cscript.exe'
            - '*\wscript.exe'
            - '*\wmiprvse.exe'
        Image:
            - '*\schtasks.exe'
            - '*\nslookup.exe'
            - '*\certutil.exe'
            - '*\bitsadmin.exe'
            - '*\mshta.exe'
    condition: selection
fields:
    - CommandLine
    - ParentCommandLine
falsepositives:
    - Administrative scripts
level: medium
...
---
title: Bash starting just what is asked
status: stabel
description: No negative side effects
references:
    - https://nblue24.github.io/posts/2019/04/01/DownloadBed.html
author: Axel Roth
date: 2019/04/01
logsource:
    product: linux
    service: good
detection:
    selection:
        EventID: 42
        ParentImage:
            - '*/bash'
            - '*/ash'
        Image:
            - systemctl
            - init
    condition: selection
fields:
    - Shell
    - ParentShell
falsepositives:
    - root programs
level: high
...

Your output will be:

{
  "indicator": {
    "detection": [
      {
        "selection": {
          "EventID": 1,
          "ParentImage": [
            "*\\mshta.exe",
            "*\\powershell.exe",
            "*\\cmd.exe",
            "*\\rundll32.exe",
            "*\\cscript.exe",
            "*\\wscript.exe",
            "*\\wmiprvse.exe"
          ],
          "Image": [
            "*\\schtasks.exe",
            "*\\nslookup.exe",
            "*\\certutil.exe",
            "*\\bitsadmin.exe",
            "*\\mshta.exe"
          ]
        },
        "condition": "selection"
      },
      {
        "selection": {
          "EventID": 42,
          "ParentImage": [
            "*/bash",
            "*/ash"
          ],
          "Image": [
            "systemctl",
            "init"
          ]
        },
        "condition": "selection"
      }
    ],
    "fields": [
      [
        "CommandLine",
        "ParentCommandLine"
      ],
      [
        "Shell",
        "ParentShell"
      ]
    ],
    "falsepositives": [
      [
        "Administrative scripts"
      ],
      [
        "root programs"
      ]
    ],
    "level": [
      "medium",
      "high"
    ]
  }
}

This works with both Python 2 and 3.
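
The output above groups values by key across all documents. The question also asks for each document to become its own indicator, so here is a minimal sketch of that alternative shape. It assumes PyYAML's yaml.safe_load_all (as used in the question) rather than ruamel.yaml. The names build_indicators and convert and the top-level "indicators" key are hypothetical and chosen only for illustration.

```python
import json

# Keys to copy from each YAML document (taken from the question).
KEYS = ('detection', 'fields', 'falsepositives', 'level')


def build_indicators(docs, keys=KEYS):
    """Return {'indicators': [...]} with one entry per YAML document.

    `docs` is any iterable of already-parsed documents (dicts).
    The function name and the 'indicators' key are illustrative only.
    """
    indicators = []
    for doc in docs:
        if not doc:  # an empty YAML document loads as None; skip it
            continue
        # .get() yields None (JSON null) when a document lacks a key
        indicators.append({k: doc.get(k) for k in keys})
    return {'indicators': indicators}


def convert(yaml_path, json_path):
    """Read every document in yaml_path and write a single JSON file."""
    import yaml  # PyYAML; imported lazily so build_indicators has no third-party dependency

    with open(yaml_path) as yaml_in:
        # safe_load_all is lazy, so consume it while the file is still open
        data = build_indicators(yaml.safe_load_all(yaml_in))
    # 'w' rather than 'a': appending a second JSON object would make the file invalid
    with open(json_path, 'w') as json_out:
        json.dump(data, json_out, indent=2)
```

Calling convert('input.yaml', 'output.json') on the two-document file above should give a list of two objects, each holding only that document's detection, fields, falsepositives and level. Using .get() instead of d[k] is a design choice: a document that lacks one of the keys produces null instead of raising KeyError.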

Regarding "Python: Converting multiple YAML documents to JSON", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51291788/
