json - Logstash: XML to JSON output, from arrays to strings

Tags: json xml elasticsearch logstash

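I am trying to use Logstash to convert XML to JSON for ElasticSearch. I am able to read the values and send them to ElasticSearch. The problem is that all the values come out as arrays, and I would like them to be plain strings. I know I can do a replace for each field individually, but then I run into a problem with nested fields that are 3 levels deep.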

XML

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<acs2:SubmitTestResult xmlns:acs2="http://tempuri.org/" xmlns:acs="http://schemas.sompleace.org" xmlns:acs1="http://schemas.someplace.org">
    <acs2:locationId>Location Id</acs2:locationId>
    <acs2:userId>User Id</acs2:userId>
    <acs2:TestResult>
        <acs1:CreatedBy>My Name</acs1:CreatedBy>
        <acs1:CreatedDate>2015-08-07</acs1:CreatedDate>
        <acs1:Output>10.5</acs1:Output>
    </acs2:TestResult>
</acs2:SubmitTestResult>

Logstash configuration

input {
    file {
        path => "/var/log/logstash/test.xml"
    }
}
filter {
    multiline {
        pattern => "^\s\s(\s\s|\<\/acs2:SubmitTestResult\>)"
        what => "previous"
    }
    if "multiline" in [tags] {
        mutate {
            replace => ["message", '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>%{message}']
        }
        xml {
            target => "SubmitTestResult"
            source => "message"
        }
        mutate {
            remove_field => ["message", "@version", "host", "@timestamp", "path", "tags", "type"]
            remove_field => ["entry", "[SubmitTestResult][xmlns:acs2]", "[SubmitTestResult][xmlns:acs]", "[SubmitTestResult][xmlns:acs1]"]

            # This works
            replace => [ "[SubmitTestResult][locationId]", "%{[SubmitTestResult][locationId]}" ]

            # This does NOT work
            replace => [ "[SubmitTestResult][TestResult][CreatedBy]", "%{[SubmitTestResult][TestResult][CreatedBy]}" ]
        }
    }
}
output {
    stdout {
        codec => "rubydebug"
    }
    elasticsearch {
        index => "xmltest"
        cluster => "logstash"
    }
}

Sample output

{
   "_index": "xmltest",
   "_type": "logs",
   "_id": "AU8IZBURkkRvuur_3YDA",
   "_version": 1,
   "found": true,
   "_source": {
      "SubmitTestResult": {
         "locationId": "Location Id",
         "userId": [
            "User Id"
         ],
         "TestResult": [
            {
               "CreatedBy": [
                  "My Name"
               ],
               "CreatedDate": [
                  "2015-08-07"
               ],
               "Output": [
                  "10.5"
               ]
            }
         ]
      }
    }
}

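As you can see, every element in the output is an array (except locationId, which I replaced). I am trying to avoid doing a replace for each element. Is there a way to adjust the configuration so that the values come out as strings? If not, how do I go 3 levels deep in the replace?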

--Update--

I figured out how to reach the 3rd level inside TestResult. The replace is:

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
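
The [0] index is needed because, as the sample output shows, the xml filter wraps TestResult itself in a one-element array. The following is only a sketch of how the same approach might be extended to all three nested fields in one mutate block. It assumes the field names from the XML above and that replace is written in its hash form with several entries. Note that the values become strings, so Output would be stored as "10.5" rather than as a number.

```
filter {
    mutate {
        # [0] selects the single TestResult entry created by the xml filter.
        # Each %{...} reference renders the one-element array as its string value.
        replace => {
            "[SubmitTestResult][TestResult][0][CreatedBy]"   => "%{[SubmitTestResult][TestResult][0][CreatedBy]}"
            "[SubmitTestResult][TestResult][0][CreatedDate]" => "%{[SubmitTestResult][TestResult][0][CreatedDate]}"
            "[SubmitTestResult][TestResult][0][Output]"      => "%{[SubmitTestResult][TestResult][0][Output]}"
        }
    }
}
```

This still leaves TestResult as an array holding one object; only the leaf values are turned into strings.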

Best answer

I figured it out. Here is the solution.

replace => [ "[SubmitTestResult][TestResult][0][CreatedBy]", "%{[SubmitTestResult][TestResult][0][CreatedBy]}" ]
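
As a possible alternative to per-field replaces, later releases of the xml filter plugin document a force_array option that controls whether single elements are wrapped in arrays. The fragment below is a sketch under that assumption. The option may not exist in the plugin version used in the question, so check the documentation for the installed version before relying on it.

```
filter {
    xml {
        source => "message"
        target => "SubmitTestResult"
        # Assumption: the installed xml filter plugin supports force_array.
        # Setting it to false asks the filter not to wrap single elements in arrays.
        force_array => false
    }
}
```

If the option is available, the nested field would presumably be addressed as [SubmitTestResult][TestResult][CreatedBy], without the [0] index, and the replace entries would no longer be needed.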

Regarding json - Logstash: XML to JSON output, from arrays to strings, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/31880172/
