python - Converting an XML document to a specific dot-expanded JSON structure

Tags: python xml recursion elementtree

I have the following XML document:

<Item ID="288917">
  <Main>
    <Platform>iTunes</Platform>
    <PlatformID>353736518</PlatformID>
  </Main>
  <Genres>
    <Genre FacebookID="6003161475030">Comedy</Genre>
    <Genre FacebookID="6003172932634">TV-Show</Genre>
  </Genres>
  <Products>
    <Product Country="CA">
      <URL>https://itunes.apple.com/ca/tv-season/id353187108?i=353736518</URL>
      <Offers>
        <Offer Type="HDBUY">
          <Price>3.49</Price>
          <Currency>CAD</Currency>
        </Offer>
        <Offer Type="SDBUY">
          <Price>2.49</Price>
          <Currency>CAD</Currency>
        </Offer>
      </Offers>
    </Product>
    <Product Country="FR">
      <URL>https://itunes.apple.com/fr/tv-season/id353187108?i=353736518</URL>
      <Rating>Tout public</Rating>
      <Offers>
        <Offer Type="HDBUY">
          <Price>2.49</Price>
          <Currency>EUR</Currency>
        </Offer>
        <Offer Type="SDBUY">
          <Price>1.99</Price>
          <Currency>EUR</Currency>
        </Offer>
      </Offers>
    </Product>
  </Products>
</Item>

Currently, to convert it to JSON, I am doing the following:

from lxml import etree  # recover=True is an lxml parser option
import xmltodict

parser = etree.XMLParser(recover=True)
node = etree.fromstring(s, parser=parser)
data = xmltodict.parse(etree.tostring(node))

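Of course, xmltodict is doing the heavy lifting here. However, it gives me a nested format that does not suit what I am trying to accomplish. This is what I would like the final data to look like: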

{
    "Item[@ID]": 288917, # if no preceding element, use the root node tag
    "Main.Platform": "iTunes",
    "Main.PlatformID": "353736518",
    "Genres.Genre": ["Comedy", "TV-Show"], # list of elements if repeated
    "Genres.Genre[@FacebookID]": ["6003161475030", "6003172932634"],
    "Products.Product[@Country]": ["CA", "FR"],
    "Products.Product.URL": ["https://itunes.apple.com/ca/tv-season/id353187108?i=353736518", "https://itunes.apple.com/fr/tv-season/id353187108?i=353736518"],
    "Products.Product.Offers.Offer[@Type]": ["HDBUY", "SDBUY", "HDBUY", "SDBUY"],
    "Products.Product.Offers.Offer.Price": ["3.49", "2.49", "2.49", "1.99"],
    "Products.Product.Offers.Offer.Currency": ["CAD", "CAD", "EUR", "EUR"]
}
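The key-naming rule implied by this target format can be sketched as a small helper. Note that `dotted_key` is a hypothetical name used only for illustration, not part of any library. It assumes the path is given as a list of tag names starting at the root, and that the root tag is dropped unless the path consists of the root alone.

```python
def dotted_key(path, attr=None):
    """Build a flat key from a list of tag names (root first).

    Hypothetical helper for illustration: the root tag is dropped
    unless it is the only element in the path.
    """
    tags = path[1:] if len(path) > 1 else path
    key = ".".join(tags)
    return "%s[@%s]" % (key, attr) if attr else key


print(dotted_key(["Item"], attr="ID"))
print(dotted_key(["Item", "Main", "Platform"]))
print(dotted_key(["Item", "Products", "Product"], attr="Country"))
```

Repeated keys would then collect their values into a list, as in the "Genres.Genre" entry above.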

Best answer

This is a bit verbose, but flattening the document into a flat dictionary is not hard. Here is an example:

from collections import OrderedDict, deque
from lxml import etree

# file_data is the XML document as a str
parser = etree.XMLParser(recover=True)
node = etree.fromstring(file_data.encode('utf-8'), parser=parser)
data = OrderedDict()

def add(key, value):
    # store a single value as-is; convert it to a list once the key repeats
    if key not in data:
        data[key] = value
    elif isinstance(data[key], list):
        data[key].append(value)
    else:
        data[key] = [data[key], value]

nodes = deque([(node, '')])  # format is (node, dotted path of its parent)

while nodes:
    sub, prefix = nodes.popleft()

    # the root tag is left out of the path, except for the root's own keys
    path = sub.tag if sub is node else (prefix + '.' + sub.tag).strip('.')

    # tag text (sub.text is None for an element with no text at all)
    text = (sub.text or '').strip()
    if text:
        add(path, text)

    # attributes
    for k, v in sub.attrib.items():
        add(path + '[@%s]' % k, v)

    # queue the children; children of the root start with an empty prefix
    child_prefix = '' if sub is node else path
    for child in sub:
        if isinstance(child.tag, str):  # skip comments and processing instructions
            nodes.append((child, child_prefix))
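As an alternative to the explicit node list, the same flattening can be sketched recursively. This is a minimal sketch, not the answer's own method: it assumes the standard library's `xml.etree.ElementTree` instead of lxml (so there is no `recover=True`), and `flatten`, `add`, and `walk` are illustrative names.

```python
import json
import xml.etree.ElementTree as ET


def flatten(root):
    """Flatten an element tree into {dotted key: value or list of values}."""
    data = {}

    def add(key, value):
        # Keep a single value as-is; switch to a list once a key repeats.
        if key not in data:
            data[key] = value
        elif isinstance(data[key], list):
            data[key].append(value)
        else:
            data[key] = [data[key], value]

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            add(path, text)
        for name, value in elem.attrib.items():
            add("%s[@%s]" % (path, name), value)
        for child in elem:
            # Children of the root start a fresh path, so the root tag is dropped.
            walk(child, child.tag if elem is root else path + "." + child.tag)

    walk(root, root.tag)
    return data


sample = """<Item ID="288917">
  <Genres>
    <Genre FacebookID="6003161475030">Comedy</Genre>
    <Genre FacebookID="6003172932634">TV-Show</Genre>
  </Genres>
</Item>"""

print(json.dumps(flatten(ET.fromstring(sample)), indent=4))
```

The recursion visits elements depth-first, so the order of the keys in the result may differ from a breadth-first walk, while the values collected under each key stay in document order. All values are kept as strings; converting "Item[@ID]" to a number would need an extra step.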

Regarding "python - Converting an XML document to a specific dot-expanded JSON structure", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53983293/
