python - How do I delete part of an XML file?

Tags: python xml

I need to delete certain parts of an XML file, for example this file:

<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>ImageIndex</key>
            <integer>0</integer>
            <key>NumberOfROIs</key>
            <integer>42</integer>
            <key>ROIs</key>
            <array>
                <dict>
                    <key>Area</key>
                    <real>0.0</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>0.0</real>
                    <key>IndexInImage</key>
                    <integer>0</integer>
                    <key>Max</key>
                    <real>1358</real>
                    <key>Mean</key>
                    <real>1358</real>
                    <key>Min</key>
                    <real>1358</real>
                    <key>Name</key>
                    <string>Calcification</string>
                    <key>NumberOfPoints</key>
                    <integer>1</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(2964.620117, 3427.979980)</string>
                    </array>
                    <key>Total</key>
                    <real>1358</real>
                    <key>Type</key>
                    <integer>19</integer>
                </dict>
                <dict>
                    <key>Area</key>
                    <real>0.0</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>0.0</real>
                    <key>IndexInImage</key>
                    <integer>1</integer>
                    <key>Max</key>
                    <real>1401</real>
                    <key>Mean</key>
                    <real>1401</real>
                    <key>Min</key>
                    <real>1401</real>
                    <key>Name</key>
                    <string>Calcification</string>
                    <key>NumberOfPoints</key>
                    <integer>1</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(2993.159912, 3403.550049)</string>
                    </array>
                    <key>Total</key>
                    <real>1401</real>
                    <key>Type</key>
                    <integer>19</integer>
                </dict>
                <dict>
                    <key>Area</key>
                    <real>1.3665732145309448</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>66.487342834472656</real>
                    <key>IndexInImage</key>
                    <integer>36</integer>
                    <key>Max</key>
                    <real>1836</real>
                    <key>Mean</key>
                    <real>1583.29638671875</real>
                    <key>Min</key>
                    <real>1313</real>
                    <key>Name</key>
                    <string>Mass</string>
                    <key>NumberOfPoints</key>
                    <integer>89</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(3196.290039, 1048.599976)</string>
                        <string>(3203.560059, 1046.170044)</string>
                        <string>(3211.330078, 1042.780029)</string>
                        <string>(3189.500000, 1050.540039)</string>
                    </array>
                    <key>Total</key>
                    <real>44457380</real>
                    <key>Type</key>
                    <integer>15</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist>  

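I want to delete everything between <dict> and </dict>, including the tags themselves, wherever the entry is a Calcification. In other words, I only want to keep the parts that are not Calcification. The result I want for this file would be: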

<dict>
    <key>Images</key>
    <array>
        <dict>
            <key>ImageIndex</key>
            <integer>0</integer>
            <key>NumberOfROIs</key>
            <integer>42</integer>
            <key>ROIs</key>
            <array>
                <dict>
                    <key>Area</key>
                    <real>1.3665732145309448</real>
                    <key>Center</key>
                    <string>(0.000000, 0.000000, 0.000000)</string>
                    <key>Dev</key>
                    <real>66.487342834472656</real>
                    <key>IndexInImage</key>
                    <integer>36</integer>
                    <key>Max</key>
                    <real>1836</real>
                    <key>Mean</key>
                    <real>1583.29638671875</real>
                    <key>Min</key>
                    <real>1313</real>
                    <key>Name</key>
                    <string>Mass</string>
                    <key>NumberOfPoints</key>
                    <integer>89</integer>
                    <key>Point_mm</key>
                    <array>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                        <string>(0.000000, 0.000000, 0.000000)</string>
                    </array>
                    <key>Point_px</key>
                    <array>
                        <string>(3196.290039, 1048.599976)</string>
                        <string>(3203.560059, 1046.170044)</string>
                        <string>(3211.330078, 1042.780029)</string>
                        <string>(3189.500000, 1050.540039)</string>
                    </array>
                    <key>Total</key>
                    <real>44457380</real>
                    <key>Type</key>
                    <integer>15</integer>
                </dict>
            </array>
        </dict>
    </array>
</dict>
</plist> 

Here is what I tried:

data = r"C:\Users\vinc\Desktop\ExemploXML.xml"    
    
import xml.etree.ElementTree as ET
tree = ET.parse(data)
root = tree.getroot()
for e in root.findall(".//string"):
    if e.text == 'Calcification':
        
        print(e)
        root.remove(e)
    else:
        pass
tree.write(r'C:\Users\vinc\Desktop\out.xml')

Result ======================================

<Element 'string' at 0x000002B085002EA0>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-d417d00038ed> in <module>
      8 
      9         print(e)
---> 10         root.remove(e)
     11     else:
     12         pass

ValueError: list.remove(x): x not in list

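For context, these XML files hold semantic segmentation information, and I want to remove the annotations of the Calcification class.

Best answer

Here is an XSLT-based solution.

The XSLT below follows the so-called Identity Transform pattern: one template copies every node unchanged, and more specific templates override it for the nodes that should be treated differently.

A single-line template with an empty body removes the unwanted <dict> elements: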

<xsl:template match="dict[string='Calcification']"/>
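The predicate in that match pattern, dict[string='Calcification'], selects every <dict> that has a direct <string> child whose text is Calcification. As a rough sketch for readers who would rather stay in the standard library, the same predicate also works in ElementTree's limited XPath support. The attempt in the question fails because Element.remove() only removes direct children of the element it is called on, while the matched <string> sits several levels below root; it also targets the <string> rather than the enclosing <dict>. The sample data and the helper name remove_dicts_named below are made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical stand-in for the file shown in the question.
SAMPLE = """<plist><dict>
  <key>ROIs</key>
  <array>
    <dict><key>Name</key><string>Calcification</string></dict>
    <dict><key>Name</key><string>Mass</string></dict>
  </array>
</dict></plist>"""


def remove_dicts_named(root, name):
    """Remove every <dict> that has a direct <string> child equal to `name`.

    Assumes `name` contains no single quote, since it is pasted into the
    path expression as-is.
    """
    removed = 0
    # Element.remove() works on direct children only, so visit each
    # potential parent and remove the matching <dict> children from it.
    # list(...) takes a snapshot so the tree can be modified while looping.
    for parent in list(root.iter()):
        for child in parent.findall(f"dict[string='{name}']"):
            parent.remove(child)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
count = remove_dicts_named(root, "Calcification")
remaining = [s.text for s in root.iter("string")]
print(count, remaining)
```

For a real file, ET.parse(path) followed by tree.getroot() and tree.write(out_path) would replace the ET.fromstring call. Note that the predicate matches any direct <string> child with that text, not only the value that follows the Name key.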

How to transform an XML file using XSLT in Python?

XSLT

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dict[string='Calcification']"/>
</xsl:stylesheet>
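
One possible way to run this stylesheet from Python is the third-party lxml package; the standard library has no XSLT processor. The sketch below embeds the stylesheet and a shortened, made-up sample document as byte strings so that it is self-contained. The variable names are illustrative only.

```python
from lxml import etree  # third-party package: pip install lxml

# The stylesheet from the answer, embedded as bytes.
XSLT = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="dict[string='Calcification']"/>
</xsl:stylesheet>"""

# Shortened, hypothetical stand-in for the file shown in the question.
SAMPLE = b"""<plist><dict>
  <key>ROIs</key>
  <array>
    <dict><key>Name</key><string>Calcification</string></dict>
    <dict><key>Name</key><string>Mass</string></dict>
  </array>
</dict></plist>"""

# Compile the stylesheet once, then apply it to the parsed document.
transform = etree.XSLT(etree.fromstring(XSLT))
result = transform(etree.fromstring(SAMPLE))

# str() serializes the result according to the <xsl:output> settings.
output = str(result)
print(output)
```

To work with files instead, etree.parse(path) can supply the input document, and the serialized string can be written out with ordinary file I/O.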

Regarding "python - How do I delete part of an XML file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/70442605/
