xml - Remove XML nodes to reduce the size of an XML log file to a given size

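I'm having some difficulty removing nodes from an XML file. I've found plenty of examples of other people doing this in PowerShell in various ways, and the code below appears to be the same as many of the examples I've seen, but I'm not getting the desired behavior.

My goal is to reduce the size of the output XML to below 4 KB.

The code below doesn't raise any errors, but the number of objects in $update_activity never changes, so the nodes don't appear to be removed.

This is a log in XML format, so I remove the oldest entries first.

Sample XML: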

<?xml version="1.0" encoding="utf-16"?>
<LogEntries version="1.0" appname="Dell Command | Update" appversion="4.3.0">
    <LogEntry>
        <serviceVersion>2.3.0.36</serviceVersion>
        <appname>DellCommandUpdate</appname>
        <level>Normal</level>
        <timestamp>2022-01-07T13:29:57.9364469-08:00</timestamp>
        <source>UpdateScheduler.UpdateScheduler.Start</source>
        <message>Starting the update scheduler.</message>
        <trace/>
        <data/>
    </LogEntry>
</LogEntries>

Code:

    [xml]$dcuxml = get-content "C:\ProgramData\dell\UpdateService\Log\Activity.log"
    $xmllog = $dcuxml.LogEntries
    $update_activity = $xmllog.LogEntry | NotableDCU
    $i = 0
    Do{
        foreach($entry in $update_activity){
            $entry.parentnode.RemoveChild($entry)
            $xmlsize = [System.Text.Encoding]::UTF8.GetByteCount(($update_activity.InnerXml | Out-String)) / 1KB
        }
    }while($xmlsize -gt 3.99)

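Best answer

This is an alternative solution that uses a streaming approach based only on XmlReader and XmlWriter. Unlike my first solution, the size of the input file it can handle is not limited by the amount of available RAM.

While my first solution reads the entire input file into an in-memory XmlDocument, this solution keeps only as many log entries in memory as are needed for the output file.

It is also faster than the first solution because it avoids the overhead of building a DOM: a 63 MB log file with 100k entries took about 1.5 seconds to process with the current solution, whereas my first solution took more than 6 minutes (!).

The downside is that the code is longer than my first solution.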
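For contrast, here is a minimal sketch of what a DOM-based approach could look like. It is not the first solution referred to above (which is not reproduced here), only an illustration under stated assumptions: the sample XML, the 300-byte limit, and the variable names are all made up for the example.

```powershell
# Hypothetical sample input; a real log would be loaded with $doc.Load($path).
$doc = [xml]::new()
$doc.LoadXml(@'
<LogEntries version="1.0">
    <LogEntry><message>first</message></LogEntry>
    <LogEntry><message>second</message></LogEntry>
    <LogEntry><message>third</message></LogEntry>
</LogEntries>
'@)

$maxByteCount = 300                       # illustrative limit, small enough for the sample
$encoding     = [Text.Encoding]::Unicode  # UTF-16, as in the log file
$root         = $doc.DocumentElement

# Remove the oldest (first) entry until the serialized document fits the limit.
# The whole document is re-serialized on every pass, which is part of why a
# DOM-based approach scales poorly with large logs.
while( $root.HasChildNodes -and $encoding.GetByteCount( $doc.OuterXml ) -gt $maxByteCount ) {
    $null = $root.RemoveChild( $root.FirstChild )
}

$root.ChildNodes.Count   # number of entries that remain
```

The entire document lives in memory here, which is the limitation the streaming version avoids.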

$inputPath      = "$PWD\log.xml"
$outputPath     = "$PWD\log_new.xml"

# Maximum size of the output file (which can be slightly larger as we only 
# count the size of the log entries).
$maxByteCount   = 4KB

$writerSettings = [Xml.XmlWriterSettings] @{
    Encoding = [Text.Encoding]::Unicode   # UTF-16 as in input document
    # Replace with this line to encode in UTF-8 instead 
    # Encoding = [Text.Encoding]::UTF8
    Indent = $true
    IndentChars = ' ' * 4   # should match indentation of input document
    ConformanceLevel = [Xml.ConformanceLevel]::Auto
}

$entrySeparator = "`n" + $writerSettings.IndentChars

$totalByteCount = 0
$queue = [Collections.Generic.Queue[object]]::new()

$reader = $writer = $null

try {
    # Open the input file.
    $reader = [Xml.XmlReader]::Create( $inputPath )

    # Create or overwrite the output file.
    $writer = [Xml.XmlWriter]::Create( $outputPath, $writerSettings ) 
    $writer.WriteStartDocument()  # write the XML declaration

    # Copy the document root element and its attributes without recursing into child elements.
    $null = $reader.MoveToContent()
    $writer.WriteStartElement( $reader.Name )
    $writer.WriteAttributes( $reader, $false )

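    # Step past the root start tag, then loop over the nodes of the input file.
    # ReadOuterXml() already leaves the reader positioned after the element it
    # returns, so Read() is only called for non-element nodes. Calling Read()
    # unconditionally would skip every second entry when entries are not
    # separated by whitespace.
    $null = $reader.Read()
    while( -not $reader.EOF ) {
        # Skip everything that is not an XML element
        if( $reader.NodeType -ne [xml.XmlNodeType]::Element ) {
            $null = $reader.Read()
            continue
        }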

        # Read the XML of the current element and its children.
        $xmlStr = $reader.ReadOuterXml()
        # Calculate how many bytes the current element takes when written to file.
        $byteCount = $writerSettings.Encoding.GetByteCount( $xmlStr + $entrySeparator )

        # Append XML string and byte count to the end of the queue.
        $queue.Enqueue( [PSCustomObject]@{
            xmlStr = $xmlStr
            byteCount = $byteCount
        })
        $totalByteCount += $byteCount

        # Remove entries from beginning of queue to ensure maximum size is not exceeded.
        while( $totalByteCount -ge $maxByteCount ) {
            $totalByteCount -= $queue.Dequeue().byteCount
        }
    }

    # Write the last log entries, which are below maximum size, to the output file.
    foreach( $entry in $queue ) {
        $writer.WriteString( $entrySeparator )
        $writer.WriteRaw( $entry.xmlStr )
    }

    # Finish the document.
    $writer.WriteString("`n")   
    $writer.WriteEndElement()
    $writer.WriteEndDocument()    
}
finally {
    # Close the input and output files
    if( $writer ) { $writer.Dispose() }
    if( $reader ) { $reader.Dispose() }
}
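The script measures each entry with the output encoding rather than by character count, because the same text occupies a different number of bytes depending on the encoding. A small illustration (the sample entry is made up for the example):

```powershell
# Hypothetical, ASCII-only sample entry (42 characters).
$entry = '<LogEntry><level>Normal</level></LogEntry>'

# UTF-16 uses 2 bytes per character for this text, UTF-8 uses 1.
$utf16Bytes = [Text.Encoding]::Unicode.GetByteCount( $entry )   # 84
$utf8Bytes  = [Text.Encoding]::UTF8.GetByteCount( $entry )      # 42

"UTF-16: $utf16Bytes bytes, UTF-8: $utf8Bytes bytes"
```

For mostly ASCII logs, a given byte budget therefore holds roughly half as many entries in UTF-16 as in UTF-8, and measuring a UTF-16 file with a UTF-8 byte count underestimates its size on disk.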

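The algorithm basically works like this:

- Create a queue of custom objects that store the XML and the size in bytes of each log entry.
- For each log entry of the input file:
    - Read the XML of the log entry and calculate its size in bytes (as it will be on disk, that is, with the output encoding applied). Add this data to the end of the queue.
    - If necessary, remove log entries from the beginning of the queue to ensure that the desired maximum byte count is not exceeded.
- Write the log entries in the queue to the output file.

For simplicity, only the size of the log entries is taken into account, so the actual output file may be slightly larger because of the XML declaration and the document root element.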
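The queue-trimming steps above can be tried in isolation, without any XML or file I/O. The following is a minimal sketch: `Select-NewestEntries` is a hypothetical helper name, and plain strings stand in for the log entries.

```powershell
# Hypothetical helper: returns the newest entries whose combined encoded size
# stays below $MaxByteCount, mirroring the queue logic of the script.
function Select-NewestEntries {
    param(
        [string[]] $Entries,
        [int] $MaxByteCount,
        [Text.Encoding] $Encoding = [Text.Encoding]::Unicode
    )

    $queue = [Collections.Generic.Queue[object]]::new()
    $totalByteCount = 0

    foreach( $entry in $Entries ) {
        # Enqueue the entry together with its encoded size.
        $byteCount = $Encoding.GetByteCount( $entry )
        $queue.Enqueue( [PSCustomObject]@{ Text = $entry; ByteCount = $byteCount } )
        $totalByteCount += $byteCount

        # Drop the oldest entries while the limit is reached or exceeded.
        # The Count check guards against dequeuing from an empty queue.
        while( $queue.Count -gt 0 -and $totalByteCount -ge $MaxByteCount ) {
            $totalByteCount -= $queue.Dequeue().ByteCount
        }
    }

    # Emit the remaining entries, oldest first.
    $queue | ForEach-Object { $_.Text }
}

# In UTF-16 the entries take 8, 8, 8 and 4 bytes; only the last two fit below 20.
$kept = Select-NewestEntries -Entries 'aaaa', 'bbbb', 'cccc', 'dd' -MaxByteCount 20
$kept -join ','   # cccc,dd
```

Because each entry is enqueued once and dequeued at most once, the work grows linearly with the number of entries, and memory use is bounded by the size limit rather than by the input.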

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/73421470/
