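c# - Linq-to-XML XElement.Remove() leaves unwanted whitespace

I have an XDocument that is created from a byte array (received over TCP/IP).

I then search for specific XML nodes (XElements) and, after retrieving their values, "pop" them off the XDocument by calling XElement.Remove(). Once all of my parsing is complete, I want to be able to log the XML that I did not parse (whatever remains in the XDocument). The problem is that extra whitespace is left behind when XElement.Remove() is called. I would like to know the best way to remove this extra whitespace while preserving the rest of the formatting in the remaining XML.

Example/Sample Code

If I receive the following XML over the socket: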

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

I then use the following code to parse this XML and remove a number of the XElements:

private void socket_messageReceived(object sender, MessageReceivedEventArgs e)
{
     XDocument xDoc;
     try
     {
         using (MemoryStream xmlStream = new MemoryStream(e.XmlAsBytes))
         using (XmlTextReader reader = new XmlTextReader(xmlStream))
         {
             xDoc = XDocument.Load(reader);
         }

         XElement Author = xDoc.Root.Descendants("author").FirstOrDefault();
         XElement Title  = xDoc.Root.Descendants("title").FirstOrDefault();
         XElement Genre  = xDoc.Root.Descendants("genre").FirstOrDefault();

         // Do something with Author, Title, and Genre here...

         if (Author != null) Author.Remove();
         if (Title  != null) Title.Remove();
         if (Genre  != null) Genre.Remove();

         LogUnparsedXML(xDoc.ToString());

     }
     catch (Exception ex)
     {
         // Exception Handling here...
     }
}

The resulting XML string that gets passed to the LogUnparsedXML method would then be:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">



      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
</catalog>

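In this contrived example it may not seem like a big deal, but in my actual application the leftover XML looks pretty sloppy. I have tried the XDocument.ToString overload that takes a SaveOptions enum, to no avail. I have also tried calling xDoc.Save to save to a file using the SaveOptions enum. I did experiment with a few different LINQ queries that used XElement.Nodes().OfType<XText>() to try to remove the whitespace, but I usually ended up removing the whitespace I wanted to preserve along with the whitespace I was trying to get rid of.

Thanks in advance for your assistance.

Joe

Best answer

This is not easy to answer in a portable way, because the solution depends heavily on how XDocument.Load() generates whitespace text nodes (and the several implementations of LINQ to XML may disagree about that subtle detail).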
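To see how much the loading step matters, here is a minimal sketch that counts the direct child nodes of the root element under different LoadOptions. The class and method names are hypothetical and not part of the original post, and the behavior described assumes Microsoft's implementation of XDocument.Parse.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class WhitespaceNodeSketch
{
    // Hypothetical helper: parses the XML with the given options and counts
    // the direct child nodes (elements and text nodes) of the root element.
    public static int CountRootChildNodes(string xml, LoadOptions options)
    {
        return XDocument.Parse(xml, options).Root.Nodes().Count();
    }
}
```

For an indented element with two children, such as "<book>\n  <author>A</author>\n  <title>T</title>\n</book>", CountRootChildNodes should report 2 with LoadOptions.None and 5 with LoadOptions.PreserveWhitespace (two elements plus three whitespace text nodes). When the document is loaded through an XmlReader, as in the question, whether those whitespace nodes show up appears to be governed by the reader's own settings rather than by LoadOptions.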

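That said, it looks like you never remove the last child (<description>) from the <book> element. If that is indeed the case, then we do not have to worry about the indentation of the parent element's closing tag, and we can simply remove the element together with all the text nodes that follow it, until we reach another element. TakeWhile() will do the job.

EDIT: Well, it seems you need to remove the last child after all. Therefore, things get more complicated. The code below implements the following algorithm: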
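A minimal sketch of that simpler approach, covering only the case where the removed element is followed by another element. The name RemoveWithFollowingText is hypothetical, and the sketch assumes the whitespace was kept as XText nodes when the document was loaded.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SimpleRemovalSketch
{
    // Hypothetical helper: removes the element and every text node that
    // directly follows it, stopping at the first non-text node.
    public static void RemoveWithFollowingText(this XElement element)
    {
        element.NodesAfterSelf()
               .TakeWhile(node => node is XText)
               .ToList()                       // snapshot before mutating the tree
               .ForEach(node => node.Remove());
        element.Remove();
    }
}
```

The ToList() call matters: NodesAfterSelf() is evaluated lazily, so removing nodes while still enumerating it would modify the sequence being walked.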

If the element is not the last element of its parent:
    Remove all following text nodes until we reach the next element.
Otherwise:
    Remove all following text nodes until we find one containing a newline.
    If that node only contains a newline:
        Remove that node.
    Otherwise:
        Create a new node containing only the whitespace found after the newline,
        Insert that node after the original node,
        Remove the original node.
Remove the element itself.

The resulting code is:

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Extension methods must live in a static class; the class name is arbitrary.
public static class XElementExtensions
{
    // Removes the element together with the whitespace text that follows it.
    public static void RemoveWithNextWhitespace(this XElement element)
    {
        IEnumerable<XText> textNodes
            = element.NodesAfterSelf()
                     .TakeWhile(node => node is XText).Cast<XText>();
        if (element.ElementsAfterSelf().Any()) {
            // Easy case, remove following text nodes.
            textNodes.ToList().ForEach(node => node.Remove());
        } else {
            // Remove trailing whitespace.
            textNodes.TakeWhile(text => !text.Value.Contains("\n"))
                     .ToList().ForEach(text => text.Remove());
            // Fetch text node containing newline, if any.
            XText newLineTextNode
                = element.NodesAfterSelf().OfType<XText>().FirstOrDefault();
            if (newLineTextNode != null) {
                string value = newLineTextNode.Value;
                if (value.Length > 1) {
                    // Composite text node, trim until newline (inclusive).
                    newLineTextNode.AddAfterSelf(
                        new XText(value.Substring(value.IndexOf('\n') + 1)));
                }
                // Remove original node.
                newLineTextNode.Remove();
            }
        }
        element.Remove();
    }
}

From there, you can do:

if (Author != null) Author.RemoveWithNextWhitespace();
if (Title  != null) Title.RemoveWithNextWhitespace();
if (Genre  != null) Genre.RemoveWithNextWhitespace();

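That said, I would suggest replacing the above with something like a loop fed from an array, or a params method call, to avoid code redundancy.

Regarding "c# - Linq-to-XML XElement.Remove() leaves unwanted whitespace", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/6851247/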
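One way the params suggestion could look, as a rough sketch. The RemoveEach helper and its class are hypothetical names. The removal strategy is passed in as a delegate so that the same loop works with RemoveWithNextWhitespace or with plain Remove().

```csharp
using System;
using System.Xml.Linq;

public static class ElementBatchSketch
{
    // Hypothetical helper: applies 'remove' to every non-null element, in order.
    // Null entries (elements that were not found) are skipped.
    public static void RemoveEach(Action<XElement> remove, params XElement[] elements)
    {
        foreach (XElement element in elements)
        {
            if (element != null) remove(element);
        }
    }
}
```

With the extension method from the answer in scope, the three calls would collapse into a single statement: ElementBatchSketch.RemoveEach(e => e.RemoveWithNextWhitespace(), Author, Title, Genre);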
