c# - What is the most efficient way to clone Office Open XML documents?

Tags: c# .net .net-core openxml openxml-sdk

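When working with Office Open XML documents (e.g., documents created by Word, Excel, or PowerPoint since the release of Office 2007), you often want to clone or copy an existing document and then make changes to that copy, thereby creating a new document.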

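Several questions have already been asked and answered in this context (sometimes incorrectly or at least not optimally), which shows that users really do run into problems here. For example: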

  • Duplicating Word document using OpenXml and C#
  • Word OpenXml Word Found Unreadable Content
  • Open XML SDK: opening a Word template and saving to a different file-name
  • docx document corrupted when copied though OpenXML C#

So, the questions are:

  • What are the possible ways to correctly clone or copy those documents?
  • Which of them is the most efficient?

Best answer

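    The following sample class shows multiple ways to correctly copy pretty much any file and return the copy on a MemoryStream or FileStream. From that stream you can then open a WordprocessingDocument (Word), SpreadsheetDocument (Excel), or PresentationDocument (PowerPoint) and make whatever changes you need, using the Open XML SDK and, optionally, the Open-XML-PowerTools.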

    using System.IO;
    
    namespace CodeSnippets.IO
    {
        /// <summary>
        /// This class demonstrates multiple ways to clone files stored in the file system.
        /// In all cases, the source file is stored in the file system. Where the return type
        /// is a <see cref="MemoryStream"/>, the destination file will be stored only on that
        /// <see cref="MemoryStream"/>. Where the return type is a <see cref="FileStream"/>,
        /// the destination file will be stored in the file system and opened on that
        /// <see cref="FileStream"/>.
        /// </summary>
        /// <remarks>
        /// The contents of the <see cref="MemoryStream"/> instances returned by the sample
        /// methods can be written to a file as follows:
        ///
        ///     var stream = ReadAllBytesToMemoryStream(sourcePath);
        ///     File.WriteAllBytes(destPath, stream.ToArray());
        ///
        /// <see cref="MemoryStream.ToArray"/> copies exactly <see cref="MemoryStream.Length"/>
        /// bytes to a new byte array. <see cref="MemoryStream.GetBuffer"/> avoids that copy
        /// where the MemoryStream was created using <see cref="MemoryStream()"/> or
        /// <see cref="MemoryStream(int)"/>, but it returns the whole internal buffer, of
        /// which only the first <see cref="MemoryStream.Length"/> bytes are valid.
        /// </remarks>
        public static class FileCloner
        {
            public static MemoryStream ReadAllBytesToMemoryStream(string path)
            {
                byte[] buffer = File.ReadAllBytes(path);
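                // Create an expandable stream. A stream created with new MemoryStream(buffer)
                // would wrap the array and could not grow when the document does.
                var destStream = new MemoryStream(buffer.Length);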
                destStream.Write(buffer, 0, buffer.Length);
                destStream.Seek(0, SeekOrigin.Begin);
                return destStream;
            }
    
            public static MemoryStream CopyFileStreamToMemoryStream(string path)
            {
                using FileStream sourceStream = File.OpenRead(path);
                var destStream = new MemoryStream((int) sourceStream.Length);
                sourceStream.CopyTo(destStream);
                destStream.Seek(0, SeekOrigin.Begin);
                return destStream;
            }
    
            public static FileStream CopyFileStreamToFileStream(string sourcePath, string destPath)
            {
                using FileStream sourceStream = File.OpenRead(sourcePath);
                FileStream destStream = File.Create(destPath);
                sourceStream.CopyTo(destStream);
                destStream.Seek(0, SeekOrigin.Begin);
                return destStream;
            }
    
            public static FileStream CopyFileAndOpenFileStream(string sourcePath, string destPath)
            {
                File.Copy(sourcePath, destPath, true);
                return new FileStream(destPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
            }
        }
    }
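
    The remarks in the class above mention both MemoryStream.GetBuffer() and MemoryStream.ToArray(). The following is a minimal sketch of how they differ once a stream has grown, using only the .NET base class library. The class name MemoryStreamSaver and the method SaveExact are hypothetical names chosen for this illustration.

```csharp
using System;
using System.IO;

public static class MemoryStreamSaver
{
    // Writes exactly stream.Length bytes to the given path without creating
    // an intermediate byte array (MemoryStream.WriteTo does the work).
    public static void SaveExact(MemoryStream stream, string path)
    {
        using FileStream file = File.Create(path);
        stream.WriteTo(file);
    }

    public static void Main()
    {
        // Start with a capacity of 4 bytes, then write 5 bytes so the stream must grow.
        using var stream = new MemoryStream(4);
        stream.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);

        byte[] whole = stream.GetBuffer(); // entire internal buffer, including unused capacity
        byte[] exact = stream.ToArray();   // copy of exactly stream.Length bytes

        Console.WriteLine($"Length={stream.Length}, GetBuffer={whole.Length}, ToArray={exact.Length}");

        string path = Path.GetTempFileName();
        SaveExact(stream, path);
        Console.WriteLine($"File size={new FileInfo(path).Length}");
        File.Delete(path);
    }
}
```

    Here the array returned by GetBuffer() is longer than the five bytes that were written, while ToArray() and WriteTo() deal with exactly five bytes. Writing the whole buffer to a .docx file would append unused bytes to the package, so only the first Length bytes should be written.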
    

    In addition to the Open XML-agnostic methods above, you can also use the following approach, for example if you have already opened an OpenXmlPackage such as a WordprocessingDocument, SpreadsheetDocument, or PresentationDocument:

    public void DoWorkCloningOpenXmlPackage()
    {
        using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);
    
        // There are multiple overloads of the Clone() method in the Open XML SDK.
        // This one clones the source document to the given destination path and
        // opens it in read-write mode.
        using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);
    
        ChangeWordprocessingDocument(wordDocument);
    }
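
    The comment above mentions that Clone() has multiple overloads. As a hedged sketch, one of them, Clone(Stream, bool), clones into a stream instead of a file path. The example assumes that the DocumentFormat.OpenXml NuGet package is referenced. The class InMemoryCloner and the method CloneToMemory are hypothetical names, and the sample document is created in memory only to keep the sketch self-contained.

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class InMemoryCloner
{
    // Clones an already opened document onto the given stream and opens the
    // clone in read-write mode. The caller disposes the clone and the stream.
    public static WordprocessingDocument CloneToMemory(
        WordprocessingDocument source, MemoryStream destStream)
    {
        return (WordprocessingDocument) source.Clone(destStream, true);
    }

    public static void Main()
    {
        // Create a minimal source document in memory (illustration only).
        using var sourceStream = new MemoryStream();
        using WordprocessingDocument source =
            WordprocessingDocument.Create(sourceStream, WordprocessingDocumentType.Document);
        MainDocumentPart mainPart = source.AddMainDocumentPart();
        mainPart.Document = new Document(new Body(new Paragraph(new Run(new Text("Hello")))));

        // Clone it onto a second, expandable MemoryStream.
        using var destStream = new MemoryStream();
        using WordprocessingDocument copy = CloneToMemory(source, destStream);

        Console.WriteLine(copy.MainDocumentPart?.Document?.Body?.InnerText);
    }
}
```

    This variant avoids the file system for the destination, which can be convenient when all you hold is a reference to an open package.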
    

    All of the above methods correctly clone or copy a document. But which one is the most efficient?

    Enter our benchmark, which uses the BenchmarkDotNet NuGet package:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics.CodeAnalysis;
    using System.IO;
    using System.Linq;
    using BenchmarkDotNet.Attributes;
    using CodeSnippets.IO;
    using CodeSnippets.OpenXml.Wordprocessing;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    
    namespace CodeSnippets.Benchmarks.IO
    {
        public class FileClonerBenchmark
        {
            #region Setup and Helpers
    
            private const string SourcePath = "Source.docx";
            private const string DestPath = "Destination.docx";
    
            [Params(1, 10, 100, 1000)]
            public static int ParagraphCount;
    
            [GlobalSetup]
            public void GlobalSetup()
            {
                CreateTestDocument(SourcePath);
                CreateTestDocument(DestPath);
            }
    
            private static void CreateTestDocument(string path)
            {
                const string sentence = "The quick brown fox jumps over the lazy dog.";
                string text = string.Join(" ", Enumerable.Range(0, 22).Select(i => sentence));
                IEnumerable<string> texts = Enumerable.Range(0, ParagraphCount).Select(i => text);
                using WordprocessingDocument unused = WordprocessingDocumentFactory.Create(path, texts);
            }
    
            private static void ChangeWordprocessingDocument(WordprocessingDocument wordDocument)
            {
                Body body = wordDocument.MainDocumentPart.Document.Body;
                Text text = body.Descendants<Text>().First();
                text.Text = DateTimeOffset.UtcNow.Ticks.ToString();
            }
    
            #endregion
    
            #region Benchmarks
    
            [Benchmark(Baseline = true)]
            public void DoWorkUsingReadAllBytesToMemoryStream()
            {
                using MemoryStream destStream = FileCloner.ReadAllBytesToMemoryStream(SourcePath);
    
                using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
                {
                    ChangeWordprocessingDocument(wordDocument);
                }
    
                // ToArray() returns exactly Length bytes; GetBuffer() may include unused capacity.
                File.WriteAllBytes(DestPath, destStream.ToArray());
            }
    
            [Benchmark]
            public void DoWorkUsingCopyFileStreamToMemoryStream()
            {
                using MemoryStream destStream = FileCloner.CopyFileStreamToMemoryStream(SourcePath);
    
                using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
                {
                    ChangeWordprocessingDocument(wordDocument);
                }
    
                // ToArray() returns exactly Length bytes; GetBuffer() may include unused capacity.
                File.WriteAllBytes(DestPath, destStream.ToArray());
            }
    
            [Benchmark]
            public void DoWorkUsingCopyFileStreamToFileStream()
            {
                using FileStream destStream = FileCloner.CopyFileStreamToFileStream(SourcePath, DestPath);
                using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
                ChangeWordprocessingDocument(wordDocument);
            }
    
            [Benchmark]
            public void DoWorkUsingCopyFileAndOpenFileStream()
            {
                using FileStream destStream = FileCloner.CopyFileAndOpenFileStream(SourcePath, DestPath);
                using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
                ChangeWordprocessingDocument(wordDocument);
            }
    
            [Benchmark]
            public void DoWorkCloningOpenXmlPackage()
            {
                using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);
                using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);
                ChangeWordprocessingDocument(wordDocument);
            }
    
            #endregion
        }
    }
    

    The above benchmark is run as follows:

    using BenchmarkDotNet.Running;
    using CodeSnippets.Benchmarks.IO;
    
    namespace CodeSnippets.Benchmarks
    {
        public static class Program
        {
            public static void Main()
            {
                BenchmarkRunner.Run<FileClonerBenchmark>();
            }
        }
    }
    

    What are the results on my machine? Which method is the fastest?

    BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
    Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
    .NET Core SDK=3.0.100
      [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
      DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
    
    | Method                                  | ParaCount |      Mean |     Error |    StdDev |    Median | Ratio |
    | --------------------------------------- | --------- | --------: | --------: | --------: | --------: | ----: |
    | DoWorkUsingReadAllBytesToMemoryStream   | 1         |  1.548 ms | 0.0298 ms | 0.0279 ms |  1.540 ms |  1.00 |
    | DoWorkUsingCopyFileStreamToMemoryStream | 1         |  1.561 ms | 0.0305 ms | 0.0271 ms |  1.556 ms |  1.01 |
    | DoWorkUsingCopyFileStreamToFileStream   | 1         |  2.394 ms | 0.0601 ms | 0.1100 ms |  2.354 ms |  1.55 |
    | DoWorkUsingCopyFileAndOpenFileStream    | 1         |  3.302 ms | 0.0657 ms | 0.0855 ms |  3.312 ms |  2.12 |
    | DoWorkCloningOpenXmlPackage             | 1         |  4.567 ms | 0.1218 ms | 0.3591 ms |  4.557 ms |  3.13 |
    |                                         |           |           |           |           |           |       |
    | DoWorkUsingReadAllBytesToMemoryStream   | 10        |  1.737 ms | 0.0337 ms | 0.0361 ms |  1.742 ms |  1.00 |
    | DoWorkUsingCopyFileStreamToMemoryStream | 10        |  1.752 ms | 0.0347 ms | 0.0571 ms |  1.739 ms |  1.01 |
    | DoWorkUsingCopyFileStreamToFileStream   | 10        |  2.505 ms | 0.0390 ms | 0.0326 ms |  2.500 ms |  1.44 |
    | DoWorkUsingCopyFileAndOpenFileStream    | 10        |  3.532 ms | 0.0731 ms | 0.1860 ms |  3.455 ms |  2.05 |
    | DoWorkCloningOpenXmlPackage             | 10        |  4.446 ms | 0.0880 ms | 0.1470 ms |  4.424 ms |  2.56 |
    |                                         |           |           |           |           |           |       |
    | DoWorkUsingReadAllBytesToMemoryStream   | 100       |  2.847 ms | 0.0563 ms | 0.0553 ms |  2.857 ms |  1.00 |
    | DoWorkUsingCopyFileStreamToMemoryStream | 100       |  2.865 ms | 0.0561 ms | 0.0786 ms |  2.868 ms |  1.02 |
    | DoWorkUsingCopyFileStreamToFileStream   | 100       |  3.550 ms | 0.0697 ms | 0.0881 ms |  3.570 ms |  1.25 |
    | DoWorkUsingCopyFileAndOpenFileStream    | 100       |  4.456 ms | 0.0877 ms | 0.0861 ms |  4.458 ms |  1.57 |
    | DoWorkCloningOpenXmlPackage             | 100       |  5.958 ms | 0.1242 ms | 0.2727 ms |  5.908 ms |  2.10 |
    |                                         |           |           |           |           |           |       |
    | DoWorkUsingReadAllBytesToMemoryStream   | 1000      | 12.378 ms | 0.2453 ms | 0.2519 ms | 12.442 ms |  1.00 |
    | DoWorkUsingCopyFileStreamToMemoryStream | 1000      | 12.538 ms | 0.2070 ms | 0.1835 ms | 12.559 ms |  1.02 |
    | DoWorkUsingCopyFileStreamToFileStream   | 1000      | 12.919 ms | 0.2457 ms | 0.2298 ms | 12.939 ms |  1.05 |
    | DoWorkUsingCopyFileAndOpenFileStream    | 1000      | 13.728 ms | 0.2803 ms | 0.5196 ms | 13.652 ms |  1.11 |
    | DoWorkCloningOpenXmlPackage             | 1000      | 16.868 ms | 0.2174 ms | 0.1927 ms | 16.801 ms |  1.37 |
    

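    The results show that DoWorkUsingReadAllBytesToMemoryStream() is consistently the fastest method. However, its margin over DoWorkUsingCopyFileStreamToMemoryStream() is well within the margin of error. This means that you should open your Open XML documents on a MemoryStream for processing whenever possible. And if you do not have to store the resulting document in the file system, this will be even faster than unnecessarily using a FileStream.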
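
    The in-memory pattern can be sketched without any Open XML dependency. This is a minimal illustration, not the author's code: InMemoryClone and CloneAndTransform are hypothetical names, and the transform passed in stands in for opening the document on the stream and changing it. It also shows why the stream is created with a capacity rather than by wrapping the byte array.

```csharp
using System;
using System.IO;

public static class InMemoryClone
{
    // Loads the file into an expandable MemoryStream, lets the caller change it,
    // and returns exactly the bytes that the stream contains afterwards.
    public static byte[] CloneAndTransform(string sourcePath, Action<Stream> transform)
    {
        byte[] bytes = File.ReadAllBytes(sourcePath);

        // new MemoryStream(bytes) would wrap the array and could never grow.
        // A stream created with a capacity can grow when the content does.
        using var stream = new MemoryStream(bytes.Length);
        stream.Write(bytes, 0, bytes.Length);
        stream.Seek(0, SeekOrigin.Begin);

        transform(stream);
        return stream.ToArray();
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 1, 2, 3 });

        // Hypothetical transform that appends one byte to the copy.
        byte[] result = CloneAndTransform(path, s =>
        {
            s.Seek(0, SeekOrigin.End);
            s.WriteByte(4);
        });
        Console.WriteLine(result.Length); // 4

        // A stream that wraps a fixed array cannot grow.
        var fixedStream = new MemoryStream(new byte[] { 1, 2, 3 });
        fixedStream.Seek(0, SeekOrigin.End);
        try
        {
            fixedStream.WriteByte(4);
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("Fixed-size stream is not expandable.");
        }

        File.Delete(path);
    }
}
```

    The returned byte array can be sent to a client or stored elsewhere without ever creating a destination file.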

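    Wherever an output FileStream is involved, you see a more "significant" difference (keeping in mind that a millisecond can matter if you process large numbers of documents). Also note that using File.Copy() is actually not such a good approach: in these runs it was slower than copying from one FileStream to another.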
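
    To put "a millisecond can matter" into numbers, here is a rough extrapolation from the ParagraphCount = 100 rows of the table above. The batch size of one million documents is a hypothetical figure, and the calculation assumes that the per-document mean scales linearly on the same machine, which a real workload may not do. BatchEstimate and Estimate are names made up for this sketch.

```csharp
using System;

public static class BatchEstimate
{
    // Extrapolates a per-document mean (in milliseconds) to a whole batch,
    // assuming linear scaling.
    public static TimeSpan Estimate(double meanMilliseconds, int documentCount) =>
        TimeSpan.FromMilliseconds(meanMilliseconds * documentCount);

    public static void Main()
    {
        const int documentCount = 1_000_000; // hypothetical batch size

        // Means taken from the ParagraphCount = 100 rows of the results table.
        TimeSpan memoryStream = Estimate(2.847, documentCount);
        TimeSpan fileCopy = Estimate(4.456, documentCount);

        Console.WriteLine($"ReadAllBytesToMemoryStream: {memoryStream.TotalMinutes:F1} min");
        Console.WriteLine($"CopyFileAndOpenFileStream:  {fileCopy.TotalMinutes:F1} min");
        Console.WriteLine($"Difference:                 {(fileCopy - memoryStream).TotalMinutes:F1} min");
    }
}
```

    Under these assumptions, a difference of about 1.6 ms per document adds up to roughly 27 minutes for the batch.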

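    Finally, using the OpenXmlPackage.Clone() method or one of its overloads turns out to be the slowest method, because it involves more elaborate logic than just copying bytes. However, if all you have is a reference to an OpenXmlPackage (or one of its subclasses), the Clone() method and its overloads are your best choice.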

    You can find the full source code in my CodeSnippets GitHub repository. Look at the CodeSnippets.Benchmark project and the FileCloner class.

    Regarding "c# - What is the most efficient way to clone Office Open XML documents?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59090254/
