c# - PDF compression using iTextSharp

Tags: c# pdf itext

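I am trying to recompress a PDF that has already been created. Specifically, I am looking for a way to recompress the images in the document in order to reduce the file size.

I have been trying to do this with the DataLogics PDE and iTextSharp libraries, but I cannot find a way to recompress the streams of the individual items.

I have considered looping through the XObjects, extracting the images, and then either dropping the DPI to 96 or using a C# implementation of libjpeg to change the image quality. However, putting the result back into the PDF stream always seems to end in memory corruption or some other problem.

Any sample would be greatly appreciated.

Thanks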

Best answer

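iText and iTextSharp have methods for replacing indirect objects. Specifically, there is PdfReader.KillIndirect(), which does what it says, and PdfWriter.AddDirectImageSimple(iTextSharp.text.Image, PRIndirectReference), which you can then use to replace what you killed.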

In pseudo C# code you would do the following:

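//imageReference is the indirect reference found in the page's XObject dictionary
var oldImage = PdfReader.GetPdfObject(imageReference);
var newImage = YourImageCompressionFunction(oldImage);
//Kill and replace by reference, not by the resolved object
PdfReader.KillIndirect(imageReference);
yourPdfWriter.AddDirectImageSimple(newImage, (PRIndirectReference)imageReference);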

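Converting the raw bytes into a .NET image can be tricky. I'll leave that up to you, or you can search here. Mark has a good description here. Also, technically PDFs have no concept of DPI; that is mostly a printer concern. See the answer here for more information.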
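To make the DPI point concrete: a PDF only records how large an image is drawn, in points (72 per inch by default), so the "effective" resolution is the pixel count divided by the drawn size in inches. The following is a minimal sketch of that arithmetic; `DpiMath` and `EffectiveDpi` are hypothetical names used for illustration and are not part of iTextSharp.

```csharp
using System;

static class DpiMath {
    // Default PDF user space uses 72 points per inch.
    const double PointsPerInch = 72.0;

    // Hypothetical helper: effective resolution of an image that is
    // pixelWidth pixels wide and drawn placedWidthPoints points wide.
    public static double EffectiveDpi(int pixelWidth, double placedWidthPoints) {
        if (placedWidthPoints <= 0) {
            throw new ArgumentOutOfRangeException("placedWidthPoints");
        }
        double placedWidthInches = placedWidthPoints / PointsPerInch;
        return pixelWidth / placedWidthInches;
    }

    static void Main() {
        // A 3000 px wide image drawn 720 pt (10 inches) wide works out to 300 DPI.
        Console.WriteLine(EffectiveDpi(3000, 720));
    }
}
```

The same image drawn in a smaller rectangle would report a higher number, which is why the value belongs to the placement rather than to the PDF itself.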

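With the method above, your compression algorithm can actually do two things: physically shrink the image and apply JPEG compression. When you physically shrink the image and add it back, it occupies the same amount of space on the page as the original but has fewer pixels to work with. This gives you what you think of as a DPI reduction. JPEG compression is self-explanatory.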
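If you want the shrink step to aim at a particular resolution such as the 96 DPI mentioned in the question, the scale factor is just the ratio of target to current resolution. A small sketch under that assumption; `ScaleForTargetDpi` is a hypothetical helper, and capping the factor at 1 (never enlarging) is a design choice of this sketch rather than anything iTextSharp requires.

```csharp
using System;

static class ScaleMath {
    // Hypothetical helper: factor to hand to an image-shrinking routine so that
    // an image currently at currentDpi ends up near targetDpi.
    // The result is capped at 1 so images are never enlarged.
    public static float ScaleForTargetDpi(double currentDpi, double targetDpi) {
        if (currentDpi <= 0 || targetDpi <= 0) {
            throw new ArgumentOutOfRangeException();
        }
        return (float)Math.Min(1.0, targetDpi / currentDpi);
    }

    static void Main() {
        // 300 DPI down to 96 DPI: factor is 96 / 300.
        Console.WriteLine(ScaleForTargetDpi(300, 96));
        // Already below the target: factor stays at 1, so the image is left alone.
        Console.WriteLine(ScaleForTargetDpi(72, 96));
    }
}
```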

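Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0. It takes an existing JPEG on your desktop called "LargeImage.jpg" and creates a new PDF from it. It then opens the PDF, extracts the image, physically shrinks it to 90% of its original size, applies JPEG compression at 85% quality, and writes it back to the PDF. See the comments in the code for more explanation. The code needs a lot more null/error checking. Also look for the NOTE comments, which mark the places you will need to expand to handle other cases.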

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Drawing.Drawing2D;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

namespace WindowsFormsApplication1 {
    public partial class Form1 : Form {
        public Form1() {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e) {
            //Our working folder
            string workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
            //Large image to add to sample PDF
            string largeImage = Path.Combine(workingFolder, "LargeImage.jpg");
            //Name of large PDF to create
            string largePDF = Path.Combine(workingFolder, "Large.pdf");
            //Name of compressed PDF to create
            string smallPDF = Path.Combine(workingFolder, "Small.pdf");

            //Create a sample PDF containing our large image, for demo purposes only, nothing special here
            using (FileStream fs = new FileStream(largePDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                using (Document doc = new Document()) {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs)) {
                        doc.Open();

                        iTextSharp.text.Image importImage = iTextSharp.text.Image.GetInstance(largeImage);
                        doc.SetPageSize(new iTextSharp.text.Rectangle(0, 0, importImage.Width, importImage.Height));
                        doc.SetMargins(0, 0, 0, 0);
                        doc.NewPage();
                        doc.Add(importImage);

                        doc.Close();
                    }
                }
            }

            //Now we're going to open the above PDF and compress things

            //Bind a reader to our large PDF
            PdfReader reader = new PdfReader(largePDF);
            //Create our output PDF
            using (FileStream fs = new FileStream(smallPDF, FileMode.Create, FileAccess.Write, FileShare.None)) {
                //Bind a stamper to the file and our reader
                using (PdfStamper stamper = new PdfStamper(reader, fs)) {
                    //NOTE: This code only deals with page 1, you'd want to loop more for your code
                    //Get page 1
                    PdfDictionary page = reader.GetPageN(1);
                    //Get the xobject structure
                    PdfDictionary resources = (PdfDictionary)PdfReader.GetPdfObject(page.Get(PdfName.RESOURCES));
                    PdfDictionary xobject = (PdfDictionary)PdfReader.GetPdfObject(resources.Get(PdfName.XOBJECT));
                    if (xobject != null) {
                        PdfObject obj;
                        //Loop through each key
                        foreach (PdfName name in xobject.Keys) {
                            obj = xobject.Get(name);
                            if (obj.IsIndirect()) {
                                //Get the current key as a PDF object
                                PdfDictionary imgObject = (PdfDictionary)PdfReader.GetPdfObject(obj);
                                //See if it's an image (constant on the left so a missing /Subtype cannot throw)
                                if (PdfName.IMAGE.Equals(imgObject.Get(PdfName.SUBTYPE))) {
                                    //NOTE: There's a bunch of different types of filters, I'm only handling the simplest one here which is basically raw JPG, you'll have to research others
                                    if (PdfName.DCTDECODE.Equals(imgObject.Get(PdfName.FILTER))) {
                                        //Get the raw bytes of the current image
                                        byte[] oldBytes = PdfReader.GetStreamBytesRaw((PRStream)imgObject);
                                        //Will hold bytes of the compressed image later
                                        byte[] newBytes;
                                        //Wrap a stream around our original image
                                        using (MemoryStream sourceMS = new MemoryStream(oldBytes)) {
                                            //Convert the bytes into a .Net image
                                            using (System.Drawing.Image oldImage = Bitmap.FromStream(sourceMS)) {
                                                //Shrink the image to 90% of the original
                                                using (System.Drawing.Image newImage = ShrinkImage(oldImage, 0.9f)) {
                                                    //Convert the image to bytes using JPG at 85%
                                                    newBytes = ConvertImageToBytes(newImage, 85);
                                                }
                                            }
                                        }
                                        //Create a new iTextSharp image from our bytes
                                        iTextSharp.text.Image compressedImage = iTextSharp.text.Image.GetInstance(newBytes);
                                        //Kill off the old image
                                        PdfReader.KillIndirect(obj);
                                        //Add our image in its place
                                        stamper.Writer.AddDirectImageSimple(compressedImage, (PRIndirectReference)obj);
                                    }
                                }
                            }
                        }
                    }
                }
            }
            //Release the reader's hold on the source PDF
            reader.Close();

            this.Close();
        }

        //Standard image save code from MSDN, returns a byte array
        private static byte[] ConvertImageToBytes(System.Drawing.Image image, long compressionLevel) {
            if (compressionLevel < 0) {
                compressionLevel = 0;
            } else if (compressionLevel > 100) {
                compressionLevel = 100;
            }
            ImageCodecInfo jpgEncoder = GetEncoder(ImageFormat.Jpeg);

            //Quality is expressed as 0 (smallest file) to 100 (best quality)
            System.Drawing.Imaging.Encoder myEncoder = System.Drawing.Imaging.Encoder.Quality;
            EncoderParameters myEncoderParameters = new EncoderParameters(1);
            EncoderParameter myEncoderParameter = new EncoderParameter(myEncoder, compressionLevel);
            myEncoderParameters.Param[0] = myEncoderParameter;
            using (MemoryStream ms = new MemoryStream()) {
                image.Save(ms, jpgEncoder, myEncoderParameters);
                return ms.ToArray();
            }

        }
        //standard code from MSDN
        private static ImageCodecInfo GetEncoder(ImageFormat format) {
            ImageCodecInfo[] codecs = ImageCodecInfo.GetImageEncoders();
            foreach (ImageCodecInfo codec in codecs) {
                if (codec.FormatID == format.Guid) {
                    return codec;
                }
            }
            return null;
        }
        //Standard high quality thumbnail generation from http://weblogs.asp.net/gunnarpeipman/archive/2009/04/02/resizing-images-without-loss-of-quality.aspx
        private static System.Drawing.Image ShrinkImage(System.Drawing.Image sourceImage, float scaleFactor) {
            int newWidth = Convert.ToInt32(sourceImage.Width * scaleFactor);
            int newHeight = Convert.ToInt32(sourceImage.Height * scaleFactor);

            var thumbnailBitmap = new Bitmap(newWidth, newHeight);
            using (Graphics g = Graphics.FromImage(thumbnailBitmap)) {
                g.CompositingQuality = CompositingQuality.HighQuality;
                g.SmoothingMode = SmoothingMode.HighQuality;
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                System.Drawing.Rectangle imageRectangle = new System.Drawing.Rectangle(0, 0, newWidth, newHeight);
                g.DrawImage(sourceImage, imageRectangle);
            }
            return thumbnailBitmap;
        }
    }
}
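
The NOTE comments above mention that only page 1 and only the single-name form of the DCTDecode filter are handled. One way to extend that is sketched below, assuming the iTextSharp 5.x API (`NumberOfPages`, `GetAsDict`, `GetAsName`). `PdfImageWalker`, `IsPlainJpeg`, and `FindJpegImages` are hypothetical names. The sketch does not follow resources inherited from parent page nodes or images nested inside form XObjects.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class PdfImageWalker {
    // True when the image's /Filter is exactly DCTDecode, written either
    // as a single name or as a one-element array.
    public static bool IsPlainJpeg(PdfDictionary image) {
        PdfObject filter = PdfReader.GetPdfObject(image.Get(PdfName.FILTER));
        if (filter == null) return false;
        if (filter.IsName()) return PdfName.DCTDECODE.Equals(filter);
        if (filter.IsArray()) {
            PdfArray filters = (PdfArray)filter;
            return filters.Size == 1 && PdfName.DCTDECODE.Equals(filters.GetAsName(0));
        }
        return false;
    }

    // Collects the indirect references of JPEG image XObjects on every page.
    // An image shared by several pages is returned only once, so it is
    // never killed and replaced twice.
    public static List<PRIndirectReference> FindJpegImages(PdfReader reader) {
        var found = new List<PRIndirectReference>();
        var seenObjectNumbers = new HashSet<int>();
        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++) {
            PdfDictionary page = reader.GetPageN(pageNumber);
            PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
            if (resources == null) continue;
            PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);
            if (xobjects == null) continue;
            foreach (PdfName name in xobjects.Keys) {
                PdfObject entry = xobjects.Get(name);
                if (entry == null || !entry.IsIndirect()) continue;
                PdfDictionary candidate = PdfReader.GetPdfObject(entry) as PdfDictionary;
                if (candidate == null) continue;
                if (!PdfName.IMAGE.Equals(candidate.Get(PdfName.SUBTYPE))) continue;
                if (!IsPlainJpeg(candidate)) continue;
                PRIndirectReference reference = (PRIndirectReference)entry;
                if (seenObjectNumbers.Add(reference.Number)) found.Add(reference);
            }
        }
        return found;
    }
}
```

Each returned reference could then go through the same recompress, KillIndirect, and AddDirectImageSimple steps used for the single image in the sample above.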

Regarding c# - PDF compression using iTextSharp, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8740244/
