java - PDFBox file size doubles after adding PDAnnotationLinks

Tags: java pdf pdfbox

I am working on a Java project and have the following situation:

  1. I have an existing PDF file generated with Apache FOP. It contains bookmarks, and I am collecting their actions:

    Map<String, PDAction> actionsMap = new HashMap<String, PDAction>();
    PDDocumentOutline bookmarks = doc1.getDocumentCatalog().getDocumentOutline();
    PDOutlineItem item = bookmarks.getFirstChild();
    while(item != null ){
       actionsMap.put(item.getTitle(), item.getAction());
       item = item.getNextSibling();
    }
    
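  2. I am using PDFBox 2.0.0 to open a second file (also generated by FOP) and add 3 PDAnnotationLinks to specific parts of the text. The file is a single page with just a few diagrams. I then attach the actions from step 1: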

    PDPage page = (PDPage) diagramDocument.getDocumentCatalog().getPages().get(0);
    //objCoordinates is retrieved from another class with PDFTextStripper
    Iterator entries = objCoordinates.entrySet().iterator();
    while (entries.hasNext()) {
      Entry entry = (Entry) entries.next();
      String key = (String) entry.getKey();
      PDAnnotationLink txtLink = new PDAnnotationLink();
      PDBorderStyleDictionary borderULine = new PDBorderStyleDictionary();
      borderULine.setWidth(0);
      txtLink.setBorderStyle(borderULine);
      PDActionGoTo action = (PDActionGoTo) actionsMap.get(key);
      txtLink.setAction(action);
    
      final float[] quads = (float[]) entry.getValue();
      PDRectangle rect = new PDRectangle();
      rect.setLowerLeftX(quads[0]);
      rect.setLowerLeftY(quads[5]);
      rect.setUpperRightX(quads[2]);
      rect.setUpperRightY(quads[1]);
      txtLink.setRectangle(rect);
    
      page.getAnnotations().add(txtLink);
    } 
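
The rectangle above is built from fixed indices of `quads` (0, 5, 2, 1), which only works if the stripper always returns the corners in that order. As a minimal sketch, assuming `quads` holds x/y corner pairs (the layout is not shown in the question, so this is an assumption), an order-independent bounding box could be computed with plain Java. The names `QuadBounds` and `boundingBox` are made up for illustration.

```java
public final class QuadBounds {

    private QuadBounds() {
    }

    /**
     * Returns {lowerLeftX, lowerLeftY, upperRightX, upperRightY} for corner
     * points given as x/y pairs (assumed layout: x1, y1, x2, y2, ...).
     */
    public static float[] boundingBox(float[] quads) {
        if (quads == null || quads.length < 2 || quads.length % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of coordinates");
        }
        float minX = quads[0];
        float maxX = quads[0];
        float minY = quads[1];
        float maxY = quads[1];
        // Walk the remaining x/y pairs and widen the box as needed.
        for (int i = 2; i < quads.length; i += 2) {
            minX = Math.min(minX, quads[i]);
            maxX = Math.max(maxX, quads[i]);
            minY = Math.min(minY, quads[i + 1]);
            maxY = Math.max(maxY, quads[i + 1]);
        }
        return new float[] {minX, minY, maxX, maxY};
    }
}
```

The four values map onto `setLowerLeftX`, `setLowerLeftY`, `setUpperRightX` and `setUpperRightY` in that order, so the result does not depend on which corner comes first in the array.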
    

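After saving the second file, the links work, but the file size has doubled. The PDF version is 1.6, and the file already uses the FlateDecode filter. I tried an online PDF comparison of the initial file and the resulting file with links, but it reported no differences. When I open the files in a text editor, I see:

  - original file: 1 instance of /Type /Page
  - resulting file: 18 instances of /Type /Page

My guess is that PDFBox adds some extra (duplicated?) information.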
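
The text-editor check described above can also be done programmatically. The sketch below counts `/Type /Page` dictionaries in the raw bytes of a file and skips `/Type /Pages` tree nodes. `PageObjectCounter` is a hypothetical helper name. The count only sees objects that are stored uncompressed, so page dictionaries packed into compressed object streams would be missed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class PageObjectCounter {

    // Matches "/Type /Page" and "/Type/Page", but not "/Type /Pages".
    private static final Pattern PAGE_TYPE =
            Pattern.compile("/Type\\s*/Page(?![A-Za-z])");

    private PageObjectCounter() {
    }

    public static int countPageObjects(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary
        // stream data cannot break the decoding step.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = PAGE_TYPE.matcher(raw);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Pass the original and the resulting file to compare their counts.
        for (String name : args) {
            byte[] bytes = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + countPageObjects(bytes));
        }
    }
}
```

A count that is much higher than the number of visible pages suggests that the saved file carries page objects that are not part of its own page tree.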

If anyone has run into this problem, I would appreciate any help.

Thanks

Best answer

I was just wondering how to give you credit, Tilman... :-) OK, I have reworked and simplified the code so that I can post it here. I hope it is clear.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Map.Entry;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.common.PDRectangle;
    import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
    import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
    import org.apache.pdfbox.pdmodel.interactive.annotation.PDBorderStyleDictionary;
    import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDDocumentOutline;
    import org.apache.pdfbox.pdmodel.interactive.documentnavigation.outline.PDOutlineItem;
    import org.apache.pdfbox.text.PDFTextStripper;

    /**
     * @author micky
     *
     * The class merges PDF files
     *  - one file with item details info
     *  - one or more files with items diagrams
     *  
     *   Purpose is to merge diagram files into item details file and create links 
     *   from the items in diagrams to item details
     */
    public class PDFReportHyperlinks {

         public static void main(String[] args){

             PDDocument reportDocument = null;
             try {

                 String reportFileName = "D:/ItemsDetails.pdf";

                 Map<String, PDAction> actionsMap = new HashMap<String, PDAction>();
                 reportDocument = PDDocument.load(new File(reportFileName));

                 // Get the bookmarks i.e. existing GoTo actions
                 PDDocumentOutline bookmarks = reportDocument.getDocumentCatalog().getDocumentOutline();
                 PDOutlineItem item = bookmarks.getFirstChild();
                 while(item != null ){
                     actionsMap.put(item.getTitle(), item.getAction());
                     item = item.getNextSibling();
                 }

                 // Diagram files, they have single page
                 List diagamFiles = new ArrayList<String>() {{
                        add("D:/Diagram_1.pdf");
                        add("D:/Diagram_2.pdf");
                        add("D:/Diagram_3.pdf");
                 }};

                 Iterator diagramsIt = diagamFiles.iterator();
                 while (diagramsIt.hasNext()) {
                     String diagramName = (String) diagramsIt.next();

                     //--<Import diagram>---------------------------------
                     PDDocument sourceDocument = PDDocument.load(new File(diagramName));
                     PDPage pp = sourceDocument.getDocumentCatalog().getPages().get(0);
                     PDPage page = reportDocument.importPage(pp);

                     //--<Create links from diagrams to objects>---------------------------------

                     // TextStripper is a separate class (not shown) that extends PDFTextStripper.
                     // It searches for the item names and returns a Map with their coordinates.
                     TextStripper stripper = new TextStripper(sourceDocument,
                             new ArrayList<String>() {{
                                 add("Item1_Name");
                                 add("Item2_Name");
                                 add("Item3_Name");
                                 add("Item4_Name");
                                 add("Item5_Name");
                             }});

                     Map<String, float[]> objCoordinates = stripper.getObjCoordinates();

                     for (Entry<String, float[]> entry : objCoordinates.entrySet()) {
                         String key = entry.getKey();

                         PDAnnotationLink txtLink = new PDAnnotationLink();
                         PDBorderStyleDictionary borderULine = new PDBorderStyleDictionary();
                         borderULine.setWidth(0);
                         txtLink.setBorderStyle(borderULine);
                         PDActionGoTo action = (PDActionGoTo) actionsMap.get(key);
                         txtLink.setAction(action);

                         final float[] quads = (float[]) entry.getValue();
                         PDRectangle rect = new PDRectangle();
                         rect.setLowerLeftX(quads[0]);
                         rect.setLowerLeftY(quads[5]);
                         rect.setUpperRightX(quads[2]);
                         rect.setUpperRightY(quads[1]);
                         txtLink.setRectangle(rect);

                         page.getAnnotations().add(txtLink);
                     }

                     //--<Create bookmarks for new pages (diagrams)>---------------------------------
                     PDOutlineItem menuItem = new PDOutlineItem();
                     menuItem.setTitle(diagramName);
                     menuItem.setDestination(page);
                     bookmarks.addLast(menuItem);
                     menuItem.openNode();
                     bookmarks.openNode();

                 }

                 reportDocument.save(new File(reportFileName));
                 reportDocument.close();

                 // Alternative merging documents example not feasible in this case
                 //PDFMergerUtility ut = new PDFMergerUtility();
                 //ut.addSource(reportFileName);
                 //diagramsIt = diagamFiles.iterator();
                 //while (diagramsIt.hasNext()) {
                 //  String diagramName = (String) diagramsIt.next();
                 //  ut.addSource(diagramName);
                 // }
                 //ut.setDestinationFileName(reportFileName);
                 //ut.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());

                System.out.println("COMPLETED");

             } catch (Exception e) {
                 System.out.println(e);
             } finally {
                 // reportDocument is still null if loading the report failed
                 if (reportDocument != null) {
                     try {
                         reportDocument.close();
                     } catch (Exception e) {
                         System.out.println(e);
                     }
                 }
             }
        }
    }
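
A likely explanation for the size increase in the question is that the GoTo actions taken from the first document point at pages of that document, so saving the second document drags those pages along. Importing the diagram page into the report document, as the code above does, keeps the link targets in the same file. One detail the code above leaves open is when to close the diagram documents. The sketch below assumes the PDFBox 2.x API (`PDDocument.load(File)`) and assumes that an imported page still reads from its source document until the target is saved, so the sources are closed only after `save`. `DiagramImporter`, `appendFirstPages` and the separate `target` file are illustrative choices, not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public final class DiagramImporter {

    private DiagramImporter() {
    }

    /** Appends the first page of each diagram file to the report and saves the result to target. */
    public static void appendFirstPages(File report, List<File> diagrams, File target)
            throws IOException {
        List<PDDocument> sources = new ArrayList<PDDocument>();
        PDDocument reportDocument = PDDocument.load(report);
        try {
            for (File diagram : diagrams) {
                PDDocument source = PDDocument.load(diagram);
                // Keep the source open: the imported page may still read from it.
                sources.add(source);
                PDPage first = source.getDocumentCatalog().getPages().get(0);
                reportDocument.importPage(first);
            }
            reportDocument.save(target);
        } finally {
            // Close everything only after the report has been written.
            closeQuietly(reportDocument);
            for (PDDocument source : sources) {
                closeQuietly(source);
            }
        }
    }

    private static void closeQuietly(PDDocument document) {
        try {
            document.close();
        } catch (IOException e) {
            System.err.println("Could not close document: " + e);
        }
    }
}
```

Writing to a separate target file is a cautious choice that avoids overwriting the report while it is still open. The link and bookmark creation from the code above would go inside the loop, right after `importPage`.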

Regarding "java - PDFBox file size doubles after adding PDAnnotationLinks", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36175076/
