java - Generating a PDF from a JSON object containing TinyMCE (HTML) content

TL;DR

How can I create a PDF from a JSON object that contains strings written in HTML?

Example JSON:

{
  dimensions: {
    height: 297,
    width: 210
  },
  boxes: [
    {
      dimensions: {
        height: 10,
        width: 190
      },
      position: {
        x: 10,
        y: 10
      },
      content: "<h1>Hello StackOverflow</h1>, I think you are <strong></strong>! I hope someone can answer this!"
    }
  ]
}

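Technologies used on the frontend: AngularJS 1.4.9, ui.tinymce, ment.io

Backend: whatever works.

I want to be able to create templates for PDFs. The user writes some text in a textarea, using variables that will later be replaced with actual data, and when the user presses a button, a PDF with the finished product should be returned.
This should be very generic, so that it can be used for almost anything.

So, a minimal example: the user writes some text in TinyMCE, such as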

<h1>Hello #[COMMUNITY]</h1>, I think you are <strong>great</strong>! I hope someone can answer this!

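This text contains variables that the user inserted with the help of the ment.io plugin. The actual variables are supplied by the controller.
The text is written in the AngularJS version of TinyMCE, with ment.io on top of it, which gives a nice view of the available variables.

When the user presses the Save button, a JSON object like the following is created; this is the template.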

{
  dimensions: {
    height: 297,
    width: 210
  },
  boxes: [
    {
      dimensions: {
        height: 10,
        width: 190
      },
      position: {
        x: 10,
        y: 10
      },
      content: "user input"
    }
  ]
}

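I have a directive in Angular that can generate any number of boxes, of any size (generic-ho!). This part works well. In the first dimensions object you send how large you want the "page" to be (in millimetres, so the example shows A4 paper size). Then, for each box, you define how large it should be and where on the "paper" it should be placed. Finally there is the content the user wrote in the TinyMCE textarea.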
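PDF user space is measured in points (1/72 inch) from the bottom-left corner of the page, while the template above uses millimetres from the top-left. The following is a minimal sketch of that conversion; the class name BoxGeometry and the method topEdgeFromBottom are made up for illustration and are not part of the original code.

```java
/** Hypothetical helper: converts template millimetres into PDF points. */
final class BoxGeometry {
    static final double MM_PER_INCH = 25.4;
    static final double POINTS_PER_INCH = 72.0;

    /** Millimetres to PDF points. */
    static float mmToPoints(double mm) {
        return (float) (mm / MM_PER_INCH * POINTS_PER_INCH);
    }

    /**
     * Distance in points from the bottom of the page to the top edge of a box
     * whose y position is given in millimetres from the top of the page.
     */
    static float topEdgeFromBottom(double pageHeightMm, double boxYMm) {
        return mmToPoints(pageHeightMm - boxYMm);
    }

    public static void main(String[] args) {
        // A4 page and the box position from the example template.
        System.out.printf("Page: %.1f x %.1f pt%n", mmToPoints(210), mmToPoints(297));
        System.out.printf("Box top edge: %.1f pt from the bottom%n", topEdgeFromBottom(297, 10));
    }
}
```

Whether the flip is done once per box or folded into the text positioning is a design choice; keeping it in one helper makes the top-left convention of the template explicit.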

Next step: the backend replaces the variables with actual data and then passes the result on to the generator.
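As an illustration of this replacement step, here is a minimal sketch that assumes the variables always look like #[NAME], as in the example template. The class TemplateFiller, its fill method, and the choice to HTML-escape the inserted values are assumptions made for this sketch, not something the question specifies.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: fills #[NAME] placeholders in an HTML template. */
final class TemplateFiller {
    // Matches placeholders such as #[COMMUNITY]; the name is captured in group 1.
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\[([A-Za-z0-9_]+)\\]");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Unknown variables are left untouched so they are easy to spot in the output.
            String replacement = (value == null) ? m.group() : escapeHtml(value);
            // quoteReplacement stops '$' and '\' in the data from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // The template is HTML, so inserted data is escaped to avoid it being read as markup.
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String template = "<h1>Hello #[COMMUNITY]</h1>, I think you are <strong>great</strong>!";
        System.out.println(fill(template, Map.of("COMMUNITY", "StackOverflow")));
    }
}
```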

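Then we come to the tricky part: the actual generator. Ideally it should accept JSON, so that any project can use it. The frontend and the PDF generator go hand in hand; they do not care what sits in between. This means the generator could be written in almost anything. I am a Java developer, though, so Java is preferred (hence the Java tag).

The solutions I have found are:

PDFBox, but the problem with it is the content TinyMCE produces. TinyMCE outputs HTML or XML, and PDFBox does not handle that at all. This means I would have to write my own HTML or XML parser to work out where the user wants bold text, where she wants italics, headings, other fonts, and so on. I really don't want that; I have been burned by it before. On the other hand, PDFBox is great for placing text at the right position, even if it is only raw text.

I have read that iText handles HTML, but the AGPL license pretty much kills it.

I have also looked at Flying Saucer, which takes XHTML and creates a PDF. But it seems to depend on iText.

The solution I am looking at right now is a complicated route using Apache FOP. FOP needs an XSL-FO object to work with, so the problem here is creating that XSL-FO object dynamically. I have also read that the XSL-FO standard has been abandoned, so I am not sure how future-proof this approach is. I have never used FOP or XSLT, so the task seems daunting.
What I am currently considering is taking the output from TinyMCE, running it through something like JTidy to get XHTML, creating an XSLT file from the XHTML (in some magical way), creating an XSL-FO object from the XHTML and the XSLT, and generating the PDF from the XSL-FO file. Please tell me there is an easier way.

I can't be the first person who wants to do something like this, yet searching for answers seems to yield very few actual results.

So my question is basically this: how do I create a PDF from a JSON object like the one above, which contains HTML, and have the generated text look the way it did when it was written in TinyMCE?
Keep in mind that the object can contain an unlimited number of boxes.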

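Best answer

So. After some research and work, I decided to actually use PDFBox for the generation. I am also very strict about what input I accept. Right now I really only accept bold, italics, and headings, so I look for the <strong>, <em>, and <h[1-6]> tags.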
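One way to enforce such a strict input policy is to drop every tag that is not on the list before any layout happens. The sketch below is only an illustration with a made-up class name (TagWhitelist); it simply strips unsupported tags, whereas the processContent step further down also turns closing paragraph tags into newlines.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: keeps only <strong>, <em> and <h1>..<h6> tags. */
final class TagWhitelist {
    // Matches any tag that is NOT a bare opening or closing strong/em/h1-h6 tag.
    // Supported tags that carry attributes are treated as unsupported and removed too.
    private static final Pattern DISALLOWED =
            Pattern.compile("(?i)<(?!/?(?:strong|em|h[1-6])\\s*>)[^>]*>");

    static String keepSupportedTags(String html) {
        return DISALLOWED.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<p>Hi <strong>you</strong> <span style=\"x\">there</span></p>";
        System.out.println(keepSupportedTags(input));
    }
}
```

A regular expression is enough here only because the accepted tag set is tiny and flat; for anything richer, a real HTML parser would be the safer choice.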

To start with, I updated my input JSON slightly; really it is just more wrapping.

{
  documents: [
    {
      pages: [
        {
          dimensions: {width: 210, height: 297},
          boxes: [
            {
              dimensions: {width: 190, height: 40},
              placement: {x: 10, y: 10},
              content: "Hello <strong>StackOverflow</strong>!"
            }
          ]
        }
      ]
    }
  ]
}

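The reason is that I want to be able to output a large number of documents in the same PDF. Think of sending out letters in bulk: each document is slightly different, but you still want them all in the same PDF. You could of course do all of this with just the page level, but if a document is several pages long, I think the separation is nicer.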
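For orientation, the wrapped JSON could map onto plain Java classes along these lines. Bundle, Document, Page and Box are the names used by the generator code in this answer; Dimensions, Placement, the constructors and pageCount() are assumptions added for this sketch.

```java
import java.util.List;

/** Width and height in millimetres. */
final class Dimensions {
    final float width, height;
    Dimensions(float width, float height) { this.width = width; this.height = height; }
}

/** Position of a box in millimetres. */
final class Placement {
    final float x, y;
    Placement(float x, float y) { this.x = x; this.y = y; }
}

final class Box {
    final Dimensions dimensions;
    final Placement placement;
    final String content; // HTML fragment written in TinyMCE
    Box(Dimensions dimensions, Placement placement, String content) {
        this.dimensions = dimensions;
        this.placement = placement;
        this.content = content;
    }
}

final class Page {
    final Dimensions dimensions;
    final List<Box> boxes;
    Page(Dimensions dimensions, List<Box> boxes) { this.dimensions = dimensions; this.boxes = boxes; }
}

final class Document {
    final List<Page> pages;
    Document(List<Page> pages) { this.pages = pages; }
}

/** Top-level wrapper: many documents end up in one PDF. */
final class Bundle {
    final List<Document> documents;
    Bundle(List<Document> documents) { this.documents = documents; }

    /** Total number of pages the resulting PDF would contain. */
    int pageCount() {
        int count = 0;
        for (Document document : documents) { count += document.pages.size(); }
        return count;
    }
}
```

Any JSON mapper can populate such classes; which one to use is left open here.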

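My actual code is about 500 lines long, so I won't paste all of it here, just the essential parts that should help, and that is still around 150 lines.
Here goes: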

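import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Assumed source of StringEscapeUtils: Apache Commons Lang 3.
import org.apache.commons.lang3.StringEscapeUtils;
// Package names below are those of PDFBox 2.x; in 1.8, PDPageContentStream lives in org.apache.pdfbox.pdmodel.edit.
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDFont;

public class Generator {
   public static ByteArrayOutputStream generatePDF(final Bundle bundle) throws IOException {
      final ByteArrayOutputStream output = new ByteArrayOutputStream();

      final PDDocument pdf = new PDDocument();
      for (final Document document : bundle.documents) {
         for (final Page page : document.pages) {
            pdf.addPage(generatePage(pdf, page));
         }
      }
      pdf.save(output);
      pdf.close();

      return output;
   }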

   private static PDPage generatePage(final PDDocument document, final Page page) throws IOException {
      final PDRectangle rect = new PDRectangle(mmToPoints(page.dimensions.width), mmToPoints(page.dimensions.height));
      final PDPage pdPage = new PDPage(rect);
      final PDPageContentStream cs = new PDPageContentStream(document, pdPage);

      for (final Box box : page.boxes) {
         resetFont(cs); // Reset the font when starting a new box so missing closing tags don't mess up the next box.

         final String pc = processContent(box.content); // Make the content prettier, e.g. strip all <p>, replace </p> with \n, strip all <div> tags, etc.

         lines(Arrays.asList(pc.split("\n")), box, cs);
      }
      cs.close();
      return pdPage;
   }

   private static float mmToPoints(final float mm) {
      // 1 inch == 72 points (standard DPI), 1 inch == 25.4 mm. So, mm to points means (mm / mmPerInch) * pointsPerInch
      return (float) ((mm / 25.4) * 72);
   }

   private static void lines(final List<String> lines, final Box box, final PDPageContentStream cs) throws IOException {
      if (lines.isEmpty()) { return; }
      cs.beginText();
      cs.moveTextPositionByAmount(mmToPoints(box.placement.x), mmToPoints(box.placement.y));
      // Now we begin the tricky part
      for (int i = 0, length = lines.size(); i < length; ++i) {
         final String line = lines.get(i);
         final List<Word> wordList = new ArrayList<>();
         final String[] splitArray = line.split(" ");
         final float fontHeight = fontHeight(currentFont(), currentFontSize()); // Documented elsewhere
         cs.appendRawCommands(fontHeight + " TL\n");
         if (i == 0) { addNewLine(cs); } // PDFbox starts at the bottom, we start at the top. Add new line so we are inside the box
         for (final String index : splitArray) {
            final String word = index + " "; // We removed spaces when we split on them, add it to words now.
            final StringBuilder wordBuilder = new StringBuilder();
            boolean addWord = true;
            for (int j = 0, wordLength = word.length(); j < wordLength; ++j) {
               final char c = word.charAt(j);
               if (c == '<') { // check for <strong> and those
                  final StringBuilder command = new StringBuilder();
                  if (addWord && wordBuilder.length() > 0) {
                     wordList.add(new Word(wordBuilder.toString(), currentFont(), currentFontSize()));
                     wordBuilder.setLength(0);
                     addWord = false;
                  }
                  for (; j < wordLength; ++j) {
                     final char c1 = word.charAt(j);
                     command.append(c1);
                     if (c1 == '>') {
                        if (j + 1 < wordLength) { addWord = true; }
                        break;
                     }
                  }
                  final boolean b = parseForFontChange(command.toString());
                  if (!b) { // If it wasn't a command, we want to append it to our text
                     wordBuilder.append(command.toString());
                  }
               } else if (c == '&') { // check for html escaped entities
                  final int longestHTMLEntityName = 24 + 2; // &ClockwiseContourIntegral;
                  final StringBuilder escapedChar = new StringBuilder();
                  escapedChar.append(c);
                  int k = 1;
                  for (; k < longestHTMLEntityName && j + k < wordLength; ++k) {
                     final char c1 = word.charAt(j + k);
                     if (c1 == '<' || c1 == '>') { break; } // Can't be an escaped char.
                     escapedChar.append(c1);
                     if (c1 == ';') { break; } // End of char
                  }
                  if (escapedChar.indexOf(";") < 0) { k--; }
                  wordBuilder.append(StringEscapeUtils.unescapeHtml4(escapedChar.toString()));
                  j += k;
               } else {
                  wordBuilder.append(c);
               }
            }
            if (addWord) {
               wordList.add(new Word(wordBuilder.toString(), currentFont(), currentFontSize()));
            }
         }
         writeWords(wordList, box, cs);
         if (i < length - 1) { addNewLine(cs); }
      }
      cs.endText();
   }

   public static void writeWords(final List<Word> words, final Box box, final PDPageContentStream cs) {
      final float boxWidth = mmToPoints(box.dimensions.width);
      float lineWidth = 0;
      for (final Word word : words) {
         lineWidth += word.width;
         if (lineWidth > boxWidth) {
            addNewLine(cs);
            lineWidth = word.width;
         }
         if (lineWidth > boxWidth) { // Word longer than box width
            lineWidth = 0;
            final String string = word.string;
            for (int i = 0, length = string.length(); i < length; ++i) {
               final char c = string.charAt(i);
               final float charWidth = calculateStringWidth(String.valueOf(c), word.font, word.fontSize);
               lineWidth += charWidth;
               if (lineWidth > boxWidth) {
                  addNewLine(cs);
                  lineWidth = charWidth;
               }
               drawChar(c, word.font, word.fontSize, cs);
            }
         } else {
            drawWord(word, cs);
         }
      }
   }
}

public class Word {
   public final String string;
   public final PDFont font;
   public final float fontSize;
   public final float width;
   public final float height;

   public Word(final String string, final PDFont font, final float fontSize) {
      this.string = string;
      this.font = font;
      this.fontSize = fontSize;
      this.width = calculateStringWidth(string, font, fontSize);
      this.height = calculateStringHeight(string, font, fontSize);
   }
}

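I hope this helps others facing the same problem. The reason for having the Word class is in case you want to split on words rather than on characters.
Many other posts describe how to write helper methods such as calculateStringWidth, so they are not included here.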
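The wrapping rule used by writeWords can be looked at in isolation, without any PDF library. The following is a sketch under stated assumptions: LineWrapper and wrap are made-up names, the width function is passed in as a parameter (in the real generator it would be a font-based measurement such as calculateStringWidth), and the result is collected as strings instead of being drawn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper: greedy wrapping with character-level fallback. */
final class LineWrapper {
    static List<String> wrap(List<String> words, double boxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        double lineWidth = 0;
        for (String word : words) {
            double wordWidth = widthOf.applyAsDouble(word);
            // Start a new line when the word no longer fits on the current one.
            if (lineWidth + wordWidth > boxWidth && current.length() > 0) {
                lines.add(current.toString());
                current.setLength(0);
                lineWidth = 0;
            }
            if (wordWidth > boxWidth) {
                // The word is wider than the whole box: fall back to breaking between characters.
                for (int i = 0; i < word.length(); i++) {
                    String c = String.valueOf(word.charAt(i));
                    double charWidth = widthOf.applyAsDouble(c);
                    if (lineWidth + charWidth > boxWidth && current.length() > 0) {
                        lines.add(current.toString());
                        current.setLength(0);
                        lineWidth = 0;
                    }
                    current.append(c);
                    lineWidth += charWidth;
                }
            } else {
                current.append(word);
                lineWidth += wordWidth;
            }
        }
        if (current.length() > 0) { lines.add(current.toString()); }
        return lines;
    }

    public static void main(String[] args) {
        // Toy width function: every character is one unit wide.
        System.out.println(wrap(List.of("Hello ", "big ", "world "), 10, String::length));
    }
}
```

Passing the width function in keeps the rule testable with a toy measurement before a real font is involved.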

See "How to Insert a Linefeed with PDFBox drawString" for line breaks and fontHeight.

See "How to generate multiple lines in PDF using Apache pdfbox" for string width.

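In my case the parseForFontChange method changes the current font and font size. The currently active values are of course returned by the methods currentFont() and currentFontSize(). I use regular expressions like (?ui:(<strong>)) to check for a bold tag. Use whatever suits you.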
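To make the idea concrete, here is one possible shape for such a method, written without PDFBox so that it runs on its own. It is a sketch, not the code from this answer: the class FontState, its fields, and the font sizes are invented for illustration, and a real version would map the state onto PDFont instances.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical tracker for the current text style. */
final class FontState {
    // Group 1: "/" for a closing tag, group 2: tag name, group 3: heading level (if any).
    private static final Pattern TAG = Pattern.compile("(?ui:<(/?)(strong|em|h([1-6]))>)");

    boolean bold;
    boolean italic;
    int headingLevel; // 0 means body text

    /** Returns true if the command was a supported tag and the state was updated. */
    boolean parseForFontChange(String command) {
        Matcher m = TAG.matcher(command);
        if (!m.matches()) { return false; }
        boolean opening = m.group(1).isEmpty();
        String name = m.group(2).toLowerCase(Locale.ROOT);
        if (name.equals("strong")) {
            bold = opening;
        } else if (name.equals("em")) {
            italic = opening;
        } else {
            headingLevel = opening ? Integer.parseInt(m.group(3)) : 0;
        }
        return true;
    }

    /** Arbitrary example sizes: 12 pt body text, 24 pt for h1 down to 14 pt for h6. */
    float currentFontSize() {
        return headingLevel == 0 ? 12f : 26f - 2f * headingLevel;
    }

    public static void main(String[] args) {
        FontState state = new FontState();
        state.parseForFontChange("<h1>");
        System.out.println("heading level " + state.headingLevel + ", size " + state.currentFontSize());
    }
}
```

Returning a boolean matches how the lines method uses the result: anything that is not a recognised tag is appended to the text as-is.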

This Q&A about generating a PDF from a JSON object containing TinyMCE (HTML) content is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/35482644/
