apache - How to define different styles within the same paragraph

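I am trying to convert HTML text to generate a Word table. It works well and the created Word file is correct, except for the character styles.

This is my first attempt with Apache POI.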

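So far I can detect the line-break tag (<br>) in the text paragraph (see the code below). But I would also like to check for other tags, such as those for bold and italic text, and set the proper run values for each part.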

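For example:
This is my text, which is now in italic but can also be bold depending on its importance

I think I should parse the text and apply a different run to each part, but I don't know how to do that.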

    private static XWPFParagraph getTableParagraph(XWPFTableCell cell, String text) {
        int fontsize = 11;
        XWPFParagraph paragraph = cell.addParagraph();
        cell.removeParagraph(0);
        paragraph.setSpacingAfterLines(0);
        paragraph.setSpacingAfter(0);
        XWPFRun myRun1 = paragraph.createRun();
        if (text == null) {
            text = "";
        } else {
            // Split on each literal "<br>" and emit a line break in its place.
            while (true) {
                int x = text.indexOf("<br>");
                if (x < 0) break;
                String work = text.substring(0, x);
                text = text.substring(x + 4); // skip the 4 characters of "<br>"
                myRun1.setText(work);
                myRun1.addBreak();
            }
        }
        myRun1.setText(text);
        myRun1.setFontSize(fontsize);
        return paragraph;
    }

Best answer

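When converting HTML text, you should never rely on string methods alone. XML and HTML are markup languages: their content is markup, not just plain text. The markup has to be traversed to get every single node and its meaning. That traversal is far from trivial, which is why dedicated libraries exist. Deep inside, those libraries use string methods too, but they wrap them in useful methods for traversing the markup.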
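To make the point concrete, here is a minimal sketch using only the JDK (the class and method names are hypothetical, chosen for illustration). It counts line breaks the way an `indexOf("<br>")` loop would and shows that equally valid spellings of the same tag are missed.

```java
final class NaiveBreakSearch {

    /** Counts occurrences of the exact literal "<br>", as an indexOf loop would. */
    static int countLiteralBreaks(String html) {
        int count = 0;
        int from = 0;
        while (true) {
            int at = html.indexOf("<br>", from);
            if (at < 0) {
                return count;
            }
            count++;
            from = at + "<br>".length();
        }
    }

    public static void main(String[] args) {
        // Four line breaks as an HTML parser would see them, in four spellings.
        String html = "one<br>two<br/>three<BR>four<br />five";
        System.out.println(countLiteralBreaks(html)); // prints 1
    }
}
```

Only the first spelling is found; an HTML parser would report all four as `br` elements.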

For traversing HTML, jsoup can be used, for example. In particular, NodeTraversor together with a NodeVisitor is useful for traversing HTML.

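My example creates a ParagraphNodeVisitor that implements NodeVisitor. This interface requires the method public void head(Node node, int depth), which is called each time the NodeTraversor is at the head of a node, and public void tail(Node node, int depth), which is called each time the NodeTraversor is at the tail of a node. The processing of each single node can be implemented in these methods. In our case, the main part of that processing is deciding whether we need a new XWPFRun and which settings that run needs.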
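The order of the head and tail calls can be illustrated without any third-party dependency. The sketch below is an assumption-laden stand-in, not jsoup: it uses the JDK's built-in XML parser (so it only accepts well-formed XHTML) and a hypothetical `walk` method that records a head event before a node's children and a tail event after them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class HeadTailOrder {

    /** Depth-first walk: head before the children, tail after them. */
    static void walk(Node node, int depth, List<String> events) {
        events.add("head " + node.getNodeName() + " @" + depth);
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, events);
        }
        events.add("tail " + node.getNodeName() + " @" + depth);
    }

    /** Parses a well-formed XHTML fragment and returns the recorded events. */
    static List<String> events(String xhtml) {
        try {
            Node root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            List<String> events = new ArrayList<>();
            walk(root, 0, events);
            return events;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XHTML: " + xhtml, e);
        }
    }

    public static void main(String[] args) {
        events("<p>plain <b>bold</b></p>").forEach(System.out::println);
    }
}
```

For this input, the tail of the first text node arrives before the head of `b`. That is the point at which a visitor can start a new run, so the bold setting applies only to the text inside `b`.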

Example:

    import java.io.FileOutputStream;

    import org.apache.poi.xwpf.usermodel.*;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Node;
    import org.jsoup.nodes.TextNode;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;
    import org.jsoup.select.NodeVisitor;
    import org.jsoup.select.NodeTraversor;

    public class HTMLtoDOCX {

        private XWPFDocument document;

        public HTMLtoDOCX(String html, String docxPath) throws Exception {
            this.document = new XWPFDocument();

            // Create one Word paragraph per HTML <p> element.
            Document htmlDocument = Jsoup.parse(html);
            Elements htmlParagraphs = htmlDocument.select("p");
            for (Element htmlParagraph : htmlParagraphs) {
                System.out.println(htmlParagraph);
                XWPFParagraph paragraph = document.createParagraph();
                createParagraphFromHTML(paragraph, htmlParagraph);
            }

            try (FileOutputStream out = new FileOutputStream(docxPath)) {
                document.write(out);
            }
            document.close();
        }

        void createParagraphFromHTML(XWPFParagraph paragraph, Element htmlParagraph) {
            ParagraphNodeVisitor nodeVisitor = new ParagraphNodeVisitor(paragraph);
            NodeTraversor.traverse(nodeVisitor, htmlParagraph);
        }

        private class ParagraphNodeVisitor implements NodeVisitor {

            String nodeName;
            boolean isItalic;
            boolean isBold;
            boolean isUnderlined;
            int fontSize;
            String fontColor;
            XWPFParagraph paragraph;
            XWPFRun run;

            ParagraphNodeVisitor(XWPFParagraph paragraph) {
                this.paragraph = paragraph;
                this.run = paragraph.createRun();
                this.nodeName = "";
                this.isItalic = false;
                this.isBold = false;
                this.isUnderlined = false;
                this.fontSize = 11;
                this.fontColor = "000000";
            }

            // Called when the traversor enters a node: write text, switch formatting on.
            @Override
            public void head(Node node, int depth) {
                nodeName = node.nodeName();
                System.out.println("Start " + nodeName + ": " + node);
                if ("#text".equals(nodeName)) {
                    run.setText(((TextNode) node).text());
                } else if ("i".equals(nodeName)) {
                    isItalic = true;
                } else if ("b".equals(nodeName)) {
                    isBold = true;
                } else if ("u".equals(nodeName)) {
                    isUnderlined = true;
                } else if ("br".equals(nodeName)) {
                    run.addBreak();
                } else if ("font".equals(nodeName)) {
                    // Expects color as "#RRGGBB" and size as a plain integer.
                    fontColor = (!"".equals(node.attr("color"))) ? node.attr("color").substring(1) : "000000";
                    fontSize = (!"".equals(node.attr("size"))) ? Integer.parseInt(node.attr("size")) : 11;
                }
                applyRunSettings();
            }

            // Called when the traversor leaves a node: start new runs, switch formatting off.
            @Override
            public void tail(Node node, int depth) {
                nodeName = node.nodeName();
                System.out.println("End " + nodeName);
                if ("#text".equals(nodeName)) {
                    run = paragraph.createRun(); // after setting the text in the run a new run is needed
                } else if ("i".equals(nodeName)) {
                    isItalic = false;
                } else if ("b".equals(nodeName)) {
                    isBold = false;
                } else if ("u".equals(nodeName)) {
                    isUnderlined = false;
                } else if ("br".equals(nodeName)) {
                    run = paragraph.createRun(); // after setting a break a new run is needed
                } else if ("font".equals(nodeName)) {
                    fontColor = "000000";
                    fontSize = 11;
                }
                applyRunSettings();
            }

            // Copies the current formatting state onto the current run.
            private void applyRunSettings() {
                run.setItalic(isItalic);
                run.setBold(isBold);
                run.setUnderline(isUnderlined ? UnderlinePatterns.SINGLE : UnderlinePatterns.NONE);
                run.setColor(fontColor);
                run.setFontSize(fontSize);
            }
        }

        public static void main(String[] args) throws Exception {
            // Sample input; the placement of the inline tags is illustrative.
            String html =
                  "<p>Text without tags. <b>Then bold <br/>having break.</b> Then without tags again.</p>"
                + "<p><font size='32' color='#0000FF'><b>First paragraph.</b></font><br/>Just like a heading</p>"
                + "<p>This is my text <i>which now is in italic <b>but also in bold</b> depending on its importance</i>."
                + "<br/>Now a new line starts within the same paragraph.</p>"
                + "<p><u>Last paragraph comes here finally.</u></p>"
                + "<p>But yet another paragraph having <font size='22' color='#FF0000'>special font settings</font>."
                + " Now default font again.</p>";

            new HTMLtoDOCX(html, "./CreateWordParagraphFromHTML.docx");
        }
    }

Result: (screenshot of the generated Word document not included)

Disclaimer: This is a working draft that shows the principle. It is neither complete nor ready for use in production environments.
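One gap a hardened version would probably need to close: the draft resets the font to its defaults at the tail of every font element, so an enclosing font element loses its settings once a nested one ends. A minimal sketch of one possible remedy is a stack of font states. `FontStateStack` and `FontState` are hypothetical helper names, not part of Apache POI or jsoup, and the defaults of 11 and "000000" are taken from the draft above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FontStateStack {

    /** Immutable snapshot of font settings (hypothetical helper type). */
    static final class FontState {
        final int size;
        final String color;

        FontState(int size, String color) {
            this.size = size;
            this.color = color;
        }
    }

    private static final FontState DEFAULT = new FontState(11, "000000");
    private final Deque<FontState> stack = new ArrayDeque<>();

    /** Call at the head of a font element: remember its settings. */
    void push(int size, String color) {
        stack.push(new FontState(size, color));
    }

    /** Call at the tail of a font element: restore the enclosing settings. */
    void pop() {
        if (!stack.isEmpty()) {
            stack.pop();
        }
    }

    /** Settings the next run should use. */
    FontState current() {
        return stack.isEmpty() ? DEFAULT : stack.peek();
    }

    public static void main(String[] args) {
        FontStateStack fonts = new FontStateStack();
        fonts.push(16, "FF0000"); // outer font element
        fonts.push(22, "00FF00"); // nested font element
        fonts.pop();              // tail of the nested element
        System.out.println(fonts.current().size + " " + fonts.current().color); // prints 16 FF0000
    }
}
```

In the visitor, `head` would call `push` for a font node, `tail` would call `pop`, and the run would take its size and color from `current()`. The same idea could be applied to bold, italic and underline with a counter per style.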

Regarding "apache - How to define different styles within the same paragraph", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54268485/
