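I can't get special characters from HTML to show up in the generated PDF. I have tried reading the input as UTF-8, Windows-1257, ISO-8859-13 and so on, but nothing works: I only get blank spaces where the characters should be.
So how can I fix this?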
Java
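// Imports assume iText 5 with the XML Worker add-on.
import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import java.io.*;
import java.nio.charset.StandardCharsets;

// HTML containing the Lithuanian characters that come out as spaces in the PDF.
String d1 = "<html><head></head><body>...ą...č...ę...ė...į...š...ų...ū...ž...Ą...Č...Ę...Ė...Į...Š...Ų...Ū...Ž...</body></html>";
OutputStream myFile = new FileOutputStream(new File("C:\\My\\pdf1.pdf"));
// Set up an A4 document with 36pt margins.
Document document = new Document();
document.addCreationDate();
document.setPageSize(PageSize.A4);
document.setMargins(36, 36, 36, 36);
document.setMarginMirroring(true);
PdfWriter writer = PdfWriter.getInstance(document, myFile);
document.open();
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
// Feed the HTML to the parser as UTF-8 bytes.
InputStream is = new ByteArrayInputStream(d1.getBytes(StandardCharsets.UTF_8));
// Register only FreeSans.ttf and skip scanning the system font directories.
String FONT = "C:\\My\\FreeSans.ttf";
XMLWorkerFontProvider fontImp = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontImp.register(FONT);
worker.parseXHtml(writer, document, is, StandardCharsets.UTF_8, fontImp);
document.close();
myFile.close();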
Best answer
Topaco was right: after I added the "body style..." string, everything worked!
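For illustration, here is a minimal sketch of what such a change might look like. The exact style string is not quoted in this thread, so the `font-family: FreeSans` value, the `BodyFontStyle` class and its `withBodyFont` and `survivesUtf8` helpers are all assumptions made for this example. The idea, as far as I understand it, is that the HTML has to name the font that was registered with `XMLWorkerFontProvider`; otherwise a default font without glyphs for these characters may be picked. The sketch uses only the standard library, so it runs without iText, and it also checks that a UTF-8 round trip keeps the characters intact, which suggests the encoding was probably not the cause.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;

// Hypothetical helper class, not part of iText or of the original question.
final class BodyFontStyle {

    // Adds a font-family style to the first <body> tag of the HTML.
    // Assumption: the tag does not already carry a style attribute.
    static String withBodyFont(String html, String fontFamily) {
        String replacement = "<body style=\"font-family: "
                + Matcher.quoteReplacement(fontFamily) + ";\"$1>";
        // Group 1 keeps any attributes the <body> tag already had.
        return html.replaceFirst("(?i)<body\\b([^>]*)>", replacement);
    }

    // Returns true if the text is unchanged after encoding to UTF-8 and back.
    static boolean survivesUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8).equals(text);
    }

    public static void main(String[] args) {
        // \u0105 \u010D \u0119 are the letters a-ogonek, c-caron and e-ogonek.
        String html = "<html><head></head><body>\u0105\u010D\u0119</body></html>";
        // "FreeSans" is assumed to be the family name of the registered FreeSans.ttf.
        String styled = withBodyFont(html, "FreeSans");
        System.out.println(styled);
        System.out.println("UTF-8 round trip intact: " + survivesUtf8(styled));
    }
}
```

The styled string would then be passed to `parseXHtml` in place of `d1`, with the rest of the code from the question left as it is.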
Regarding java - character problems when converting HTML to PDF, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/55683405/