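I am generating a PDF file from an HTML string, but the content of the generated PDF does not match the HTML: the PDF contains what looks like random characters. I searched for this problem on Google, and the suggestion was to use Unicode notation such as %u0627%u0646%u0627%20%u0627%u0633%u0645%u0649%20%u0639%u0628%u062F%u0627%u0644%u0644%u0647. But when I put that into the HTML, it is printed as is.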
Related question: Writing Arabic in pdf using itext
package com.example.demo;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.styledxmlparser.css.media.MediaDeviceDescription;
import com.itextpdf.styledxmlparser.css.media.MediaType;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import com.itextpdf.layout.font.FontProvider;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) throws IOException {
        SpringApplication.run(DemoApplication.class, args);
        String htmlSource = getContent();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        ConverterProperties converterProperties = new ConverterProperties();
        FontProvider dfp = new DefaultFontProvider(true, false, false);
        dfp.addFont("/Library/Fonts/Arial.ttf");
        converterProperties.setFontProvider(dfp);
        converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
        HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
        byte[] bytes = outputStream.toByteArray();
        File pdfFile = new File("java19.pdf");
        FileOutputStream fos = new FileOutputStream(pdfFile);
        fos.write(bytes);
        fos.flush();
        fos.close();
    }

    private static String getContent() {
return "<!DOCTYPE html>\n" +
"<html lang=\"en\">\n" +
"\n" +
"<head>\n" +
" <meta charset=\"UTF-8\">\n" +
" <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
" <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
" <title>Document</title>\n" +
" <style>\n" +
" @page {\n" +
" margin: 0;\n" +
" font-family: arial;\n" +
" }\n" +
" </style>\n" +
"</head>\n" +
"\n" +
"<body\n" +
" style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
" <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td style=\"padding: 0 35px;\">\n" +
" <p> انا اسمى عبدالله\n" +
" </p>\n" +
" </td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
"\n" +
"</body>\n" +
"\n" +
"</html>";
    }
}
Best answer
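Without seeing the faulty output, it is hard to say exactly what the problem is. But your "random content" sounds like an encoding issue.
Because your Arabic content sits directly in the source code, you have to be careful about the source file's encoding. For example, with ISO-8859-1, the resulting PDF output is:
[screenshot omitted]
Using Unicode escape sequences (\uXXXX) does indeed avoid some of these encoding problems. Replacing
"                <p> انا اسمى عبدالله\n" +
with
"                <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647\n" +
produces Arabic glyphs even with ISO-8859-1 encoding. Alternatively, you can use UTF-8 to get correct content whether or not you use Unicode escape sequences.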
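To see why the source-file encoding matters, here is a minimal sketch that reproduces the mismatch with the JDK alone (no iText involved). It encodes the Arabic sentence as UTF-8 bytes and then decodes those bytes as ISO-8859-1, which is roughly what happens when a UTF-8 source file is compiled as if it were ISO-8859-1. The class name `EncodingDemo` and its method names are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: the class and method names are hypothetical, not part of iText. */
final class EncodingDemo {

    // The sentence from the question, written with Unicode escapes so that
    // this file compiles to the same string under any source encoding.
    static final String ARABIC =
            "\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647";

    /** Simulates reading UTF-8 bytes as if they were ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses misdecode(); possible because ISO-8859-1 maps every byte to exactly one char. */
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(ARABIC);
        System.out.println("chars in original:      " + ARABIC.length());
        System.out.println("chars after misdecode:  " + garbled.length());
        System.out.println("round trip restores it: " + repair(garbled).equals(ARABIC));
    }
}
```

Each Arabic letter occupies two bytes in UTF-8, so the wrongly decoded string has roughly twice as many characters, none of them Arabic. If you keep literal Arabic text in the source, passing `-encoding UTF-8` to `javac` (or setting the equivalent option in your build tool) should keep the compiler and the editor in agreement.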
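Once your encoding problem is solved, you may get output like this:
[screenshot omitted]
To render certain writing systems correctly, iText 7 needs an optional add-on, pdfCalligraph. With this add-on enabled, the output looks like this:
[screenshot omitted]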
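As background on why an extra module is involved: Arabic is written right to left and its letters change shape depending on their neighbours, so a layout engine has to reorder and shape the text rather than place glyphs one by one. The sketch below does not use iText or pdfCalligraph at all; it only uses the JDK's `java.text.Bidi` class to show that the sentence from the question is detected as right-to-left text. The class name `BidiCheck` and its method names are hypothetical.

```java
import java.text.Bidi;

/** Illustrative only: uses the JDK's Bidi class, not iText or pdfCalligraph. */
final class BidiCheck {

    // The sentence from the question, written with Unicode escapes.
    static final String ARABIC =
            "\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647";

    /** True if the text contains characters that need bidirectional processing. */
    static boolean needsBidi(String text) {
        char[] chars = text.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    /** True if the whole text resolves to right-to-left with a right-to-left base direction. */
    static boolean isPurelyRightToLeft(String text) {
        // The base direction is taken from the first strongly directional character.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.isRightToLeft();
    }

    public static void main(String[] args) {
        System.out.println("Arabic needs bidi: " + needsBidi(ARABIC));
        System.out.println("Arabic is RTL:     " + isPurelyRightToLeft(ARABIC));
        System.out.println("Latin needs bidi:  " + needsBidi("Hello"));
    }
}
```

Direction is only half of the story. Joining the letters into their contextual forms is a separate shaping step, and this sketch does not attempt it.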
Code used for the tests above:
// Imports needed in addition to those in the question:
//   java.io.OutputStream
//   com.itextpdf.licensekey.LicenseKey (assumption: iText license-key library 3.x;
//   newer iText releases ship the license API under a different package)
public static void main(String[] args) throws IOException {
    // Needed for pdfCalligraph
    LicenseKey.loadLicenseFile("all-products.xml");
    File pdfFile = new File("java19.pdf");
    String htmlSource = getContent();
    ConverterProperties converterProperties = new ConverterProperties();
    FontProvider dfp = new DefaultFontProvider(true, false, false);
    dfp.addFont("/Library/Fonts/Arial.ttf");
    converterProperties.setFontProvider(dfp);
    converterProperties.setMediaDeviceDescription(new MediaDeviceDescription(MediaType.PRINT));
    // try-with-resources closes the file even if the conversion throws
    try (OutputStream outputStream = new FileOutputStream(pdfFile)) {
        HtmlConverter.convertToPdf(htmlSource, outputStream, converterProperties);
    }
}
private static String getContent() {
return "<!DOCTYPE html>\n" +
"<html lang=\"en\">\n" +
"\n" +
"<head>\n" +
" <meta charset=\"UTF-8\">\n" +
" <meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\">\n" +
" <meta http-equiv=\"X-UA-Compatible\" content=\"ie=edge\">\n" +
" <title>Document</title>\n" +
" <style>\n" +
" @page {\n" +
" margin: 0;\n" +
" font-family: arial;\n" +
" }\n" +
" </style>\n" +
"</head>\n" +
"\n" +
"<body\n" +
" style=\"margin: 0;padding: 0;font-family: arial, sans-serif;font-size: 14px;line-height: 125%;width: 100%;-ms-text-size-adjust: 100%;-webkit-text-size-adjust: 100%;color: #222222;\">\n" +
" <table cellpadding=\"0\" cellspacing=\"0\" width=\"100%\" style=\"background: white; direction: rtl;\">\n" +
" <tbody>\n" +
" <tr>\n" +
" <td style=\"padding: 0 35px;\">\n" +
// Arabic content
// " <p> انا اسمى عبدالله\n" +
// Arabic content with Unicode escape sequences
" <p>\u0627\u0646\u0627 \u0627\u0633\u0645\u0649 \u0639\u0628\u062F\u0627\u0644\u0644\u0647" +
" </p>\n" +
" </td>\n" +
" </tr>\n" +
" </tbody>\n" +
" </table>\n" +
"\n" +
"</body>\n" +
"\n" +
"</html>";
}
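A note on the `%u0627%u0646...` string from the question: that notation is what JavaScript's legacy `escape()` function produces, and an HTML parser does not interpret it, which is why it was printed as is. In HTML, the encoding-independent way to write a character is a numeric character reference such as `&#x627;`. The following sketch, with hypothetical helper names, decodes the `%uXXXX` form into a Java string and converts non-ASCII characters into numeric character references; whether a particular HTML-to-PDF converter resolves such references is something to verify in your own setup.

```java
/** Illustrative only: the class and method names are hypothetical. */
final class PercentUDecoder {

    /**
     * Decodes %uXXXX and %XX sequences into a Java string.
     * Malformed input (non-hex digits after '%') throws NumberFormatException.
     */
    static String decode(String escaped) {
        StringBuilder out = new StringBuilder();
        int n = escaped.length();
        int i = 0;
        while (i < n) {
            char c = escaped.charAt(i);
            if (c == '%' && i + 6 <= n && escaped.charAt(i + 1) == 'u') {
                // Four hex digits encode one UTF-16 code unit.
                out.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else if (c == '%' && i + 3 <= n) {
                // Two hex digits encode one character in the range 0x00 to 0xFF.
                out.append((char) Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Replaces every non-ASCII code point with an HTML numeric character reference. */
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String fromQuestion = "%u0627%u0646%u0627%20%u0627%u0633%u0645%u0649"
                + "%20%u0639%u0628%u062F%u0627%u0644%u0644%u0647";
        System.out.println(toNumericReferences(decode(fromQuestion)));
    }
}
```

The output of `toNumericReferences` is plain ASCII, so it survives any source-file or transport encoding. It does not escape `&` or `<`, so it is meant for text content only, not for arbitrary markup.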
Regarding "java - Wrong Arabic translation in PDF iText", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/61896643/