java - Apache FOP: problem with Cyrillic characters

Tags: java pdf pdf-generation apache-fop

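I am using the Apache FOP library to generate PDF files in a Java 8 project. English content is displayed without any problems, but Russian characters come out garbled. They look like this: Ð#Ð⁄гР̧н

The problem seems to be related to encoding, but how do I fix it?

This is the class I use to generate the PDF: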

public class PdfGenerationTools implements StreamResource.StreamSource
    {
    String content;

    public PdfGenerationTools(String content) {
        this.content = content;
    }

    @Override
    public InputStream getStream()
    {
        ByteArrayInputStream foStream =
                new ByteArrayInputStream(content.getBytes(StringTools.UTF8));

        // Basic FOP configuration. You could create this object
        // just once and keep it.
        FopFactory fopFactory = FopFactory.newInstance();
        fopFactory.setStrictValidation(false); // For an example

        // Configuration for this PDF document - mainly metadata
        FOUserAgent userAgent = getFOUserAgent(fopFactory);

        // Transform to PDF
        ByteArrayOutputStream fopOut = new ByteArrayOutputStream();
        try {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF,
                    userAgent, fopOut);
            TransformerFactory factory =
                    TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer();
            Source src = new
                    javax.xml.transform.stream.StreamSource(foStream);
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            fopOut.close();
            return new ByteArrayInputStream(fopOut.toByteArray());

        } catch (Exception e) {
            e.printStackTrace();
        }

        return null;
    }

    private FOUserAgent getFOUserAgent(FopFactory factory)
    {
        FOUserAgent userAgent = factory.newFOUserAgent();

        userAgent.setProducer("Company");
        userAgent.setCreationDate(new Date());
        userAgent.setTitle("Printing jobs");
        userAgent.setTargetResolution(300); // DPI

        return userAgent;
    }

    public static String initDoc()
    {
        return "<?xml version='1.0' encoding='ISO-8859-1'?>"+
                "<fo:root xmlns:fo='http://www.w3.org/1999/XSL/Format'>"+
                "<fo:layout-master-set>"+
                "<fo:simple-page-master master-name='A4' margin='2cm'>"+
                "<fo:region-body />"+
                "</fo:simple-page-master>"+
                "</fo:layout-master-set>"+
                "<fo:page-sequence master-reference='A4'>"+
                "<fo:flow flow-name='xsl-region-body'>";
    }

    public static String closeDoc()
    {
        return "</fo:flow>"+
                "</fo:page-sequence>"+
                "</fo:root>";
    }

    public static String initTable()
    {
        return "<fo:block space-before.optimum=\"10pt\"></fo:block>" +
                "<fo:table table-layout=\"fixed\" border-width=\"1mm\" border-style=\"solid\">" +
                "<fo:table-column column-number=\"1\" column-width=\"50%\"/>" +
                "<fo:table-column column-number=\"2\" column-width=\"50%\"/>" +
                "<fo:table-body>";
    }

    public static String closeTable()
    {
        return "</fo:table-body>" +
                "</fo:table>";
    }

    public static String initTableRow()
    {
        return "<fo:table-row keep-together.within-page=\"always\">";
    }

    public static String closeTableRow()
    {
        return  "</fo:table-row>";
    }

    public static String getCell(String ... args)
    {
        final StringBuilder sb = new StringBuilder();
        sb.append("<fo:table-cell padding=\"1mm\" border-width=\"1mm\" border-style=\"double\">");

        for (String arg : args)
        {
            sb.append("<fo:block font-family=\"SansSerif\">")
                    .append(arg)
                    .append("</fo:block>");
        }

        sb.append("</fo:table-cell>");

        return sb.toString();
    }
}

When I change the encoding from "ISO-8859-1" to "UTF-8", my Cyrillic substrings look like this: "#####". It looks like I am missing a font here.
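The first symptom (the "Ð..." sequences) is what you would expect when the document bytes are UTF-8 but the XML prolog declares ISO-8859-1: the parser then reads every UTF-8 byte as a separate Latin-1 character. The following is a minimal sketch of that effect, using the JDK's own DOM parser rather than FOP. The class and method names are hypothetical, and it assumes that `StringTools.UTF8` in the question refers to the UTF-8 charset.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of the original code.
class EncodingMismatchDemo {

    // Encodes a small XML document as UTF-8 bytes (as the question's class is
    // assumed to do), declares `declaredEncoding` in the prolog, and returns
    // the text content that the XML parser actually reads.
    static String parseWithDeclaredEncoding(String declaredEncoding, String text)
            throws Exception {
        String xml = "<?xml version='1.0' encoding='" + declaredEncoding + "'?>"
                + "<block>" + text + "</block>";
        byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(utf8Bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Russian "Privet" ("hello"), written with Unicode escapes so the
        // source file itself does not depend on any particular encoding.
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442";

        String mismatched = parseWithDeclaredEncoding("ISO-8859-1", russian);
        String matched = parseWithDeclaredEncoding("UTF-8", russian);

        System.out.println("declared ISO-8859-1, round-trips: " + mismatched.equals(russian));
        System.out.println("declared UTF-8, round-trips: " + matched.equals(russian));
    }
}
```

Once the declaration matches the bytes, the text reaches the formatter intact. The remaining "#####" output is then a font problem rather than an encoding problem, which is what the answer addresses.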

Best answer

You have to use a FOP configuration file that specifies which fonts to embed in the PDF document, for example:

<?xml version="1.0" encoding="UTF-8"?>
<fop version='1.0'>
    <renderers>
        <renderer mime='application/pdf'>
            <fonts>
                <!-- TTF fonts -->
                <font kerning='yes' embed-url='c:\windows\fonts\arial.ttf'>
                    <font-triplet name='Arial' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\arialbd.ttf'>
                    <font-triplet name='Arial' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\ariali.ttf'>
                    <font-triplet name='Arial' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\arialbi.ttf'>
                    <font-triplet name='Arial' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\times.ttf'>
                    <font-triplet name='TimesNewRoman' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesbd.ttf'>
                    <font-triplet name='TimesNewRoman' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesi.ttf'>
                    <font-triplet name='TimesNewRoman' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\timesbi.ttf'>
                    <font-triplet name='TimesNewRoman' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\cour.ttf'>
                    <font-triplet name='CourierNew' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\courbd.ttf'>
                    <font-triplet name='CourierNew' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\couri.ttf'>
                    <font-triplet name='CourierNew' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\courbi.ttf'>
                    <font-triplet name='CourierNew' style='italic' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdana.ttf'>
                    <font-triplet name='Verdana' style='normal' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanab.ttf'>
                    <font-triplet name='Verdana' style='normal' weight='bold' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanai.ttf'>
                    <font-triplet name='Verdana' style='italic' weight='normal' />
                </font>
                <font kerning='yes' embed-url='c:\windows\fonts\verdanaz.ttf'>
                    <font-triplet name='Verdana' style='italic' weight='bold' />
                </font>
            </fonts>
        </renderer>
    </renderers>
</fop>

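Usage:

// Configure fopFactory as desired (FOP 1.x API).
// Load the font configuration first, then create the user agent.
// setUserConfig(File) can throw SAXException and IOException.
FopFactory fopFactory = FopFactory.newInstance();
fopFactory.setUserConfig(new File("fop.xml"));
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();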
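
Registering fonts is only half of the fix: the XSL-FO blocks also have to request one of them through a font-family that matches a font-triplet name. The `getCell` method in the question asks for "SansSerif", which the configuration above does not define, so FOP would presumably fall back to a built-in font that has no Cyrillic glyphs. Below is a minimal sketch of a revised `getCell`, assuming the 'Arial' triplet from the configuration above is the one you want. The class name `FoCells` and the `escapeXml` helper are hypothetical additions, not part of the original code.

```java
// Hypothetical helper, adapted from getCell in the question.
final class FoCells {

    // Assumption: this matches <font-triplet name='Arial' .../> in fop.xml.
    static final String FONT_FAMILY = "Arial";

    // Escapes the characters that are special in XML text content, so that
    // cell values such as "a < b" do not break the generated XSL-FO.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Builds one table cell containing a block per argument; every block
    // requests the configured font family.
    static String getCell(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append("<fo:table-cell padding=\"1mm\" border-width=\"1mm\" border-style=\"double\">");
        for (String arg : args) {
            sb.append("<fo:block font-family=\"").append(FONT_FAMILY).append("\">")
              .append(escapeXml(arg))
              .append("</fo:block>");
        }
        sb.append("</fo:table-cell>");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Russian "Privet" written with Unicode escapes, plus a value that needs escaping.
        System.out.println(getCell("\u041F\u0440\u0438\u0432\u0435\u0442", "a < b"));
    }
}
```

Keeping the family name in a single constant makes it harder for the FO markup and fop.xml to drift apart. If you embed a different font, only `FONT_FAMILY` and the matching triplet need to change.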

Source for "java - Apache FOP: problem with Cyrillic characters": the corresponding question on Stack Overflow: https://stackoverflow.com/questions/38137490/
