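java - TextPosition bounding box in PDFBox

Tags: java pdf pdf-generation pdfbox

I am trying to draw the glyph bounding box for each TextPosition, as illustrated in the PDF 32000 specification. [image]

Here is my function, which performs the conversion from glyph space to user space: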

@Override
protected void processTextPosition(TextPosition text) {
    PDFont font = text.getFont();

    // Font-wide bounding box, expressed in glyph space
    BoundingBox bbox = font.getBoundingBox();

    Rectangle2D.Float rect = new Rectangle2D.Float(bbox.getLowerLeftX(), bbox.getUpperRightY(),
            bbox.getWidth(), bbox.getHeight());

    // Start from the text matrix, then append the glyph-space scaling
    AffineTransform at = text.getTextMatrix().createAffineTransform();

    if (font instanceof PDType3Font) {
        // Type 3 fonts carry their own font matrix
        at.concatenate(font.getFontMatrix().createAffineTransform());
    } else {
        // Other fonts use 1/1000 of a text-space unit per glyph-space unit
        at.scale(1 / 1000f, 1 / 1000f);
    }
    Shape shape = at.createTransformedShape(rect);
    rectangles.add(shape);

    super.processTextPosition(text);
}
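
To make the glyph-space to user-space step easier to check in isolation, here is a minimal sketch that uses only java.awt.geom and no PDFBox classes. The class name GlyphToUserSpace, the bounding box values, and the text matrix are all hypothetical, and the sketch assumes a non-Type-3 font, so the fixed 1/1000 scale applies.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

class GlyphToUserSpace {

    // Maps a glyph-space rectangle into user space, assuming a non-Type-3 font
    // (glyph space = 1/1000 of a text-space unit).
    static Rectangle2D toUserSpace(Rectangle2D glyphBox, AffineTransform textMatrix) {
        AffineTransform at = new AffineTransform(textMatrix); // copy, leave the input untouched
        at.scale(1 / 1000.0, 1 / 1000.0);                     // applied to points before the text matrix
        return at.createTransformedShape(glyphBox).getBounds2D();
    }

    public static void main(String[] args) {
        // Hypothetical font bounding box: x from 0 to 700, y from -200 to 800
        Rectangle2D glyphBox = new Rectangle2D.Double(0, -200, 700, 1000);
        // Hypothetical text matrix: 12 pt text positioned at (100, 700)
        AffineTransform textMatrix = new AffineTransform(12, 0, 0, 12, 100, 700);

        Rectangle2D userBox = toUserSpace(glyphBox, textMatrix);
        System.out.println(userBox);
    }
}
```

Note that the rectangle here starts at the lower-left Y of the box, so the descender part ends up below the baseline at y = 700.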

Here is the function that draws the extracted rectangles:

private void drawBoundingBoxes() throws IOException {

    String fileNameOut = path.substring(0, path.lastIndexOf(".")) + "_OUT.pdf";
    log.info("Drawing Bounding Boxes for TextPositions");

    // Append to the last page, uncompressed, with resetContext = true
    PDPageContentStream contentStream = new PDPageContentStream(document,
            document.getPage(document.getNumberOfPages() - 1),
            PDPageContentStream.AppendMode.APPEND, false, true);
    contentStream.setLineWidth(1f);
    contentStream.setStrokingColor(Color.RED);

    try {
        for (Shape p : rectangles) {
            double[] coords = new double[6];
            GeneralPath g = new GeneralPath(p.getBounds2D());
            for (PathIterator pi = g.getPathIterator(null); !pi.isDone(); pi.next()) {
                // currentSegment fills coords and returns the segment type
                int type = pi.currentSegment(coords);
                System.out.println(Arrays.toString(coords));
                switch (type) {
                case PathIterator.SEG_MOVETO:
                    System.out.println("move to");
                    contentStream.moveTo((float) coords[0], (float) coords[1]);
                    break;

                case PathIterator.SEG_LINETO:
                    System.out.println("line to");
                    contentStream.lineTo((float) coords[0], (float) coords[1]);
                    break;

                case PathIterator.SEG_CUBICTO:
                    System.out.println("cubic to");
                    contentStream.curveTo((float) coords[0], (float) coords[1],
                            (float) coords[2], (float) coords[3],
                            (float) coords[4], (float) coords[5]);
                    break;

                case PathIterator.SEG_CLOSE:
                    System.out.println("close");
                    contentStream.closeAndStroke();
                    break;

                default:
                    System.out.println("unexpected segment type");
                    break;
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        contentStream.close();
        document.save(new File(fileNameOut));
    }
}

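Then, when I try to draw on the PDF, I get the following result for the first letter (an uppercase V): [image]

I don't know what I am doing wrong. Any ideas?

Best answer

Mr. D,

I tested your code, and the only change needed to make it work is to invert the Y axis. This is necessary because the origin of PDF user space is in the lower-left corner, whereas the origin of Java 2D user space is in the upper-left corner [1].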

8.3.2.3 User Space

The user space coordinate system shall be initialized to a default state for each page of a document. The CropBox entry in the page dictionary shall specify the rectangle of user space corresponding to the visible area of the intended output medium (display window or printed page). The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the Rotate entry in the page dictionary). The length of a unit along both the x and y axes is set by the UserUnit entry (PDF 1.6) in the page dictionary (see Table 30). If that entry is not present or supported, the default value of 1⁄72 inch is used. This coordinate system is called default user space.[2]
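
As a general illustration of the difference between the two coordinate systems, the following sketch converts a rectangle from a top-left-origin, y-down space (the Java 2D convention) to a bottom-left-origin, y-up space (default PDF user space). The class name YAxisFlip and the page height of 792 units are assumptions made for the example, and this is the textbook flip rather than the exact adjustment used in the answer's code.

```java
import java.awt.geom.Rectangle2D;

class YAxisFlip {

    // Converts a rectangle given by its top-left corner in a y-down space
    // into the same rectangle given by its lower-left corner in a y-up space.
    static Rectangle2D toPdfSpace(Rectangle2D java2dRect, double pageHeight) {
        double lowerLeftY = pageHeight - java2dRect.getY() - java2dRect.getHeight();
        return new Rectangle2D.Double(java2dRect.getX(), lowerLeftY,
                java2dRect.getWidth(), java2dRect.getHeight());
    }

    public static void main(String[] args) {
        double pageHeight = 792; // assumed page height in user-space units
        // A 20 x 12 box whose top edge is 50 units below the top of the page
        Rectangle2D onScreen = new Rectangle2D.Double(100, 50, 20, 12);
        System.out.println(toPdfSpace(onScreen, pageHeight));
    }
}
```

With these assumed numbers the lower edge of the box lands at 792 - 50 - 12 = 730 in PDF user space.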

Source code

@Override
protected void processTextPosition(TextPosition text) {
    try {
        PDFont font = text.getFont();

        // Font-wide bounding box, expressed in glyph space
        BoundingBox bbox = font.getBoundingBox();

        Rectangle2D.Float rect = new Rectangle2D.Float(bbox.getLowerLeftX(), bbox.getUpperRightY(),
                    bbox.getWidth(), bbox.getHeight());

        AffineTransform at = text.getTextMatrix().createAffineTransform();

        if (font instanceof PDType3Font) {
            at.concatenate(font.getFontMatrix().createAffineTransform());
        } else {
            at.scale(1 / 1000f, 1 / 1000f);
        }

        Shape shape = at.createTransformedShape(rect);

        // Invert Y axis: shift the box down by its own height
        Rectangle2D bounds = shape.getBounds2D();
        bounds.setRect(bounds.getX(), bounds.getY() - bounds.getHeight(), bounds.getWidth(), bounds.getHeight());

        rectangles.add(bounds);

        super.processTextPosition(text);

    } catch (IOException e) {
        e.printStackTrace();
    }
}
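
One way to read the adjustment in the code above: a java.awt Rectangle2D is anchored at its minimum (x, y) corner, so building it from the upper-right Y places the box one full height too high in a y-up space, and subtracting the height afterwards moves it back. The sketch below compares that approach with anchoring the rectangle at the lower-left Y from the start. It assumes an unrotated text matrix with positive scale, and the class name BBoxOrigin and all numeric values are hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

class BBoxOrigin {

    // Mirrors the answer: anchor at the upper-right Y, transform, then shift down by the height.
    static Rectangle2D shiftedFromUpperY(float llx, float lly, float urx, float ury, AffineTransform at) {
        Rectangle2D.Float rect = new Rectangle2D.Float(llx, ury, urx - llx, ury - lly);
        Rectangle2D bounds = at.createTransformedShape(rect).getBounds2D();
        bounds.setRect(bounds.getX(), bounds.getY() - bounds.getHeight(),
                bounds.getWidth(), bounds.getHeight());
        return bounds;
    }

    // Alternative: anchor at the lower-left Y, so no correction is needed afterwards.
    static Rectangle2D fromLowerY(float llx, float lly, float urx, float ury, AffineTransform at) {
        Rectangle2D.Float rect = new Rectangle2D.Float(llx, lly, urx - llx, ury - lly);
        return at.createTransformedShape(rect).getBounds2D();
    }

    public static void main(String[] args) {
        // Hypothetical transform: 12 pt text at (100, 700), glyph space scaled by 1/1000
        AffineTransform at = new AffineTransform(12, 0, 0, 12, 100, 700);
        at.scale(1 / 1000.0, 1 / 1000.0);

        System.out.println(shiftedFromUpperY(0, -200, 700, 800, at));
        System.out.println(fromLowerY(0, -200, 700, 800, at));
    }
}
```

Under these assumptions both methods describe the same rectangle, up to floating-point rounding. For rotated or mirrored text the two are not guaranteed to agree.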

References

  1. Java 2D API Concepts: Coordinates

  2. Document management - Portable document format - Part 1: PDF 1.7, PDF 32000-1:2008, Section 8.3: Coordinate Systems, p. 115

Source: this question on Stack Overflow, "TextPosition bounding box in PDFBox": https://stackoverflow.com/questions/54040872/
