python - PDF bleed detection

Tags: python pdf typography pypdf

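I'm currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity.

Alas, I'm already confused by the very first task: detecting whether the PDF has at least 3 mm of "bleed" (a border around the pages where nothing is printed). I already figured out that I can't detect the bleed for the document as a whole, since there doesn't seem to be a global one. On each page, however, I can detect a total of five different boxes: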

  • mediaBox
  • bleedBox
  • trimBox
  • cropBox
  • artBox

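I read the pyPdf documentation on those boxes, but the only one I understood is the mediaBox, which seems to represent the overall page size (i.e. the paper).

The bleedBox pretty obviously ought to define the bleed, but that doesn't always seem to be the case.

Another thing I noticed is that in one example PDF, all of those boxes have exactly the same size on every page (implying no bleed at all), yet when I open it there is a huge amount of bleed. This leads me to think that the individual text elements have their own offsets.

So, evidently, just calculating the bleed from the mediaBox and the bleedBox is not a viable option.

I'd be more than delighted if anyone could shed some light on what those boxes actually are and what I can conclude from them (e.g. is one box always smaller than another?).

Bonus question: can someone tell me what exactly the "default user space unit" mentioned in the documentation is? I'm pretty sure this refers to mm on my machine, but I'd like to enforce mm everywhere.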

Best answer

Quoting from the PDF specification ISO 32000-1:2008, as published by Adobe:

14.11.2 Page Boundaries

14.11.2.1 General

A PDF page may be prepared either for a finished medium, such as a sheet of paper, or as part of a prepress process in which the content of the page is placed on an intermediate medium, such as film or an imposed reproduction plate. In the latter case, it is important to distinguish between the intermediate page and the finished page. The intermediate page may often include additional production-related content, such as bleeds or printer marks, that falls outside the boundaries of the finished page. To handle such cases, a PDF page may define as many as five separate boundaries to control various aspects of the imaging process:

  • The media box defines the boundaries of the physical medium on which the page is to be printed. It may include any extended area surrounding the finished page for bleed, printing marks, or other such purposes. It may also include areas close to the edges of the medium that cannot be marked because of physical limitations of the output device. Content falling outside this boundary may safely be discarded without affecting the meaning of the PDF file.

  • The crop box defines the region to which the contents of the page shall be clipped (cropped) when displayed or printed. Unlike the other boxes, the crop box has no defined meaning in terms of physical page geometry or intended use; it merely imposes clipping on the page contents. However, in the absence of additional information (such as imposition instructions specified in a JDF or PJTF job ticket), the crop box determines how the page’s contents shall be positioned on the output medium. The default value is the page’s media box.

  • The bleed box (PDF 1.3) defines the region to which the contents of the page shall be clipped when output in a production environment. This may include any extra bleed area needed to accommodate the physical limitations of cutting, folding, and trimming equipment. The actual printed page may include printing marks that fall outside the bleed box. The default value is the page’s crop box.

  • The trim box (PDF 1.3) defines the intended dimensions of the finished page after trimming. It may be smaller than the media box to allow for production-related content, such as printing instructions, cut marks, or colour bars. The default value is the page’s crop box.

  • The art box (PDF 1.3) defines the extent of the page’s meaningful content (including potential white space) as intended by the page’s creator. The default value is the page’s crop box.

The page object dictionary specifies these boundaries in the MediaBox, CropBox, BleedBox, TrimBox, and ArtBox entries, respectively (see Table 30). All of them are rectangles expressed in default user space units. The crop, bleed, trim, and art boxes shall not ordinarily extend beyond the boundaries of the media box. If they do, they are effectively reduced to their intersection with the media box. Figure 86 illustrates the relationships among these boundaries. (The crop box is not shown in the figure because it has no defined relationship with any of the other boundaries.)
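The default and clipping rules quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each box is taken to be a plain `(left, bottom, right, top)` tuple in default user space units that has already been read from the page with whatever PDF library is in use, a missing box is represented as `None`, and the names `resolve_boxes`, `bleed_margins_mm` and `has_min_bleed` are hypothetical helpers, not part of pyPdf. It measures only the bleed declared by the boxes (bleed box versus trim box), so a file whose boxes are all identical will report zero regardless of where the content actually sits.

```python
from typing import Dict, Optional, Tuple

# (left, bottom, right, top) in default user space units
Box = Tuple[float, float, float, float]

MM_PER_UNIT = 25.4 / 72  # assumes the default unit of 1/72 inch (UserUnit = 1)


def intersect(box: Box, media: Box) -> Box:
    """Reduce a box to its intersection with the media box."""
    return (
        max(box[0], media[0]),
        max(box[1], media[1]),
        min(box[2], media[2]),
        min(box[3], media[3]),
    )


def resolve_boxes(raw: Dict[str, Optional[Box]]) -> Dict[str, Box]:
    """Apply the defaults: crop -> media; bleed, trim, art -> crop."""
    media = raw["media"]
    if media is None:
        raise ValueError("a page must have a media box")
    crop = intersect(raw.get("crop") or media, media)
    resolved = {"media": media, "crop": crop}
    for name in ("bleed", "trim", "art"):
        resolved[name] = intersect(raw.get(name) or crop, media)
    return resolved


def bleed_margins_mm(boxes: Dict[str, Box]) -> Dict[str, float]:
    """Distance from each trim box edge out to the bleed box edge, in mm."""
    bleed, trim = boxes["bleed"], boxes["trim"]
    return {
        "left": (trim[0] - bleed[0]) * MM_PER_UNIT,
        "bottom": (trim[1] - bleed[1]) * MM_PER_UNIT,
        "right": (bleed[2] - trim[2]) * MM_PER_UNIT,
        "top": (bleed[3] - trim[3]) * MM_PER_UNIT,
    }


def has_min_bleed(boxes: Dict[str, Box], minimum_mm: float = 3.0) -> bool:
    """True if the declared bleed is at least minimum_mm on all four sides."""
    margins = bleed_margins_mm(boxes)
    return all(m >= minimum_mm - 1e-9 for m in margins.values())


if __name__ == "__main__":
    # Made-up page: only the media and trim boxes are set.
    page = resolve_boxes({"media": (0, 0, 100, 100), "trim": (10, 10, 90, 90)})
    print(bleed_margins_mm(page), has_min_bleed(page))
```

With only a media box set, every other box resolves to the media box, all margins come out as zero, and `has_min_bleed` returns `False`. In that case the boxes simply carry no bleed information and the page content itself would have to be inspected.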

Here is a nice graphic that shows how these boxes relate to each other:

[Image: PDF boxes illustrated]

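The reasons why in many cases only the media box is set are that

  1. for PDFs intended for electronic consumption (i.e. reading on a computer), the other boxes hardly matter; and

  2. even in a prepress context they are no longer as necessary as they used to be, cf. the article Pedro mentioned in his comment.

Regarding your "bonus question": the user space unit defaults to 1⁄72 inch; since PDF 1.6, however, it can be changed to any (not necessarily integer) multiple of that size using the UserUnit entry in the page dictionary. Changing it in an existing PDF essentially scales the page, because the user space unit is the basic unit of the page's device-independent coordinate system. So, unless you want to update every command in the page description that references coordinates in order to keep the page dimensions, you will not want to enforce a millimeter user space unit... ;)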
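Rather than changing the unit inside the file, the conversion to millimeters can be done in the tool itself. The following is a minimal sketch, assuming the page's UserUnit value (1.0 when the entry is absent) has already been read by the PDF library in use; `units_to_mm` and `mm_to_units` are hypothetical helper names.

```python
MM_PER_INCH = 25.4
UNITS_PER_INCH = 72.0  # one default user space unit is 1/72 inch


def units_to_mm(value: float, user_unit: float = 1.0) -> float:
    """Convert a length in user space units to millimeters.

    user_unit is the page's UserUnit entry (PDF 1.6+), a multiple of 1/72 inch.
    """
    return value * user_unit / UNITS_PER_INCH * MM_PER_INCH


def mm_to_units(mm: float, user_unit: float = 1.0) -> float:
    """Convert millimeters to user space units (inverse of units_to_mm)."""
    return mm / MM_PER_INCH * UNITS_PER_INCH / user_unit


if __name__ == "__main__":
    print(units_to_mm(72))    # one inch, expressed in mm
    print(mm_to_units(3.0))   # a 3 mm bleed, expressed in default units
```

A 3 mm bleed therefore corresponds to roughly 8.5 default user space units, which is the threshold to compare box differences against when UserUnit is 1.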

Regarding python - PDF bleed detection, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/13236370/
