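pdf - Rendering the whole media box of a pdf page into a png file using Ghostscript

Tags: pdf png ghostscript

I'm trying to render the pages of PDFs into png files using Ghostscript v9.02. For that purpose I'm using the following command line: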

gswin32c.exe -sDEVICE=png16m -o outputFile%d.png mypdf.pdf

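This works fine when the pdf crop box is the same as the media box, but if the crop box is smaller than the media box, only the crop box is rendered and the border of the pdf page is lost.
I know that pdf viewers usually display only the crop box, but I need to be able to see the whole media box in my png file.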

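The Ghostscript documentation says that by default the media box of a document is rendered, but this does not work in my case. Does anyone know how to render the whole media box using Ghostscript?
Could it be that for the png file device only the crop box is rendered? Am I maybe forgetting a specific option?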

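For example, this pdf contains some registration marks outside of the crop box, and these marks are missing from the output png file. Some more information about this pdf:

  • Media box:
    • width: 667 pt
    • height: 908 pt
  • Crop box:
    • width: 640 pt
    • height: 851 pt

Best answer

OK, now that revers has rephrased his question to say that he is looking for "generic code", let me try again.

The problem with "generic code" is that a "CropBox" statement can legally appear in a PDF in many different forms. All of the following are possible and correct, and set the same values for a page's CropBox: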

  • /CropBox[10 20 500 700]

  • /CropBox[ 10 20 500 700 ]

  • /CropBox[10 20 500 700]

  • /CropBox [10 20 500 700]

  • /CropBox [ 10 20 500 700 ]

  • /CropBox [ 10.00 20.0000 500.0 700 ]

  • /CropBox [    
              10    
              20    
              500    
              700    
             ] 

The same is true for ArtBox, TrimBox, BleedBox and MediaBox. Therefore you need to "normalize" the *Box representation inside the PDF source code if you want to edit it.
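To illustrate why a plain string match is fragile here, the following is a minimal Python sketch of a whitespace-tolerant pattern that reads all of the spellings above as the same four numbers. The names `BOX_PATTERN` and `parse_crop_box` are made up for this example, and the sketch only covers a CropBox written as a direct array in uncompressed PDF source; an indirect reference or an entry inside a compressed object stream would not be found.

```python
import re

# One PDF number: optional sign, then digits with an optional fraction.
NUM = rb"([-+]?(?:\d+\.?\d*|\.\d+))"

# Hypothetical pattern: "/CropBox", optional whitespace, then a
# four-number array with any amount of whitespace (including line breaks).
BOX_PATTERN = re.compile(
    rb"/CropBox\s*\[\s*" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s+" + NUM + rb"\s*\]"
)


def parse_crop_box(pdf_bytes):
    """Return the first CropBox as a tuple of four floats, or None."""
    match = BOX_PATTERN.search(pdf_bytes)
    if match is None:
        return None
    return tuple(float(group.decode("ascii")) for group in match.groups())


variants = [
    b"/CropBox[10 20 500 700]",
    b"/CropBox[ 10 20 500 700 ]",
    b"/CropBox [10 20 500 700]",
    b"/CropBox [ 10.00 20.0000 500.0 700 ]",
    b"/CropBox [\n  10\n  20\n  500\n  700\n ]",
]
for variant in variants:
    print(parse_crop_box(variant))
```

Every variant yields the same tuple, which is the point: the values are identical even though the bytes are not.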

First Step: "Normalize" the PDF source code

Here is how you do that:

  1. Download qpdf for your OS platform.
  2. Run this command on your input PDF:
    qpdf --qdf input.pdf output.pdf

The output.pdf now will have a kind of normalized structure (similar to the last example given above), and it will be easier to edit, even with a stream editor like sed.

Second Step: Remove all superfluous *Box statements

Next, you need to know that the only essential *Box is MediaBox. This one MUST be present; the others are optional (in a certain prioritized way). If the others are missing, they default to the same values as MediaBox. Therefore, in order to achieve your goal, we can simply delete all code that is related to them. We'll do it with the help of sed.

That tool is normally installed on all Linux systems -- on Windows download and install it from gnuwin32.sf.net. (Don't forget to install the named "dependencies" should you decide to use the .zip file instead of the Setup .exe).

Now run this command:

  1. sed.exe -i.bak -e "/CropBox/,/]/s#.# #g" output.pdf

Here is what this command is supposed to do:

  • -i.bak tells sed to edit the original file in place, but to also create a backup file with a .bak suffix (in case something goes wrong).
  • /CropBox/ states the first address line to be processed by sed.
  • /]/ states the last address line to be processed by sed.
  • s tells sed to do substitutions for all lines from first to last addressed line.
  • #.# #g tells sed which kind of substitution to do: replace each arbitrary character ('.') in the address space by a blank (' '), globally ('g').

We substitute all characters by blanks (instead of by 'nothing', i.e. deleting them) because otherwise we'd get complaints about "PDF file corruption": the byte offsets recorded in the PDF's cross-reference table would no longer point at the right objects once the file got shorter.
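If sed is not at hand, the same idea can be sketched in Python. This is not a port of the sed command: instead of blanking whole lines between two addresses, it uses a regular expression to overwrite only the bytes from `/CropBox` up to the closing `]`. The function name `blank_crop_boxes` is made up for this example, and it assumes the CropBox is a direct array in uncompressed source, as in the qpdf output.

```python
import re

# "/CropBox", optional whitespace, then everything up to the closing bracket.
CROP_BOX = re.compile(rb"/CropBox\s*\[[^\]]*\]")


def blank_crop_boxes(data: bytes) -> bytes:
    """Overwrite every direct /CropBox array with spaces, keeping the byte count."""

    def blank(match):
        # Like sed's '.', leave line breaks alone and blank everything else.
        return re.sub(rb"[^\r\n]", b" ", match.group(0))

    return CROP_BOX.sub(blank, data)


sample = (
    b"  /CropBox [\n    10\n    20\n    500\n    700\n  ]\n"
    b"  /MediaBox [\n    0\n    0\n    667\n    908\n  ]\n"
)
cleaned = blank_crop_boxes(sample)
print(len(cleaned) == len(sample))  # byte count is preserved
print(b"/CropBox" in cleaned)       # the CropBox entry is gone
print(b"/MediaBox" in cleaned)      # the MediaBox entry is untouched
```

Because every replaced byte becomes exactly one space, the positions of all later objects in the file stay where they were.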

Third Step: Run your Ghostscript command

You know that already well enough:

gswin32c.exe -sDEVICE=png16m -o outputImage_%03d.png output.pdf

All three steps above can easily be scripted, which I'll leave to you to do as you see fit.
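As one possible starting point, here is a rough Python sketch that chains the three steps. It assumes `qpdf` and `gswin32c.exe` are on the PATH, and it swaps the sed step for a regular-expression replacement that blanks each direct `/CropBox [...]` array while keeping the byte count. The function names `build_commands` and `render_media_box` are made up for this example.

```python
import re
import subprocess
from pathlib import Path


def build_commands(src, normalized, png_pattern, gs="gswin32c.exe"):
    """Return the qpdf and Ghostscript command lines as argument lists."""
    qpdf_cmd = ["qpdf", "--qdf", src, normalized]
    gs_cmd = [gs, "-sDEVICE=png16m", "-o", png_pattern, normalized]
    return qpdf_cmd, gs_cmd


def render_media_box(src, normalized="output.pdf",
                     png_pattern="outputImage_%03d.png"):
    qpdf_cmd, gs_cmd = build_commands(src, normalized, png_pattern)

    # Step 1: normalize the PDF source. qpdf exits with 3 when it only
    # issued warnings, so that code is tolerated here.
    result = subprocess.run(qpdf_cmd)
    if result.returncode not in (0, 3):
        raise RuntimeError("qpdf failed with exit code %d" % result.returncode)

    # Step 2: keep a .bak copy, then blank out the CropBox arrays
    # (regex stand-in for the sed command; line breaks are left alone).
    path = Path(normalized)
    data = path.read_bytes()
    path.with_name(path.name + ".bak").write_bytes(data)
    path.write_bytes(re.sub(
        rb"/CropBox\s*\[[^\]]*\]",
        lambda m: re.sub(rb"[^\r\n]", b" ", m.group(0)),
        data,
    ))

    # Step 3: render every page to a PNG file.
    subprocess.run(gs_cmd, check=True)
```

Calling `render_media_box("input.pdf")` would then write `output.pdf`, `output.pdf.bak` and the numbered PNG files into the current directory.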

Source: the original question, "Rendering the whole media box of a pdf page into a png file using Ghostscript", on Stack Overflow: https://stackoverflow.com/questions/6451859/
