I'm trying to render PDF pages as PNG files using Ghostscript v9.02. For that purpose I'm using the following command line:
gswin32c.exe -sDEVICE=png16m -o outputFile%d.png mypdf.pdf
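This works fine when the PDF's crop box is the same as its media box, but if the crop box is smaller than the media box, only the crop box is rendered and the border of the PDF page is lost.
I know that PDF viewers usually display only the crop box, but I need to be able to see the whole media box in the PNG file.
The Ghostscript documentation says that the media box of a document is rendered by default, but that is not what happens in my case.
Does anyone know how to render the whole media box with Ghostscript?
Could it be that only the crop box is rendered for the PNG devices? Am I perhaps missing a specific option?
For example, this PDF contains some registration marks outside the crop box that are absent from the output PNG file. Some more information about this PDF: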
- Media box:
  - Width: 667 pt
  - Height: 908 pt
- Crop box:
  - Width: 640 pt
  - Height: 851 pt
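One quick way to tell which box ended up in a PNG is to compare its pixel size with the box dimensions. The sketch below assumes Ghostscript's png16m device renders at 72 dpi when no -r option is given, so one point maps to one pixel; the helper name expected_pixels is made up for this illustration, and at other resolutions Ghostscript's own rounding may differ by a pixel.

```python
def expected_pixels(width_pt, height_pt, dpi=72):
    """Approximate raster size for a page box given in points (1 pt = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)

# Dimensions taken from the question.
print(expected_pixels(667, 908))  # media box -> (667, 908)
print(expected_pixels(640, 851))  # crop box  -> (640, 851)
```

A 640 x 851 pixel output would therefore indicate that the crop box was rendered, and 667 x 908 that the media box was.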
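Best answer
OK, now that revers has restated his problem and says he is looking for "generic code", let me try again.
The problem with "generic code" is that a PDF may represent a CropBox statement in many "legal" forms. All of the following are possible and correct, and set the same values for a page's CropBox: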
/CropBox[10 20 500 700]
/CropBox[ 10 20 500 700 ]
/CropBox[10 20 500 700]
/CropBox [10 20 500 700]
/CropBox [ 10 20 500 700 ]
/CropBox [ 10.00 20.0000 500.0 700 ]
/CropBox [
  10
  20
  500
  700
]
The same is true for ArtBox, TrimBox, BleedBox and MediaBox. Therefore, you need to "normalize" the *Box representation inside the PDF source code if you want to edit it.
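To illustrate why the spelling differences do not matter semantically, here is a minimal Python sketch that reads all of these forms into the same four numbers. The names BOX_RE and parse_boxes are invented for this example, a layout with the array split across lines is included as one more variant, and the pattern only handles inline arrays (not indirect references such as "/CropBox 12 0 R").

```python
import re

# "/Name", optional whitespace, then a bracketed array that may span several lines.
BOX_RE = re.compile(r"/(MediaBox|CropBox|BleedBox|TrimBox|ArtBox)\s*\[\s*([^\]]*?)\s*\]")

def parse_boxes(text):
    """Return (name, [llx, lly, urx, ury]) for every inline *Box array in text."""
    return [(m.group(1), [float(v) for v in m.group(2).split()])
            for m in BOX_RE.finditer(text)]

variants = [
    "/CropBox[10 20 500 700]",
    "/CropBox[ 10 20 500 700 ]",
    "/CropBox [10 20 500 700]",
    "/CropBox [ 10 20 500 700 ]",
    "/CropBox [ 10.00 20.0000 500.0 700 ]",
    "/CropBox [\n  10\n  20\n  500\n  700\n]",
]

for variant in variants:
    # Every variant yields [('CropBox', [10.0, 20.0, 500.0, 700.0])]
    print(parse_boxes(variant))
```

A tolerant parser like this is fine for reading the values; editing the file in place is a different matter, which is why the normalization step follows.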
First Step: "Normalize" the PDF source code
Here is how you do that:
- Download qpdf for your OS platform.
- Run this command on your input PDF:
qpdf --qdf input.pdf output.pdf
The output.pdf will now have a kind of normalized structure (similar to the last example given above), and it will be easier to edit, even with a stream editor like sed.
Second Step: Remove all superfluous *Box statements
Next, you need to know that the only essential *Box is MediaBox. This one MUST be present; the others are optional and fall back in a prioritized way: a missing CropBox defaults to the MediaBox, and a missing BleedBox, TrimBox or ArtBox defaults to the CropBox. Therefore, in order to achieve your goal, we can simply delete all code that is related to them. We'll do it with the help of sed.
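The fallback rule can be sketched in a few lines of Python. This is only an illustration: the page is modeled as a plain dict, effective_boxes is a made-up helper name, the MediaBox origin of 0 0 is assumed (the question gives only width and height), and inheritance of boxes from parent page-tree nodes is ignored.

```python
def effective_boxes(page):
    """Resolve the page boxes, applying the defaults for missing entries."""
    media = page["MediaBox"]             # required entry
    crop = page.get("CropBox", media)    # a missing CropBox falls back to MediaBox
    return {
        "MediaBox": media,
        "CropBox": crop,
        "BleedBox": page.get("BleedBox", crop),  # the rest fall back to CropBox
        "TrimBox": page.get("TrimBox", crop),
        "ArtBox": page.get("ArtBox", crop),
    }

with_crop = effective_boxes({"MediaBox": [0, 0, 667, 908], "CropBox": [10, 20, 500, 700]})
without_crop = effective_boxes({"MediaBox": [0, 0, 667, 908]})

print(with_crop["TrimBox"])      # [10, 20, 500, 700]
print(without_crop["TrimBox"])   # [0, 0, 667, 908]
```

Once the CropBox entry is gone, every box resolves to the MediaBox, which is the effect the sed step aims for.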
That tool is normally installed on all Linux systems -- on Windows download and install it from gnuwin32.sf.net. (Don't forget to install the named "dependencies" should you decide to use the .zip file instead of the Setup .exe).
Now run this command:
sed.exe -i.bak -e "/CropBox/,/]/s#.# #g" output.pdf
Here is what this command is supposed to do:
- -i.bak tells sed to edit the original file inline, but to also create a backup file with a .bak suffix (in case something goes wrong).
- /CropBox/ states the first address line to be processed by sed.
- /]/ states the last address line to be processed by sed.
- s tells sed to do substitutions for all lines from the first to the last addressed line.
- #.# #g tells sed which kind of substitution to do: replace each arbitrary character ('.') in the address space by a blank (' '), globally ('g').
We substitute all characters by blanks (instead of by 'nothing', i.e. deleting them) because otherwise we'd get complaints about "PDF file corruption", since the object reference counting and the stream lengths would have changed.
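For readers without sed, the same length-preserving idea can be sketched in Python. This is an approximation rather than an exact port: it blanks only the /CropBox [...] entry itself instead of whole lines, it handles inline arrays only, and the pattern could in principle also match text inside an uncompressed stream. CROPBOX_ENTRY and blank_cropboxes are names invented for this example.

```python
import re

# "/CropBox", optional whitespace, then a bracketed array (may span several lines).
CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")

def blank_cropboxes(pdf_bytes: bytes) -> bytes:
    """Overwrite every inline /CropBox array with spaces, keeping line breaks."""
    def to_blanks(match):
        # One space per original byte, so nothing after the entry changes position.
        return re.sub(rb"[^\r\n]", b" ", match.group(0))
    return CROPBOX_ENTRY.sub(to_blanks, pdf_bytes)

# A fragment shaped like a page dictionary with multi-line arrays (values are illustrative).
sample = (b"<<\n  /CropBox [\n    10\n    20\n    500\n    700\n  ]\n"
          b"  /MediaBox [\n    0\n    0\n    667\n    908\n  ]\n>>\n")
cleaned = blank_cropboxes(sample)

print(len(cleaned) == len(sample))   # True: the byte count is unchanged
print(b"/CropBox" in cleaned)        # False: the entry has been blanked out
```

On a real file this would be applied to the bytes of output.pdf, ideally after keeping a backup copy as -i.bak does.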
Third Step: Run your Ghostscript command
You already know this one well enough:
gswin32c.exe -sDEVICE=png16m -o outputImage_%03d.png output.pdf
All three steps above can easily be scripted; I'll leave that to you to do however you like.
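As one possible starting point, the three steps could be chained in Python roughly as follows. This is a sketch under assumptions: qpdf and the Ghostscript executable are taken to be on the PATH, the function names are invented here, the sed step is replaced by an in-process blanking of inline /CropBox arrays, and check=True treats any non-zero exit status of either tool as a failure.

```python
import re
import subprocess
from pathlib import Path

CROPBOX_ENTRY = re.compile(rb"/CropBox\s*\[[^\]]*\]")

def build_commands(input_pdf, work_pdf, png_pattern, gs_exe="gswin32c.exe"):
    """Return the qpdf and Ghostscript command lines as argument lists."""
    qpdf_cmd = ["qpdf", "--qdf", input_pdf, work_pdf]
    gs_cmd = [gs_exe, "-sDEVICE=png16m", "-o", png_pattern, work_pdf]
    return qpdf_cmd, gs_cmd

def blank_cropboxes(pdf_bytes):
    """Replace each inline /CropBox array by spaces of the same length."""
    return CROPBOX_ENTRY.sub(
        lambda m: re.sub(rb"[^\r\n]", b" ", m.group(0)), pdf_bytes)

def render_full_media_box(input_pdf, work_pdf="output.pdf",
                          png_pattern="outputImage_%03d.png",
                          gs_exe="gswin32c.exe"):
    """Normalize with qpdf, blank the CropBox entries, then render with Ghostscript."""
    qpdf_cmd, gs_cmd = build_commands(input_pdf, work_pdf, png_pattern, gs_exe)
    subprocess.run(qpdf_cmd, check=True)                      # step 1: normalize
    path = Path(work_pdf)
    path.write_bytes(blank_cropboxes(path.read_bytes()))      # step 2: blank CropBox
    subprocess.run(gs_cmd, check=True)                        # step 3: render PNGs

# Example call (not executed here): render_full_media_box("mypdf.pdf")
```

On Linux or macOS the gs_exe argument would typically be "gs" instead of "gswin32c.exe".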
Source: "Rendering the whole media box of a pdf page into a png file using Ghostscript", a question on Stack Overflow: https://stackoverflow.com/questions/6451859/