ruby-on-rails - Reading a PDF from https with RMagick gives a "not authorized" error

Tags: ruby-on-rails pdf https rmagick

I'm trying to read a PDF and save its first page as an image. This works over http, but not over https.

require 'RMagick'

url = "http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf"
image = Magick::Image.read(url + "[0]")
=> [http://www.ke.tu-darmstadt.de/publications/reports/tud-ke-2008-07.pdf[0]=>tud-ke-2008-07.pdf PDF 595x842 595x842+0+0 DirectClass 16-bit 27kb]

url = "https://www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf"
image = Magick::Image.read(url + "[0]")
Magick::ImageMagickError: not authorized `//www.cs.purdue.edu/homes/dgleich/publications/Gleich%202003%20-%20Machine%20Learning%20in%20Computer%20Chess.pdf' @ error/constitute.c/ReadImage/454

The policy.xml file, unedited, looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE policymap [
<!ELEMENT policymap (policy)+>
<!ELEMENT policy (#PCDATA)>
<!ATTLIST policy domain (delegate|coder|filter|path|resource) #IMPLIED>
<!ATTLIST policy name CDATA #IMPLIED>
<!ATTLIST policy rights CDATA #IMPLIED>
<!ATTLIST policy pattern CDATA #IMPLIED>
<!ATTLIST policy value CDATA #IMPLIED>
]>
<!--
  Configure ImageMagick policies.

  Domains include system, delegate, coder, filter, path, or resource.

  Rights include none, read, write, and execute.  Use | to combine them,
  for example: "read | write" to permit read from, or write to, a path.

  Use a glob expression as a pattern.

  Suppose we do not want users to process MPEG video images:

    <policy domain="delegate" rights="none" pattern="mpeg:decode" />

  Here we do not want users reading images from HTTP:

    <policy domain="coder" rights="none" pattern="HTTP" />

  Lets prevent users from executing any image filters:

    <policy domain="filter" rights="none" pattern="*" />

  The /repository file system is restricted to read only.  We use a glob
  expression to match all paths that start with /repository:

    <policy domain="path" rights="read" pattern="/repository/*" />

  Any large image is cached to disk rather than memory:

  Define arguments for the memory, map, area, and disk resources with
  SI prefixes (.e.g 100MB).  In addition, resource policies are maximums for
  each instance of ImageMagick (e.g. policy memory limit 1GB, -limit 2GB
  exceeds policy maximum so memory limit is 1GB).
-->
<policymap>
  <!-- <policy domain="system" name="precision" value="6"/> -->
  <!-- <policy domain="resource" name="temporary-path" value="/tmp"/> -->
  <!-- <policy domain="resource" name="memory" value="2GiB"/> -->
  <!-- <policy domain="resource" name="map" value="4GiB"/> -->
  <!-- <policy domain="resource" name="area" value="1GB"/> -->
  <!-- <policy domain="resource" name="disk" value="16EB"/> -->
  <!-- <policy domain="resource" name="file" value="768"/> -->
  <!-- <policy domain="resource" name="thread" value="4"/> -->
  <!-- <policy domain="resource" name="throttle" value="0"/> -->
  <!-- <policy domain="resource" name="time" value="3600"/> -->
  <policy domain="coder" rights="none" pattern="EPHEMERAL" />
  <policy domain="coder" rights="none" pattern="URL" />
  <policy domain="coder" rights="none" pattern="HTTPS" />
  <policy domain="coder" rights="none" pattern="MVG" />
  <policy domain="coder" rights="none" pattern="MSL" />
  <policy domain="coder" rights="none" pattern="TEXT" />
  <policy domain="coder" rights="none" pattern="SHOW" />
  <policy domain="coder" rights="none" pattern="WIN" />
  <policy domain="coder" rights="none" pattern="PLT" />
  <policy domain="path" rights="none" pattern="@*" />
</policymap>

Best answer

It sounds like your ImageMagick policy file doesn't allow https access. That is done with a directive like this:

<policy domain="coder" rights="none" pattern="HTTPS" />

This is part of the policy.xml that was recommended after the recent round of ImageMagick security vulnerabilities.
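To check whether a given policy file blocks a coder before calling RMagick, the rules can be inspected directly. The following is a minimal sketch, not ImageMagick's own evaluation logic: the helper name coder_denied? and the SAMPLE_POLICY string are hypothetical, and File.fnmatch with case folding is only an approximation of ImageMagick's glob matching (rule order and non-coder domains are ignored).

```ruby
require "rexml/document"

# Trimmed-down copy of the relevant rules from the policy.xml above.
# Commented-out rules are XML comments, so REXML does not see them as elements.
SAMPLE_POLICY = <<~XML
  <policymap>
    <!-- <policy domain="coder" rights="none" pattern="HTTP" /> -->
    <policy domain="coder" rights="none" pattern="URL" />
    <policy domain="coder" rights="none" pattern="HTTPS" />
    <policy domain="path" rights="none" pattern="@*" />
  </policymap>
XML

# Hypothetical helper: true if any <policy domain="coder" rights="none">
# rule has a pattern that glob-matches coder_name (case-insensitively).
def coder_denied?(policy_xml, coder_name)
  doc = REXML::Document.new(policy_xml)
  doc.elements.to_a("policymap/policy").any? do |policy|
    policy.attributes["domain"] == "coder" &&
      policy.attributes["rights"].to_s.strip == "none" &&
      File.fnmatch(policy.attributes["pattern"].to_s, coder_name, File::FNM_CASEFOLD)
  end
end
```

For example, coder_denied?(SAMPLE_POLICY, "HTTPS") is true while coder_denied?(SAMPLE_POLICY, "PDF") is false, which matches the symptom in the question: the PDF itself may be read, but ImageMagick may not fetch it over https. To check a real installation, pass File.read on your own policy.xml (its location varies by system).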

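You can of course edit policy.xml to remove that line (I don't know whether ImageMagick complains if the file is missing entirely), but this may expose you to those vulnerabilities, for example if your hosting provider relies on these mitigations.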

Another option is to download the file yourself and then ask RMagick to read the local copy: the policy only stops ImageMagick itself from accessing https.
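A minimal sketch of that second option, assuming Ruby's standard Net::HTTP and Tempfile are acceptable for the download and that your ImageMagick can already read local PDFs (as the working http case suggests). The method names download_to_tempfile and first_page_from_url are hypothetical. Because Ruby performs the https request, ImageMagick only ever sees a local file path, so the HTTPS coder rule is not involved.

```ruby
require "net/http"
require "tempfile"
require "uri"

# Hypothetical helper: fetch url with Net::HTTP (https is handled by Ruby,
# not ImageMagick) and store the body in a binary Tempfile that keeps the
# original extension. Follows at most `limit` redirects.
def download_to_tempfile(url, limit: 5)
  raise ArgumentError, "too many redirects" if limit <= 0

  uri = URI.parse(url)
  response = Net::HTTP.get_response(uri)
  case response
  when Net::HTTPSuccess
    file = Tempfile.new(["download", File.extname(uri.path)])
    file.binmode
    file.write(response.body)
    file.flush
    file
  when Net::HTTPRedirection
    next_url = URI.join(url, response["location"]).to_s
    download_to_tempfile(next_url, limit: limit - 1)
  else
    raise "download failed: #{response.code} #{response.message}"
  end
end

# Hypothetical helper: download, then read only the first page ("[0]")
# of the local copy with RMagick. The require is deferred so the download
# helper can be used without RMagick loaded (older RMagick versions are
# loaded with 'RMagick' instead of 'rmagick').
def first_page_from_url(url)
  require "rmagick"
  file = download_to_tempfile(url)
  begin
    Magick::Image.read("#{file.path}[0]").first
  ensure
    file.close! # close and delete the temporary file
  end
end
```

Usage would look like image = first_page_from_url(url) followed by image.write("page1.png"). Keeping the download separate from the RMagick call also makes it easy to swap in whatever HTTP client your Rails app already uses.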

A similar question about reading a PDF from https with RMagick ("not authorized" error) was found on Stack Overflow: https://stackoverflow.com/questions/37928230/
