amazon-web-services - Nginx proxy for Amazon S3 resources

Tags: amazon-web-services nginx amazon-s3 proxy http-headers

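I'm working on some WPO (web performance optimization) tasks, and PageSpeed suggested that I leverage browser caching. I've successfully improved this for some static files served by my Nginx server, but my image files stored on Amazon S3 are still not covered.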

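I've read about updating every file in S3 to include header metadata (Expires and Cache-Control). I don't think that's a good approach: I have thousands of files, so it isn't feasible for me.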

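I think the most convenient approach is to configure my Nginx 1.6.0 server to proxy the S3 files. I've read about this, but I'm not at all skilled in server configuration, so I took a couple of examples from sites such as this one: https://gist.github.com/benjaminbarbe/1961db5ffbaad57eff12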

I added this location block inside the server block of my nginx configuration file:

# inside server block
location /mybucket.s3.amazonaws.com/ {
        proxy_http_version     1.1;
        proxy_set_header       Host mybucket.s3.amazonaws.com;
        proxy_set_header       Authorization '';
        proxy_hide_header      x-amz-id-2;
        proxy_hide_header      x-amz-request-id;
        proxy_hide_header      Set-Cookie;
        proxy_ignore_headers   "Set-Cookie";
        proxy_buffering        off;
        proxy_intercept_errors on;
        proxy_pass             http://mybucket.s3.amazonaws.com;
}

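Of course, this isn't working for me. The caching headers are not included in the response to my request, so my first guess is that the request isn't matching the location. These are the headers I get: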

Accept-Ranges:bytes
Content-Length:90810
Content-Type:image/jpeg
Date:Fri, 23 Jun 2017 04:53:56 GMT
ETag:"4fd0be549fbcaf9b47c18a15146cdf16"
Last-Modified:Tue, 09 Jun 2015 09:47:13 GMT
Server:AmazonS3
x-amz-id-2:cKsq1qRra74DqVsTewh3P3sgzVUoPR8aAT2NFCuwA+JjCdDZfk7/7x/C0WPjBa51GEb4C8LyAIc=
x-amz-request-id:94EADB4EDD3DE1C1

Best answer

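Your approach of proxying S3 files through Nginx makes a lot of sense. It solves a number of problems and brings extra benefits such as masking URLs, proxy caching, and speeding up transfers by offloading SSL/TLS. You've got it almost right; let me show what's left to make it perfect.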

For the sample queries, I use the S3 bucket and image URL mentioned in a public comment on the original question.

Let's start by checking the headers of the Amazon S3 file:

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Date: Sun, 25 Jun 2017 17:49:10 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 378843
Server: AmazonS3

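We can see that Cache-Control is missing, but the conditional GET headers are already in place. When we reuse the ETag or Last-Modified value (which is how a browser's client-side cache works), we get HTTP 304 with no Content-Length, that is, an empty body. The way to read this is that the client (curl in our case) queries the resource and says that no data transfer is needed unless the file has been modified on the server: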

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 17:53:33 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3

curl -I http://yanpy.dev.s3.amazonaws.com/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-Modified-Since: Wed, 21 Jun 2017 07:42:31 GMT"

HTTP/1.1 304 Not Modified
Date: Sun, 25 Jun 2017 18:17:34 GMT
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Server: AmazonS3
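
The ETag check above can also be scripted. What follows is only a minimal sketch, not part of the original answer: extract_etag and conditional_head are hypothetical helper names, and it assumes a POSIX shell with curl, tr, awk and head available. It reads the ETag from a HEAD response and replays it in an If-None-Match header, in the same unquoted form used in the commands above.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# ETag value with the surrounding double quotes removed.
extract_etag() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "etag" { gsub(/"/, "", $2); print $2; exit }'
}

# Hypothetical helper: send a HEAD request for the URL in $1, then repeat it
# with If-None-Match set to the ETag that came back.
# Prints the status line of each response.
conditional_head() {
  url=$1
  headers=$(curl -sI "$url")
  printf '%s\n' "$headers" | head -n 1
  etag=$(printf '%s\n' "$headers" | extract_etag)
  curl -sI "$url" --header "If-None-Match: $etag" | head -n 1
}
```

Calling conditional_head with the image URL should print a 200 status line followed by a 304 one, provided the object has not changed between the two requests.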

"PageSpeed suggested to leverage browser caching" means that Cache-Control is missing. Using Nginx as a proxy for the S3 files not only solves the problem of the missing headers but also saves traffic through the Nginx proxy cache.

I use macOS, but the Nginx configuration works exactly the same way on Linux, without modification. Step by step:

1. Install Nginx

brew update && brew install nginx

2. Set up Nginx to proxy the S3 bucket; see the configuration below.

3. Request the file through Nginx. Take a look at the Server header: we now see Nginx rather than Amazon S3, and Cache-Control is present:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:30:26 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Accept-Ranges: bytes
Cache-Control: max-age=31536000
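
To read the Cache-Control value at a glance, max-age can be converted from seconds to days. This is a small sketch under stated assumptions: max_age_days is a hypothetical helper name, and a POSIX shell with tr, awk and sed is assumed.

```shell
# Hypothetical helper: read response headers on stdin, find the max-age
# directive of the Cache-Control header, and print it in whole days.
# Prints nothing when the header or the directive is absent.
max_age_days() {
  tr -d '\r' |
    awk -F': ' 'tolower($1) == "cache-control" { print $2; exit }' |
    sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' |
    awk '{ print int($1 / 86400) }'
}
```

Piping the output of the curl -I command above into max_age_days should print 365, since 31536000 seconds is one year.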


4. Request the file through the Nginx proxy with a conditional GET:

curl -I http://localhost:8080/s3/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg \
  --header "If-None-Match: 37a907fc5dd7cfd0c428af78f09e95a9"

HTTP/1.1 304 Not Modified
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:32:16 GMT
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000


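5. Request the file through the Nginx proxy cache. Look at the X-Cache-Status header: its value is MISS until the cache has been warmed up by a first request. The response below was captured after that, so it shows HIT: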

curl -I http://localhost:8080/s3_cached/img/blog/sailing-routes-around-croatia-central-dalmatia-islands/yachts-anchored-paradise-cove-croatia-3.jpg

HTTP/1.1 200 OK
Server: nginx/1.12.0
Date: Sun, 25 Jun 2017 18:40:45 GMT
Content-Type: binary/octet-stream
Content-Length: 378843
Connection: keep-alive
Last-Modified: Wed, 21 Jun 2017 07:42:31 GMT
ETag: "37a907fc5dd7cfd0c428af78f09e95a9"
Expires: Fri, 21 Jul 2018 07:41:49 UTC
Cache-Control: max-age=31536000
X-Cache-Status: HIT
Accept-Ranges: bytes
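
The warm-up behaviour can be checked with two consecutive requests. A minimal sketch, assuming a POSIX shell with curl, tr and awk; cache_status and warm_and_check are hypothetical helper names, not something the original answer defines.

```shell
# Hypothetical helper: read response headers on stdin and print the value of
# the X-Cache-Status header (for example MISS or HIT).
cache_status() {
  tr -d '\r' | awk -F': ' 'tolower($1) == "x-cache-status" { print $2; exit }'
}

# Hypothetical helper: request the URL in $1 twice and print the cache status
# reported for each response, one per line.
warm_and_check() {
  for attempt in 1 2; do
    curl -sI "$1" | cache_status
  done
}
```

Run against the /s3_cached/ URL on a cold cache, warm_and_check would typically print MISS and then HIT; on an already warm cache both lines should read HIT.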


Based on the official Nginx documentation, I've put together an Nginx S3 configuration with optimized caching settings that uses the following options:

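  • proxy_cache_revalidate instructs NGINX to use conditional GET requests when refreshing content from the origin server.
  • The updating parameter of the proxy_cache_use_stale directive instructs NGINX to deliver stale content when clients request an item while an update to it is being downloaded from the origin server, instead of forwarding repeated requests to the server.
  • With proxy_cache_lock enabled, if multiple clients request a file that is not current in the cache (a MISS), only the first of those requests is allowed through to the origin server.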

Nginx configuration:

worker_processes  1;
# Run in the foreground so that logs appear in the terminal.
daemon off;

error_log  /dev/stdout info;
pid        /usr/local/var/nginx/nginx.pid;


events {
  worker_connections  1024;
}


http {
  default_type       text/html;
  access_log         /dev/stdout;
  sendfile           on;
  keepalive_timeout  65;

  # On-disk cache under /tmp/: 10 MB of shared memory for keys, at most 500 MB
  # of cached data, entries removed after 60 minutes without access.
  # Note: use_temp_path requires nginx 1.7.10 or later.
  proxy_cache_path   /tmp/ levels=1:2 keys_zone=s3_cache:10m max_size=500m
                     inactive=60m use_temp_path=off;

  server {
    listen 8080;

    # Plain proxy: adds Cache-Control, caches nothing in Nginx itself.
    location /s3/ {
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      # An empty value means the header is not passed to S3 at all.
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      # Hide S3-specific response headers from clients.
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_intercept_errors on;
      # One year, in seconds.
      add_header             Cache-Control max-age=31536000;
      # The trailing slash replaces the matched /s3/ prefix with /.
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

    # Same proxy, with responses also stored in the s3_cache zone.
    location /s3_cached/ {
      proxy_cache            s3_cache;
      proxy_http_version     1.1;
      proxy_set_header       Connection "";
      proxy_set_header       Authorization '';
      proxy_set_header       Host yanpy.dev.s3.amazonaws.com;
      proxy_hide_header      x-amz-id-2;
      proxy_hide_header      x-amz-request-id;
      proxy_hide_header      x-amz-meta-server-side-encryption;
      proxy_hide_header      x-amz-server-side-encryption;
      proxy_hide_header      Set-Cookie;
      proxy_ignore_headers   Set-Cookie;
      proxy_cache_revalidate on;
      proxy_intercept_errors on;
      proxy_cache_use_stale  error timeout updating http_500 http_502 http_503 http_504;
      proxy_cache_lock       on;
      proxy_cache_valid      200 304 60m;
      add_header             Cache-Control max-age=31536000;
      # Expose the cache result (MISS, HIT, ...) for debugging.
      add_header             X-Cache-Status $upstream_cache_status;
      proxy_pass             http://yanpy.dev.s3.amazonaws.com/;
    }

  }
}
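
To find a cached file on disk, it helps to know how levels=1:2 lays out the cache directory: Nginx names each file after the MD5 hash of its cache key and places it in a directory named after the last character of the hash, inside which a subdirectory is named after the two characters before it. Here is a sketch of that mapping, assuming a POSIX shell with cut; cache_file_for_hash is a hypothetical helper name and the hash is just a sample value, not one taken from this bucket.

```shell
# Hypothetical helper: given the cache directory ($1) and the 32-character MD5
# hex digest of a cache key ($2), print the file path implied by levels=1:2.
cache_file_for_hash() {
  dir=${1%/}                                     # drop a trailing slash, if any
  hash=$2
  level1=$(printf '%s' "$hash" | cut -c32)       # last character of the hash
  level2=$(printf '%s' "$hash" | cut -c30-31)    # the two characters before it
  printf '%s/%s/%s/%s\n' "$dir" "$level1" "$level2" "$hash"
}

# Sample digest, used for illustration only.
cache_file_for_hash /tmp/ b7f54b2df7773722d382f4809d65029c
# prints /tmp/c/29/b7f54b2df7773722d382f4809d65029c
```

The hash for a real request depends on the cache key in effect (the proxy_cache_key directive), so the sample digest has to be replaced with one computed from that key.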

Regarding "amazon-web-services - Nginx proxy for Amazon S3 resources", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/44639182/
