file - Most efficient way to transfer nearly 400K images to S3

Tags: file amazon-web-services amazon-s3 transfer

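I'm currently responsible for moving a site from its current server to EC2. One part of the project is done and working well; the part I'm struggling with is the images. The site currently has nearly 400K images, sorted into various folders inside a main userimg folder, and the client wants all of them stored on S3.

My main problem is how to transfer nearly 400,000 images from the server to S3. I've been using http://s3tools.org/s3cmd, which is great, but transferring the userimg folder with s3cmd would take almost 3 days, and if the connection dropped or something similar happened, I would end up with some images on S3 and some not, with no way to resume the process...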

Can anyone suggest a solution? Has anyone run into a problem like this before?

Best answer

I suggest you write (or have someone write) a simple Java utility that:

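  • reads the structure of the client's directory (if needed)
  • for each image, creates the corresponding key on S3 (based on the file structure read in step 1) and starts a multipart upload in parallel, using the AWS SDK or the jets3t API.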

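  • I did this for our client: fewer than 200 lines of Java, and very reliable.
    Below is the part that does the multipart upload. The part that reads the file structure is trivial.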

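    // Imports used by this snippet (AWS SDK for Java 1.x plus jets3t).
    // The enclosing class (not shown) must define `log`, CHUNK_SIZE and CloudUtilsService.MAX_RETRY.
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    
    import com.amazonaws.auth.AWSCredentials;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
    import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
    import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
    import com.amazonaws.services.s3.model.PartETag;
    import com.amazonaws.services.s3.model.UploadPartRequest;
    
    import org.jets3t.service.S3Service;
    import org.jets3t.service.S3ServiceException;
    import org.jets3t.service.impl.rest.httpclient.RestS3Service;
    import org.jets3t.service.model.S3Bucket;
    
    /**
     * Uploads a file to Amazon S3, creating the specified bucket if it does not exist.
     * The file is sent in parts of CHUNK_SIZE bytes (S3 multipart upload); CHUNK_SIZE must be
     * at least 5 MB, because S3 rejects smaller parts except for the last one.
     * Each part is retried up to MAX_RETRY times.
     * 
     * @param accessKey     - Amazon account access key
     * @param secretKey     - Amazon account secret key
     * @param directoryName - directory path where the file resides
     * @param keyName       - the name of the file to upload; also used as the S3 key
     * @param bucketName    - the name of the bucket to upload to
     * @throws Exception    - if the upload fails
     */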
    public void uploadFileToS3(String accessKey
            ,String secretKey
            ,String directoryName
            ,String keyName // that is the file name that will be created after upload completed
            ,String bucketName ) throws Exception {
    
        // Create a credentials object and service to access S3 account
        AWSCredentials myCredentials =
            new BasicAWSCredentials(accessKey, secretKey);
    
        String filePath = directoryName
        + System.getProperty("file.separator")
        + keyName;   
    
        log.info("uploadFileToS3 is about to upload file [" + filePath + "]");
    
        AmazonS3 s3Client = new AmazonS3Client(myCredentials);        
        // Collect the PartETag returned for each uploaded part;
        // S3 needs the complete list to assemble the object.
        List<PartETag> partETags = new ArrayList<PartETag>();
    
        // make sure that the bucket exists (for many files, do this once up front rather than per file)
        createBucketIfNotExists(bucketName, accessKey, secretKey);
    
        // delete any existing object with this key (optional: completing the upload overwrites it anyway)
        s3Client.deleteObject(bucketName, keyName);
    
        // Initialize.
        InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, keyName);
        InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest);
    
        File file = new File(filePath);
    
        long contentLength = file.length();
        long partSize = CHUNK_SIZE; // every part except the last must be at least 5 MB
        // number of parts = ceil(contentLength / CHUNK_SIZE), at least 1 (used for logging)
        int numOfParts = (int) Math.max(1, (contentLength + CHUNK_SIZE - 1) / CHUNK_SIZE);
    
        try {
            // Step 2: Upload parts.
            long filePosition = 0;
            for (int i = 1; filePosition < contentLength; i++) {
                // Last part can be less than 5 MB. Adjust part size.
                partSize = Math.min(partSize, (contentLength - filePosition));
    
                log.info("Start uploading part[" + i + "] of [" + numOfParts + "]");
    
                // Create request to upload a part.
                UploadPartRequest uploadRequest = new UploadPartRequest()
                .withBucketName(bucketName).withKey(keyName)
                .withUploadId(initResponse.getUploadId()).withPartNumber(i)
                .withFileOffset(filePosition)
                .withFile(file)
                .withPartSize(partSize);
    
                // repeat the upload until it succeeds or reaches the retry limit
                boolean anotherPass;
                int retryCount = 0;
                do {
                    anotherPass = false;  // assume everything is ok
                    try {
                        log.info("Uploading part[" + i + "]");
                        // Upload part and add response to our list.
                        partETags.add(s3Client.uploadPart(uploadRequest).getPartETag());
                        log.info("Finished uploading part[" + i + "] of [" + numOfParts + "]");
                    } catch (Exception e) {
                        log.error("Failed uploading part[" + i + "] due to exception. Will retry... Exception: ", e);
                        anotherPass = true; // repeat
                        retryCount++;
                    }
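                }
                while (anotherPass && retryCount < CloudUtilsService.MAX_RETRY);
    
                // Give up instead of silently skipping this part: S3 accepts non-consecutive
                // part numbers on completion, so a missing part would yield a truncated object.
                if (anotherPass) {
                    throw new Exception("Giving up on part[" + i + "] after " + retryCount + " attempts");
                }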
    
                filePosition += partSize;
                log.info("filePosition=[" + filePosition + "]");
    
            }
            log.info("Finished uploading file");
    
            // Complete.
            CompleteMultipartUploadRequest compRequest = new 
            CompleteMultipartUploadRequest(
                    bucketName, 
                    keyName, 
                    initResponse.getUploadId(), 
                    partETags);
    
            s3Client.completeMultipartUpload(compRequest);
    
            log.info("multipart upload completed.upload id=[" + initResponse.getUploadId() + "]");
        } catch (Exception e) {
            s3Client.abortMultipartUpload(new AbortMultipartUploadRequest(
                    bucketName, keyName, initResponse.getUploadId()));
    
            log.error("Failed to upload due to Exception:", e);
    
            throw e;
        }
    }
    
    
    /**
     * Creates a bucket with the specified name if it does not already exist.
     * 
     * @param bucketName    - the name of the bucket to retrieve or create
     * @param accessKey     - Amazon account access key
     * @param secretKey     - Amazon account secret key
     * @throws S3ServiceException - if something goes wrong
     */
    public void createBucketIfNotExists(String bucketName, String accessKey, String secretKey) throws S3ServiceException {
        try {
            // Create a credentials object and service to access S3 account
            org.jets3t.service.security.AWSCredentials myCredentials =
                new org.jets3t.service.security.AWSCredentials(accessKey, secretKey);
            S3Service service = new RestS3Service(myCredentials);
    
            // Return the existing bucket, or create it if it does not exist yet
            S3Bucket zeBucket = service.getOrCreateBucket(bucketName);
            log.info("the bucket [" + zeBucket.getName() + "] was created (if it was not existing yet...)");
        } catch (S3ServiceException e) {
            log.error("Failed to get or create bucket[" + bucketName + "] due to exception:", e);
            throw e;
        }
    }
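The answer's first two steps (walk the directory tree, derive a key from each file's path, upload files in parallel) and the question's concern about resuming after a dropped connection can be sketched together as below. This is a minimal sketch under stated assumptions, not the answerer's utility: `KeyUploader`, `ResumableDirectoryUpload` and the manifest file of finished keys are hypothetical names introduced here, and the actual S3 call (for example `uploadFileToS3` above, or a plain single-request upload for images smaller than one part) is left to the caller.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical hook: in real code, call uploadFileToS3(...) or the AWS SDK here. */
interface KeyUploader {
    void upload(Path file, String key) throws Exception;
}

/** Hypothetical helper: parallel, resumable upload of a directory tree. */
class ResumableDirectoryUpload {

    /** S3 key = prefix + path relative to the root, always with '/' separators. */
    static String toKey(Path root, Path file, String prefix) {
        String separator = root.getFileSystem().getSeparator();
        return prefix + root.relativize(file).toString().replace(separator, "/");
    }

    /**
     * Uploads every regular file under root whose key is not yet in the manifest.
     * Returns the number of files uploaded successfully in this run.
     */
    static int run(Path root, String prefix, Path manifest, KeyUploader uploader, int threads)
            throws IOException, InterruptedException {
        // Keys recorded by earlier (possibly interrupted) runs are skipped.
        Set<String> done = new HashSet<>();
        if (Files.exists(manifest)) {
            done.addAll(Files.readAllLines(manifest, StandardCharsets.UTF_8));
        }

        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Boolean>> results = new ArrayList<>();
        for (Path file : files) {
            String key = toKey(root, file, prefix);
            if (done.contains(key)) {
                continue;
            }
            results.add(pool.submit(() -> {
                try {
                    uploader.upload(file, key);
                    appendToManifest(manifest, key); // recorded only after a successful upload
                    return true;
                } catch (Exception e) {
                    return false; // not in the manifest, so the next run retries it
                }
            }));
        }
        pool.shutdown();

        int uploaded = 0;
        for (Future<Boolean> result : results) {
            try {
                if (result.get()) {
                    uploaded++;
                }
            } catch (ExecutionException e) {
                // an Error escaped the task; count it as a failure
            }
        }
        return uploaded;
    }

    /** Synchronized so concurrent workers append whole lines. */
    private static synchronized void appendToManifest(Path manifest, String key) throws IOException {
        Files.write(manifest, (key + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

A restarted run skips every key already in the manifest, so a dropped connection costs at most the files that were in flight. A file that was uploaded but not yet recorded when the process died is simply uploaded again, which S3 treats as an overwrite. Keep the manifest outside the image root, otherwise the directory walk picks it up as an image.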
    

Source question on Stack Overflow: https://stackoverflow.com/questions/5289333/
