I'm new to AWS. I created a cluster and connected to the master node via ssh. When I try to copy files from s3://my-bucket-name/ to the local file://home/hadoop folder in Pig, using:
cp s3://my-bucket-name/path/to/file file://home/hadoop
I get the error:
2013-06-08 18:59:00,267 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
I can't even get into my s3 bucket. I set AWS_ACCESS_KEY and AWS_SECRET_KEY, but without success. I also couldn't find Pig's configuration file to set the appropriate fields.
Any help?
Edit: I tried loading the file in Pig using the full s3n:// URI:
grunt> raw_logs = LOAD 's3://XXXXX/input/access_log_1' USING TextLoader as (line:chararray);
grunt> illustrate raw_logs;
I get the following error:
2013-06-08 19:28:33,342 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2013-06-08 19:28:33,404 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2013-06-08 19:28:33,404 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2013-06-08 19:28:33,405 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2013-06-08 19:28:33,405 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2013-06-08 19:28:33,429 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2013-06-08 19:28:33,430 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
        at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
        at org.apache.pig.PigServer.getExamples(PigServer.java:1244)
        at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:722)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:591)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:306)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:189)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:500)
        at org.apache.pig.Main.main(Main.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
2013-06-08 19:28:33,432 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception : Internal error creating job configuration.
Details at logfile: /home/hadoop/pig_1370719069857.log
Accepted Answer
First, you should use the s3n protocol (unless you stored your files on s3 using the s3 protocol): s3 is for block storage (i.e. similar to hdfs, only on s3), while s3n is the native s3 file system (i.e. you get what you see there).
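As a sketch of the difference, the same placeholder bucket and path would be loaded with a different URI scheme depending on which file system wrote the data (bucket and path here are hypothetical):

```
-- s3n: native S3 file system - objects are regular S3 files, readable as-is
raw = LOAD 's3n://my-bucket/input/access_log_1' USING TextLoader AS (line:chararray);

-- s3: block file system - HDFS-like block storage layered on S3; only works
-- if the data was originally written through the s3 block file system
raw = LOAD 's3://my-bucket/input/access_log_1' USING TextLoader AS (line:chararray);
```

Since the files in the question were uploaded as ordinary S3 objects, the s3n form is the one that applies.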
You can use distcp or a simple Pig load from s3n. You can provide the access key and secret in hadoop-site.xml, as specified in the exception (see here for more information: http://wiki.apache.org/hadoop/AmazonS3), or you can add them to the URI:
raw_logs = LOAD 's3n://access:secret@XXXXX/input/access_log_1' USING TextLoader AS (line:chararray);
Make sure your secret does not contain a backslash, otherwise it won't work.
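The hadoop-site.xml route mentioned above would look roughly like this (a sketch: the fs.s3n.* property names shown are for the native file system, mirroring the fs.s3.* names in the exception message, and the key values are placeholders):

```
<!-- hadoop-site.xml: credentials for the S3 native file system (s3n://) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```

With the keys configured this way, the LOAD statement can use a bare s3n://bucket/path URI without embedding credentials in it.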
On hadoop - copying files from s3:// to the local file system, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/17002866/