I am writing a Spark job and trying to read a text file using Scala. The following works fine on my local machine:
import scala.io.Source

val myFile = "myLocalPath/myFile.csv"
for (line <- Source.fromFile(myFile).getLines()) {
  val data = line.split(",")
  myHashMap.put(data(0), data(1).toDouble)
}
Then I tried to make it run on AWS. I did the following, but it does not seem to read the entire file correctly. What is the right way to read such a text file from S3? Thanks a lot!
val credentials = new BasicAWSCredentials("myKey", "mySecretKey")
val s3Client = new AmazonS3Client(credentials)
val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myFile.csv"))
val reader = new BufferedReader(new InputStreamReader(s3Object.getObjectContent()))
var line = ""
// Note: in Scala an assignment evaluates to Unit, so this Java-style
// condition compares Unit to null and does not behave as intended.
while ((line = reader.readLine()) != null) {
  val data = line.split(",")
  myHashMap.put(data(0), data(1).toDouble)
  println(line)
}
Best Answer
I got it working as follows:
val s3Object = s3Client.getObject(new GetObjectRequest("myBucket", "myPath/myFile.csv"))
val myData = Source.fromInputStream(s3Object.getObjectContent()).getLines()
for (line <- myData) {
  val data = line.split(",")
  myMap.put(data(0), data(1).toDouble)
}
println("my map : " + myMap.toString())
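Since this runs inside a Spark job, an alternative worth noting is to let Spark read the file from S3 directly instead of streaming it through the AWS SDK on the driver. A minimal sketch, assuming the `hadoop-aws` connector is on the classpath, AWS credentials are configured for the `s3a://` filesystem, and the bucket/path names are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumes hadoop-aws is available and credentials are supplied
// externally (environment variables, instance profile, etc.).
val conf = new SparkConf().setAppName("ReadCsvFromS3")
val sc = new SparkContext(conf)

// Spark reads the object from S3 as an RDD of lines.
val lines = sc.textFile("s3a://myBucket/myPath/myFile.csv")

// Split each line and collect key -> value pairs into a map on the driver.
val myMap = lines
  .map(_.split(","))
  .map(fields => (fields(0), fields(1).toDouble))
  .collectAsMap()
```

This keeps the read distributed across executors, which matters once the file is too large to process on a single machine; for a small lookup file, the accepted answer's `Source.fromInputStream` approach is fine.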
About scala - Spark : read csv file from s3 using scala — a similar question on Stack Overflow: https://stackoverflow.com/questions/32470705/