java - Storing JSON strings in a Hadoop file with a delimiter

Tags: java hadoop

I am converting objects to JSON strings and storing them in the Hadoop file system, with each string separated by a delimiter.

FSDataOutputStream out = null;
try {
    FileSystem fs = FileSystem.get(hadoopConfiguration);

    // Append if the file already exists, otherwise create it.
    out = fs.exists(outFile) ? fs.append(outFile) : fs.create(outFile);
    ConnTrackInfo conntrack = new ConnTrackInfo(124, "ranjeet@triconinfotec.com", "instrutor", new Date(), "Section", 234, "Economics", 9991, "EZT", "124XSD234", 33, "GeneralEconomics", "192.168.1.210");
    Gson gson = new Gson();
    String jsonstring = gson.toJson(conntrack);

    out.writeUTF(jsonstring.concat("@@@"));
} catch (Exception e) {
    logger.error("Unable to persist tracking data", e);
} finally {
    if (out != null) {   // guard against NPE if FileSystem.get or create/append failed
        out.close();
    }
}

It stores the data fine, but some extra characters are added along with my delimiter.
My file data looks like this:
1{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 11:56:12 AM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}@@@1{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 11:56:55 AM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}@@@1{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 12:15:02 PM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}@@@1{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 12:18:25 PM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}@@@1{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 12:19:23 PM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}@@@0{"userId":124,"emailId":"ranjeet@triconinfotec.com","role":"instrutor","date":"Jul 30, 2014 12:22:37 PM","target":"Section","sectionId":234,"sectionName":"Economics","assignmentId":9991,"assignmentName":"EZT","isbn":"124XSD234","courseId":33,"courseName":"GeneralEconomics","ipaddress":"192.168.1.210"}

If you look at the first record, a stray 1 is written before my data, so every subsequent record is preceded by @@@1, but I don't want that 1 there. I'm using Hadoop, so I need a solution that works with it.
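The stray characters come from `DataOutput.writeUTF`, which prepends a two-byte big-endian length to every string it writes; for a JSON record a few hundred bytes long, the high byte is an unprintable control character and the low byte can render as a visible digit such as `1`. A minimal sketch demonstrating the prefix (writing to an in-memory stream instead of HDFS):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteUtfPrefixDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);

        String record = "{\"userId\":124}@@@";
        out.writeUTF(record);   // writes a 2-byte big-endian length before the payload
        out.close();

        byte[] bytes = buf.toByteArray();
        // The first two bytes are the length prefix, not part of the JSON.
        int prefix = ((bytes[0] & 0xFF) << 8) | (bytes[1] & 0xFF);
        System.out.println("length prefix = " + prefix);                  // 17
        System.out.println("payload starts at byte 2: " + (char) bytes[2]); // '{'
    }
}
```

`FSDataOutputStream` extends `DataOutputStream`, so the same prefix ends up in the HDFS file.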

Best answer

Don't use the writeUTF method: as documented here, it writes the string in modified UTF-8 encoding and prefixes it with a two-byte length, which is where the stray characters before each record come from. Try a PrintWriter instead:

PrintWriter pw = new PrintWriter(out);
pw.println(jsonstring + "@@@");
pw.close();
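If you'd rather avoid the extra wrapper, another option is to write the raw UTF-8 bytes directly, which also produces no length prefix. A minimal sketch (the `appendRecord` helper is hypothetical, and a `ByteArrayOutputStream` stands in for the real `FSDataOutputStream`, which is also an `OutputStream`):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RawBytesDemo {
    // Appends one JSON record plus the delimiter as plain UTF-8 bytes,
    // with no length prefix. Works with any OutputStream, including
    // Hadoop's FSDataOutputStream.
    static void appendRecord(OutputStream out, String json) throws IOException {
        out.write((json + "@@@").getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        appendRecord(buf, "{\"userId\":124}");
        System.out.println(buf.toString("UTF-8")); // {"userId":124}@@@
    }
}
```

Unlike `PrintWriter.println`, this writes no trailing newline, matching the original delimiter-only format.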

Regarding java - Storing JSON strings in a Hadoop file with a delimiter, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/25030737/
