scala - How to use the replace function on an RDD in Spark Scala

Tags: scala apache-spark

I have a file of tweets:

396124436845178880,"When's 12.4k gonna roll around",Matty_T_03
396124437168537600,"I really wish I didn't give up everything I did for you.     I'm so mad at my self for even letting it get as far as it did.",savava143
396124436958412800,"I really need to double check who I'm sending my     snapchats to before sending it 😩😭",juliannpham
396124437218885632,"@Darrin_myers30 I feel you man, gotta stay prayed up.     Year is important",Ful_of_Ambition
396124437558611968,"tell me what I did in my life to deserve this.",_ItsNotBragging
396124437499502592,"Too many fine men out here...see me drooling",LolaofLife
396124437722198016,"@jaiclynclausen will do",I_harley99

After reading the file into an RDD, I am trying to replace all the special characters:
    val fileReadRdd = sc.textFile(fileInput)
    val fileReadRdd2 = fileReadRdd.map(x => x.map(_.replace(","," ")))
    val fileFlat = fileReadRdd.flatMap(rec => rec.split(" "))

I get the following error:
Error:(41, 57) value replace is not a member of Char
    val fileReadRdd2 = fileReadRdd.map(x => x.map(_.replace(",","")))

Best answer

My guess is that:

x => x.map(_.replace(",",""))

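treats your string as a sequence of characters, so `_` is a Char (which has no replace method), whereas what you actually want is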
x => x.replace(",", "")

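(i.e. you do not need to map over the "sequence" of characters; the outer map on the RDD already hands you each line as a String)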
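
To illustrate the point, here is a minimal sketch in plain Scala with no Spark dependency. A `Seq[String]` stands in for the `RDD[String]`, and the names `ReplaceSketch`, `cleanLine`, and `stripSpecial` are hypothetical helpers introduced only for this example. The question does not say what counts as a "special character", so the regex in `stripSpecial` is an assumption.

```scala
object ReplaceSketch {
  // String.replace operates on the whole line, so no inner map over Chars is needed.
  def cleanLine(line: String): String = line.replace(",", " ")

  // Assumed definition of "special characters": anything that is not a
  // word character, whitespace, or '@' is replaced by a space.
  def stripSpecial(line: String): String = line.replaceAll("[^\\w\\s@]", " ")

  def main(args: Array[String]): Unit = {
    // Stand-in for sc.textFile(fileInput); one record taken from the question.
    val lines = Seq("396124437722198016,\"@jaiclynclausen will do\",I_harley99")

    val cleaned = lines.map(cleanLine)        // each element is a String
    val words = cleaned.flatMap(_.split(" ")) // split the cleaned lines into tokens

    println(cleaned.head)
    println(words.mkString("|"))
  }
}
```

On a real RDD the same functions could be passed directly, for example `fileReadRdd.map(ReplaceSketch.cleanLine).flatMap(_.split(" "))`. Note that the code in the question calls `flatMap` on `fileReadRdd` rather than on `fileReadRdd2`, so the result of the replacement would not be used even once the compile error is fixed.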

Regarding "scala - How to use the replace function on an RDD in Spark Scala", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42909092/
