java - How do I change the case of an entire column to lowercase?

Tags: java apache-spark apache-spark-sql apache-spark-dataset

I want to convert every value in one column of a Spark Dataset to lowercase.

        Desired Input
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|BRUSH & BROOM HAN...|
        |   XYZ|WHEEL BRUSH PARTS...|
        +------+--------------------+

        Desired Output
        +------+--------------------+
        |ItemID|       Category name|
        +------+--------------------+
        |   ABC|brush & broom han...|
        |   XYZ|wheel brush parts...|
        +------+--------------------+

I tried collectAsList() followed by toString(), but that is slow and cumbersome for a very large dataset.

I also found a "lower" method, but I don't know how to make it work on a Dataset.
Please suggest a simple or efficient way to do this. Thanks in advance.

Best answer

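Use the lower function from org.apache.spark.sql.functions. It runs as a column expression on the executors, so nothing has to be collected to the driver.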

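For example, in Scala (the $"..." column syntax requires import spark.implicits._, and lower must be imported from org.apache.spark.sql.functions):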

df.select($"q1Content", lower($"q1Content")).show

Output:
+--------------------+--------------------+
|           q1Content|    lower(q1Content)|
+--------------------+--------------------+
|What is the step ...|what is the step ...|
|What is the story...|what is the story...|
|How can I increas...|how can i increas...|
|Why am I mentally...|why am i mentally...|
|Which one dissolv...|which one dissolv...|
|Astrology: I am a...|astrology: i am a...|
| Should I buy tiago?| should i buy tiago?|
|How can I be a go...|how can i be a go...|
|When do you use  ...|when do you use  ...|
|Motorola (company...|motorola (company...|
|Method to find se...|method to find se...|
|How do I read and...|how do i read and...|
|What can make Phy...|what can make phy...|
|What was your fir...|what was your fir...|
|What are the laws...|what are the laws...|
|What would a Trum...|what would a trum...|
|What does manipul...|what does manipul...|
|Why do girls want...|why do girls want...|
|Why are so many Q...|why are so many q...|
|Which is the best...|which is the best...|
+--------------------+--------------------+
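
Since the question is tagged java, here is a minimal Java sketch of the same idea applied to the table from the question. The class name LowercaseColumn, the local[*] master, and the in-memory sample rows are assumptions made for illustration; the category strings are kept truncated exactly as they appear in the question. It uses withColumn with the existing column name so the column is replaced in place rather than added alongside the original.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lower;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used only for this illustration.
public class LowercaseColumn {
    public static void main(String[] args) {
        // Local session for illustration; a real job would normally get its master from spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("LowercaseColumn")
                .master("local[*]")
                .getOrCreate();

        // Schema mirroring the two columns shown in the question.
        StructType schema = new StructType()
                .add("ItemID", DataTypes.StringType)
                .add("Category name", DataTypes.StringType);

        // Sample rows; the values are the truncated strings from the question, kept as-is.
        List<Row> rows = Arrays.asList(
                RowFactory.create("ABC", "BRUSH & BROOM HAN..."),
                RowFactory.create("XYZ", "WHEEL BRUSH PARTS..."));

        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Reusing the existing column name replaces that column with its lowercased version.
        Dataset<Row> lowered = df.withColumn("Category name", lower(col("Category name")));

        lowered.show(false);
        spark.stop();
    }
}
```

If the original column should be kept, passing a new name to withColumn (for example "category_lower", also a made-up name) adds the lowercased values as an extra column instead.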

Source: a similar question on Stack Overflow, "java - How do I change the case of an entire column to lowercase?": https://stackoverflow.com/questions/43500827/
