scala - Spark Scala UDF limited to 10 arguments

Tags: scala apache-spark apache-spark-sql user-defined-functions

I need to create a Spark UDF with 11 arguments. Is there any way to achieve this?
I know a UDF can be created with at most 10 arguments.

Below is the code for 10 arguments. It works:

import org.apache.spark.sql.functions.udf
// isEmpty is not defined in the snippet; it is assumed to come from a helper such as
// org.apache.commons.lang3.StringUtils.isEmpty
import org.apache.commons.lang3.StringUtils.isEmpty

val testFunc1 = (one: String, two: String, three: String, four: String,
                 five: String, six: String, seven: String, eight: String,
                 nine: String, ten: String) => {
  if (isEmpty(four)) false
  else four match {
    case "RDIS" => three == "ST"
    case "TTSC" => nine == "UT" && eight == "RR"
    case _ => false
  }
}

udf(testFunc1) // compiles: typed overloads exist for up to 10 arguments

Below is the code for 11 arguments. It fails with an "unspecified value parameter: dataType" compile error:

import org.apache.spark.sql.functions.udf

val testFunc2 = (one: String, two: String, three: String, four: String,
                 five: String, six: String, seven: String, eight: String,
                 nine: String, ten: String, ELEVEN: String) => {
  if (isEmpty(four)) false
  else four match {
    case "RDIS" => three == "ST"
    case "TTSC" => nine == "UT" && eight == "RR" && ELEVEN == "OR"
    case _ => false
  }
}

udf(testFunc2) // compilation error: no typed udf overload takes 11 type parameters
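
For context: `org.apache.spark.sql.functions.udf` only defines typed overloads for functions of up to 10 arguments, so an 11-argument function matches nothing but the untyped `udf(f: AnyRef, dataType: DataType)` overload, and the compiler reports the missing `dataType` parameter. In Spark 2.x that fallback overload can actually be called directly by supplying the return type by hand; a minimal sketch, not part of the original question:

import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.BooleanType

// Untyped variant: pass the function plus an explicit return DataType.
// Note: Spark 3 disables this untyped overload by default.
val testUDF2 = udf(testFunc2, BooleanType)

This skips the compile-time type checking of the typed overloads, which is one reason to prefer the Map-based approach in the answer below.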

Best Answer

I would suggest packing the arguments in a Map:

import org.apache.spark.sql.functions._
import spark.implicits._ // for the $-syntax and toDF (assumes a SparkSession named spark, as in spark-shell)

val df = sc.parallelize(Seq(("a","b"),("c","d"),("e","f"))).toDF("one","two")

// The UDF takes a single Map argument instead of many positional parameters
val myUDF = udf((input: Map[String, String]) => {
  // do something with the input
  input("one") == "a"
})

df
  .withColumn("udf_args", map(
    lit("one"), $"one",   // literal keys alternate with column values
    lit("two"), $"two"
  ))
  .withColumn("udf_result", myUDF($"udf_args"))
  .show()

+---+---+--------------------+----------+
|one|two|            udf_args|udf_result|
+---+---+--------------------+----------+
|  a|  b|Map(one -> a, two...|      true|
|  c|  d|Map(one -> c, two...|     false|
|  e|  f|Map(one -> e, two...|     false|
+---+---+--------------------+----------+
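
Applied to the question, here is a sketch (reusing the asker's column names, which are assumptions here) of the 11-argument predicate rewritten against a single Map column:

import org.apache.spark.sql.functions.{udf, map, lit}

// The asker's testFunc2 logic, taking one Map argument instead of eleven Strings
val testFunc2Map = udf((args: Map[String, String]) => {
  val four = args.getOrElse("four", "")
  if (four.isEmpty) false
  else four match {
    case "RDIS" => args.getOrElse("three", "") == "ST"
    case "TTSC" => args.getOrElse("nine", "") == "UT" &&
                   args.getOrElse("eight", "") == "RR" &&
                   args.getOrElse("ELEVEN", "") == "OR"
    case _ => false
  }
})

// Pack all eleven columns into one MapType column, then call the UDF:
// df.withColumn("udf_args", map(lit("one"), $"one", lit("two"), $"two", /* ...remaining nine pairs... */))
//   .withColumn("udf_result", testFunc2Map($"udf_args"))

Note that this works because all eleven arguments share one type (String); for mixed types, the values could be packed into a struct column instead, with the UDF receiving a Row.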

Regarding "scala - Spark Scala UDF limited to 10 arguments", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48637297/
