apache-spark-sql - How do I pass a parameter to selectExpr? SparkSQL-Scala

Tags: apache-spark-sql

:)

When you have a DataFrame, you can use the selectExpr method to add a column and populate its rows.

Something like this:

scala> table.show
+------+--------+---------+--------+--------+
|idempr|tipperrd| codperrd|tipperrt|codperrt|
+------+--------+---------+--------+--------+
|  OlcM|       h|999999999|       J|       0|
|  zOcQ|       r|777777777|       J|       1|
|  kyGp|       t|333333333|       J|       2|
|  BEuX|       A|999999999|       F|       3|
+------+--------+---------+--------+--------+

scala> var table2 = table.selectExpr("idempr", "tipperrd", "codperrd", "tipperrt", "codperrt", "'hola' as Saludo")
table2: org.apache.spark.sql.DataFrame = [idempr: string, tipperrd: string, codperrd: decimal(9,0), tipperrt: string, codperrt: decimal(9,0), Saludo: string]

scala> table2.show
+------+--------+---------+--------+--------+------+
|idempr|tipperrd| codperrd|tipperrt|codperrt|Saludo|
+------+--------+---------+--------+--------+------+
|  OlcM|       h|999999999|       J|       0|  hola|
|  zOcQ|       r|777777777|       J|       1|  hola|
|  kyGp|       t|333333333|       J|       2|  hola|
|  BEuX|       A|999999999|       F|       3|  hola|
+------+--------+---------+--------+--------+------+

My point is this:

I define a string and call a method that uses this string parameter to populate a column in the DataFrame. But I can't get the select expression to pick up the string (I tried $, +, etc.). I want to achieve something like this:

scala> var english = "hello"

scala> def generar_informe(df: DataFrame, tabla: String) {
    var selectExpr_df = df.selectExpr(
      "TIPPERSCON_BAS as TIP.PERSONA CONTACTABILIDAD",
      "CODPERSCON_BAS as COD.PERSONA CONTACTABILIDAD",
      "'tabla' as PUNTO DEL FLUJO" )
}

scala> generar_informe(df,english)

.....

scala> table2.show
+------+--------+---------+--------+--------+------+
|idempr|tipperrd| codperrd|tipperrt|codperrt|Saludo|
+------+--------+---------+--------+--------+------+
|  OlcM|       h|999999999|       J|       0| hello|
|  zOcQ|       r|777777777|       J|       1| hello|
|  kyGp|       t|333333333|       J|       2| hello|
|  BEuX|       A|999999999|       F|       3| hello|
+------+--------+---------+--------+--------+------+

I have tried:

scala> var result = tabl.selectExpr("A", "B", "$tabla as C")

scala> var abc = tabl.selectExpr("A", "B", ${tabla} as C)
    <console>:31: error: not found: value $
             var abc = tabl.selectExpr("A", "B", ${tabla} as C)

scala> var abc = tabl.selectExpr("A", "B", "${tabla} as C")

scala> sqlContext.sql("set tabla='hello'")
scala> var abc = tabl.selectExpr("A", "B", "${tabla} as C")

The same error:

java.lang.RuntimeException: [1.1] failure: identifier expected
${tabla} as C
^
    at scala.sys.package$.error(package.scala:27)

Thanks in advance!

Best Answer

Can you try this?

import org.apache.spark.sql.DataFrame

def generar_informe(df: DataFrame, english: String) = {
  df.selectExpr(
    "transactionId", "customerId", "itemId", "amountPaid",
    s"""'${english}' as saludo""")
}

val english = "hello"
generar_informe(data, english).show()
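
A note on why this works (my addition, not part of the original answer): the strings passed to selectExpr are handed to Spark's SQL parser, which knows nothing about Scala variables, so $tabla or ${tabla} reach the parser as literal text and fail with "identifier expected". Scala's s-interpolator substitutes the value before Spark ever sees the string:

    val english = "hello"

    // Interpolation happens in Scala first; Spark only ever sees the
    // finished SQL fragment: 'hello' as saludo
    val saludoExpr = s"'$english' as saludo"
    data.selectExpr("transactionId", saludoExpr)

Here data is the answer's DataFrame and saludoExpr is just an illustrative name.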

This is the output I get.

17/11/02 23:56:44 INFO CodeGenerator: Code generated in 13.857987 ms
+-------------+----------+------+----------+------+
|transactionId|customerId|itemId|amountPaid|saludo|
+-------------+----------+------+----------+------+
|          111|         1|     1|     100.0| hello|
|          112|         2|     2|     505.0| hello|
|          113|         3|     3|     510.0| hello|
|          114|         4|     4|     600.0| hello|
|          115|         1|     2|     500.0| hello|
|          116|         1|     2|     500.0| hello|
|          117|         1|     2|     500.0| hello|
|          118|         1|     2|     500.0| hello|
|          119|         2|     3|     500.0| hello|
|          120|         1|     2|     500.0| hello|
|          121|         1|     4|     500.0| hello|
|          122|         1|     2|     500.0| hello|
|          123|         1|     4|     500.0| hello|
|          124|         1|     2|     500.0| hello|
+-------------+----------+------+----------+------+

17/11/02 23:56:44 INFO SparkContext: Invoking stop() from shutdown hook
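
For completeness, here is a sketch (my addition, not from the original answer; the column names and aliases are taken from the question) that applies the same pattern to the question's generar_informe. Aliases containing spaces or dots must be wrapped in backticks so Spark's SQL parser accepts them as quoted identifiers:

    import org.apache.spark.sql.DataFrame

    // Sketch: embed the tabla parameter via Scala string interpolation.
    // Backticks quote identifiers containing spaces/dots in Spark SQL.
    def generar_informe(df: DataFrame, tabla: String): DataFrame = {
      df.selectExpr(
        "TIPPERSCON_BAS as `TIP.PERSONA CONTACTABILIDAD`",
        "CODPERSCON_BAS as `COD.PERSONA CONTACTABILIDAD`",
        s"'$tabla' as `PUNTO DEL FLUJO`")
    }

    // Usage, with english defined as in the question:
    // generar_informe(df, english).show()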

Regarding "apache-spark-sql - How do I pass a parameter to selectExpr? SparkSQL-Scala", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47040698/
