apache-spark - Displaying the full results of a Spark streaming batch with the console output format

Tags: apache-spark spark-structured-streaming

For a Spark Structured Streaming query that writes to the console:

import org.apache.spark.sql.streaming.Trigger

sdf.writeStream
  .outputMode(outputMode)
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

format("console") writes its output correctly, as shown here:

Batch: 3
+----------+------+-------+-----------------+
|OnTimeRank|Origin|Carrier|        OnTimePct|
+----------+------+-------+-----------------+
|         1|   BWI|     EV|             90.0|
|         2|   BWI|     US|88.54072251715655|
|         3|   BWI|     CO|88.52097130242826|
|         4|   BWI|     YV| 87.2168284789644|
|         5|   BWI|     DL|86.21888471700737|
|         6|   BWI|     NW|86.04866030181707|
|         7|   BWI|     9E|85.83545377438507|
|         8|   BWI|     AA|85.71428571428571|
|         9|   BWI|     FL|83.25366684127816|
|        10|   BWI|     UA|81.32427843803056|
|         1|   CMI|     MQ|81.92159607980399|
|         1|   IAH|     NW| 91.6242895602752|
|         2|   IAH|     F9|88.62350722815839|
|         3|   IAH|     US|87.54764930114358|
|         4|   IAH|     9E|84.33613445378151|
|         5|   IAH|     OO| 84.2836946277097|
|         6|   IAH|     DL|83.46420323325636|
|         7|   IAH|     UA|83.40671436433682|
|         8|   IAH|     XE|81.35189010909355|
|         9|   IAH|     OH|80.61558611656844|
+----------+------+-------+-----------------+

But this is only part of the result. Is there an equivalent of dataframe.show(numRows, truncate) that can be set through an option, something along the lines of .option("maxRows", 1000)?

sdf.writeStream
  .outputMode(outputMode)
  .format("console")
  .option("maxRows", 1000)  // This is what I want, but I am not sure how to do it
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

Best answer

The option is called numRows, for example .option("numRows", 1000).

Source: https://github.com/apache/spark/blob/2a80a4cd39c7bcee44b6f6432769ca9fdba137e4/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWrite.scala#L33
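As a rough illustration of this answer, the sketch below sets numRows on the console sink together with truncate, which the Structured Streaming guide documents as the sink's other option. The sample output in the question has exactly 20 rows, which matches the documented default for numRows. The rate source, the object name ConsoleSinkNumRows, the constant numRowsToShow, the local master, the "append" output mode, and the 10-second run time are all assumptions made so the example is self-contained; they stand in for the asker's own sdf and outputMode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object ConsoleSinkNumRows {
  // Hypothetical constant: the maximum number of rows to print per batch.
  val numRowsToShow: Int = 1000

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("console-sink-numRows")
      .master("local[2]")
      .getOrCreate()

    // Stand-in for the question's `sdf`: the built-in "rate" source emits
    // (timestamp, value) rows, so no external data is needed.
    val sdf = spark.readStream
      .format("rate")
      .option("rowsPerSecond", 100)
      .load()

    val query = sdf.writeStream
      .outputMode("append")
      .format("console")
      .option("numRows", numRowsToShow) // print up to 1000 rows per batch (default: 20)
      .option("truncate", false)        // do not shorten long cell values (default: true)
      .trigger(Trigger.ProcessingTime("2 seconds"))
      .start()

    // Let the query run for roughly 10 seconds, then shut everything down.
    query.awaitTermination(10000)
    query.stop()
    spark.stop()
  }
}
```

With these settings each batch printed to the console can contain up to 1000 rows instead of the first 20. Printing that many rows every trigger is usually only sensible while debugging, since the console sink collects the rows to the driver.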

The original question and answer are on Stack Overflow: https://stackoverflow.com/questions/55823841/
