For the following Spark Structured Streaming write process:
sdf.writeStream
  .outputMode(outputMode)
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
the console format writes its output correctly, as shown below:
Batch: 3
+----------+------+-------+-----------------+
|OnTimeRank|Origin|Carrier| OnTimePct|
+----------+------+-------+-----------------+
| 1| BWI| EV| 90.0|
| 2| BWI| US|88.54072251715655|
| 3| BWI| CO|88.52097130242826|
| 4| BWI| YV| 87.2168284789644|
| 5| BWI| DL|86.21888471700737|
| 6| BWI| NW|86.04866030181707|
| 7| BWI| 9E|85.83545377438507|
| 8| BWI| AA|85.71428571428571|
| 9| BWI| FL|83.25366684127816|
| 10| BWI| UA|81.32427843803056|
| 1| CMI| MQ|81.92159607980399|
| 1| IAH| NW| 91.6242895602752|
| 2| IAH| F9|88.62350722815839|
| 3| IAH| US|87.54764930114358|
| 4| IAH| 9E|84.33613445378151|
| 5| IAH| OO| 84.2836946277097|
| 6| IAH| DL|83.46420323325636|
| 7| IAH| UA|83.40671436433682|
| 8| IAH| XE|81.35189010909355|
| 9| IAH| OH|80.61558611656844|
+----------+------+-------+-----------------+
But this is only part of the result. Is there an equivalent of dataframe.show(numRows, truncate) that can be set via an option, along the lines of .option("maxRows", 1000):
sdf.writeStream
  .outputMode(outputMode)
  .format("console")
  .option("maxRows", 1000) // This is what I want but not sure how to do
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
Best answer
The option is called numRows, e.g. .option("numRows", 1000).
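As a sketch, the console sink's numRows option can also be combined with truncate (both are documented console sink options; the values 1000 and false here are illustrative, and `sdf` and `outputMode` are assumed to exist as in the question):

```scala
import org.apache.spark.sql.streaming.Trigger

// Assumes `sdf` is an existing streaming DataFrame and
// `outputMode` is defined as in the question.
// Console sink options:
//   numRows  - max rows printed per micro-batch (default 20)
//   truncate - whether to truncate wide column values (default true)
val query = sdf.writeStream
  .outputMode(outputMode)
  .format("console")
  .option("numRows", 1000)   // print up to 1000 rows per batch
  .option("truncate", false) // keep long values intact
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

query.awaitTermination()
```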
On the topic of apache-spark - showing the full result of a Spark streaming batch with the console output format, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/55823841/