hadoop - Why does "select unix_timestamp('') is null" return false when "select unix_timestamp('')" returns null?

Tags: hadoop apache-spark hive apache-spark-sql

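I am using Spark 1.6.2 and trying to determine whether a field contains an empty string or a date value.

The Spark documentation explains that unix_timestamp() returns null when it fails, so the following behavior is expected: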

sqlContext.sql("select unix_timestamp('')").show
+----+
| _c0|
+----+
|null|
+----+

But when I check the result with "is null", it returns false:

sqlContext.sql("select unix_timestamp('') is null").show
+-----+
|  _c0|
+-----+
|false|
+-----+

The same query returns true in Hive:

hive> select unix_timestamp('') is null;
OK
true

For completeness, here is a null check on null itself:

sqlContext.sql("select null is null").show
+----+
| _c0|
+----+
|true|
+----+

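Best answer

This is a bug, and it appears to have been resolved in the Spark 2.x branch (possibly by SPARK-12054). The root of the problem is the schema returned by unix_timestamp. If you execute: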

sqlContext.sql("select unix_timestamp('')").printSchema

you will see:

root
 |-- _c0: long (nullable = false)

Because the schema reports the column as non-nullable, the value is never checked, and unix_timestamp(...).isNull is always false.
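
To make that reasoning concrete, here is a minimal, self-contained Scala sketch of the shortcut. It is a toy model, not Spark's actual classes: Expr, Call, BoolLit, IsNull and NullCheckSketch are hypothetical names invented for illustration. It only assumes that a null check over an expression declared non-nullable gets folded to a constant false before the expression is ever evaluated.

```scala
// Toy model only: none of these types are Spark's real Catalyst classes.
sealed trait Expr { def nullable: Boolean }

// A function call whose declared nullability may disagree with what it
// actually produces at runtime (None stands for a runtime null).
final case class Call(name: String, nullable: Boolean, runtimeValue: Option[Long]) extends Expr

// A boolean constant; never null.
final case class BoolLit(value: Boolean) extends Expr { val nullable = false }

// The "IS NULL" predicate; its own result is never null.
final case class IsNull(child: Expr) extends Expr { val nullable = false }

object NullCheckSketch {
  // Optimizer-style rewrite: IS NULL over a child declared non-nullable
  // is replaced by the constant false, so the child is never inspected.
  def simplify(e: Expr): Expr = e match {
    case IsNull(c) if !c.nullable => BoolLit(false)
    case other                    => other
  }

  // Evaluate whatever boolean expression survives simplification.
  def eval(e: Expr): Boolean = e match {
    case BoolLit(v)            => v
    case IsNull(Call(_, _, v)) => v.isEmpty
    case IsNull(_)             => false
    case Call(_, _, _)         => sys.error("not a boolean expression")
  }

  def main(args: Array[String]): Unit = {
    // Declared non-nullable, yet yields null at runtime (the reported behavior).
    val declaredNonNullable = Call("unix_timestamp", nullable = false, runtimeValue = None)
    // The same call with the declaration corrected.
    val declaredNullable = declaredNonNullable.copy(nullable = true)

    println(eval(simplify(IsNull(declaredNonNullable)))) // false
    println(eval(simplify(IsNull(declaredNullable))))    // true
  }
}
```

In this model the only difference between the two calls is the declared nullable flag, which is why correcting the reported schema is enough to make the null check behave as expected.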

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/40496163/
