apache-spark - SparkSQL/Hive: equivalent of MySQL's `information_schema.tables.{data_length, table_rows}`?

Tags: apache-spark hive apache-spark-sql hiveql

In MySQL, we can query the information_schema.tables table to get useful information such as data_length or table_rows:

select
  data_length
  , table_rows
from
  information_schema.tables
where  
  table_schema='some_db'
  and table_name='some_table';

+-------------+------------+
| data_length | table_rows |
+-------------+------------+
|        8368 |        198 |
+-------------+------------+
1 row in set (0.01 sec)

Is there an equivalent mechanism in SparkSQL/Hive?

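I can use either SparkSQL or a programmatic API such as HiveMetaStoreClient (Java API org.apache.hadoop.hive.metastore.HiveMetaStoreClient). For the latter, I read the API documentation (here) but could not find any method related to a table's row count or size.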

Best Answer

There is no single command for this metadata. Instead, there is a set of commands you can use:

Describe a table/view/column:

desc [formatted|extended] schema_name.table_name;

show table extended like part_table;
SHOW TBLPROPERTIES tblname("foo");
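These commands only report a row count and size once statistics have been gathered, e.g. with `ANALYZE TABLE some_db.some_table COMPUTE STATISTICS;`. Hive then stores them as table parameters (`numRows`, `totalSize`, `rawDataSize`), while Spark SQL's own `ANALYZE TABLE` stores `spark.sql.statistics.numRows` and `spark.sql.statistics.totalSize`. The minimal sketch below maps either set of keys onto MySQL-style `data_length`/`table_rows`, and also parses the table-level `Statistics` string that Spark's `DESCRIBE EXTENDED` prints. The helper names are hypothetical, mapping `totalSize` (on-disk bytes) rather than `rawDataSize` to `data_length` is my own choice, and treating negative values such as -1 as "unknown" is an assumption.

```python
import re
from typing import Mapping, Optional

# Table parameters written when statistics are gathered. Hive uses plain keys;
# Spark SQL's own ANALYZE TABLE stores them with a "spark.sql.statistics." prefix.
HIVE_KEYS = {"data_length": "totalSize", "table_rows": "numRows"}
SPARK_KEYS = {
    "data_length": "spark.sql.statistics.totalSize",
    "table_rows": "spark.sql.statistics.numRows",
}


def _to_count(value: Optional[str]) -> Optional[int]:
    """Parse a stats value; a missing, non-numeric or negative value means unknown."""
    if value is None:
        return None
    try:
        n = int(value.strip())
    except ValueError:
        return None
    return n if n >= 0 else None


def stats_from_tblproperties(props: Mapping[str, str]) -> dict:
    """Map table parameters to MySQL-style data_length / table_rows (hypothetical helper)."""
    result = {}
    for column in ("data_length", "table_rows"):
        # Prefer the Spark key when present, otherwise fall back to the Hive key.
        raw = props.get(SPARK_KEYS[column], props.get(HIVE_KEYS[column]))
        result[column] = _to_count(raw)
    return result


# Spark's DESCRIBE EXTENDED prints the table-level "Statistics" row as
# "<size> bytes" or "<size> bytes, <count> rows".
_STATS_RE = re.compile(r"^\s*(\d+)\s+bytes(?:,\s*(\d+)\s+rows)?\s*$")


def parse_spark_statistics(text: str) -> dict:
    """Parse a Spark "Statistics" string into data_length / table_rows (hypothetical helper)."""
    m = _STATS_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized statistics string: {text!r}")
    size, rows = m.groups()
    return {"data_length": int(size), "table_rows": int(rows) if rows else None}
```

In PySpark, one way to feed these helpers is `spark.sql("DESCRIBE EXTENDED some_db.some_table")`: the row whose `col_name` is `Statistics` carries the string in its `data_type` column. Spark may hide the statistics keys from its own `SHOW TBLPROPERTIES` output (it folds them into that `Statistics` row instead), so under Spark the `DESCRIBE EXTENDED` route is the safer one. Against Hive directly, the key/value pairs from `SHOW TBLPROPERTIES some_db.some_table;`, or the `Map<String, String>` returned by `HiveMetaStoreClient.getTable(dbName, tableName).getParameters()`, can be passed to `stats_from_tblproperties`.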

Show column statistics (Hive 0.14.0 and later):

DESCRIBE FORMATTED [db_name.]table_name column_name;
DESCRIBE FORMATTED [db_name.]table_name column_name PARTITION (partition_spec);
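Column statistics likewise appear only after they have been computed, e.g. with `ANALYZE TABLE some_db.some_table COMPUTE STATISTICS FOR COLUMNS some_column;`. As far as I know, Spark SQL returns the per-column result as two columns, `info_name` and `info_value` (rows such as `min`, `max`, `num_nulls`, `distinct_count`, `avg_col_len`, `max_col_len`), with the literal string `NULL` for absent values; Hive's output layout differs between versions. The sketch below assumes Spark's key/value layout and turns those rows into a dictionary; the helper name is hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Count-like statistics reported for DESC EXTENDED/FORMATTED <table> <column> (assumed Spark layout).
NUMERIC_FIELDS = ("num_nulls", "distinct_count", "avg_col_len", "max_col_len")


def column_stats(rows: Iterable[Tuple[str, Optional[str]]]) -> dict:
    """Turn (info_name, info_value) rows into a dict; 'NULL' becomes None."""
    stats: dict = {}
    for name, value in rows:
        stats[name] = None if value in (None, "NULL") else value
    # min/max stay strings because their type depends on the column's data type.
    for field in NUMERIC_FIELDS:
        v = stats.get(field)
        if v is not None:
            stats[field] = int(v)
    return stats
```

With PySpark, the input could be `[(r.info_name, r.info_value) for r in spark.sql("DESC EXTENDED some_db.some_table some_column").collect()]`.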

A similar question about apache-spark - SparkSQL/Hive: equivalent of MySQL's `information_schema.tables.{data_length, table_rows}`? can be found on Stack Overflow: https://stackoverflow.com/questions/49203014/
