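hadoop - Running a script within a script? - Hive (and other QL)

Is it possible to call a script and run it to completion before the rest of the calling script runs?

My goal is to execute a setup script that downloads and organizes the data needed by the main query.

I am looking for something like this: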

create table logcontent (content string) row format delimited fields terminated by '\n';

**call secondary hive script with date-range arguments and download necessary logs into <logcontent>**

**perform the rest of the query**

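I want to do this to create a clean abstraction for table setup, so that end users do not have to worry about it; it will be done for them.

I know AWS has the option of adding a Hive script as a step in a job, but how can I do the same thing locally? Is it possible? If so, what is the syntax? If not, what are some workarounds?

Best answer

The answer is to drive the whole flow from a main shell script organized along the lines of the template below.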

## Content of main.sh

## Set up the Hadoop environment and add its config to PATH here, if that is not already done.

## Step 1> Create the hive table in non-interactive mode.
hive -e "create table test(id int, name string) row format delimited fields terminated by '\n'"
# Print the exit status of the previous command (0 means success). If/else handling can be added here.
echo $?

## Step 2> Call the secondary script executable to download logs
ksh downloadlogs.sh # Assuming the download script could be invoked this way.

## Step 3> Execute rest of the hive queries to organize data
hive -e "select * from test"
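
The template above only echoes the exit status after the first step. As a rough sketch of how the "if/else logic" might look, the version below wraps each step in a helper that stops the pipeline at the first failure and forwards a date range to the download script. The names `run_step`, `run_pipeline`, `HIVE_CMD`, and `DOWNLOAD_CMD` are hypothetical and exist only for this illustration, and it assumes `downloadlogs.sh` accepts a start and end date as its two arguments, which the original answer does not state.

```shell
#!/usr/bin/env bash
# Sketch of main.sh with explicit failure handling.
# HIVE_CMD and DOWNLOAD_CMD are hypothetical override points, not Hive settings.
HIVE_CMD="${HIVE_CMD:-hive}"
DOWNLOAD_CMD="${DOWNLOAD_CMD:-ksh downloadlogs.sh}"

# Run one step; if it fails, report it on stderr and return its exit status.
run_step() {
    local label="$1"
    shift
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "step failed: $label (exit $rc)" >&2
    fi
    return "$rc"
}

# Run the three steps in order, stopping at the first failure.
# Assumption: the download script takes <start_date> <end_date>.
run_pipeline() {
    local start_date="$1"
    local end_date="$2"

    run_step "create table" "$HIVE_CMD" -e \
        "create table if not exists logcontent (content string)" || return 1

    # Word splitting on $DOWNLOAD_CMD is intentional so "ksh downloadlogs.sh" works.
    run_step "download logs" $DOWNLOAD_CMD "$start_date" "$end_date" || return 1

    run_step "main query" "$HIVE_CMD" -e "select * from logcontent" || return 1
}
```

Adding a line such as `run_pipeline 2015-06-01 2015-06-30` at the end of the script would run all three steps with that date range. If the main query lives in its own file, the Hive CLI also accepts `-f script.hql` together with `--hivevar name=value`, which is one way to hand the same dates to the query itself.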

Regarding "hadoop - Running a script within a script? - Hive (and other QL)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31127240/
