I have a Sqoop job that needs to import data from Oracle into HDFS.

The Sqoop command I am using is:

sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '1' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test1 --fields-terminated-by '\t'

I run the same query over and over, changing only partitionid from 1 to 96, so I would have to execute the sqoop import command manually 96 times. The ORDERS table contains millions of rows, and every row has a partition number between 1 and 96. I need to import the first 10,000 rows (rownum < 10001) from each partition into HDFS.

Is there any way to do this? How can I automate the Sqoop job?
Best Answer
Run the script for a single partition: $ ./script.sh 20    # the 20th partition
ramisetty@HadoopVMbox:~/ramu$ cat script.sh
#!/bin/bash
PART_ID=$1
TARGET_DIR_ID=$PART_ID
echo "PART_ID:" $PART_ID "TARGET_DIR_ID: "$TARGET_DIR_ID
sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '$PART_ID' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test/$TARGET_DIR_ID --fields-terminated-by '\t'
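One side note on the command above (my addition, not part of the original answer): passing --password on the command line exposes the password in the process list and in shell history. Sqoop also accepts --password-file, which reads the password from a file instead. A minimal sketch, assuming the password file lives in the user's home directory (the path and the password value here are placeholders):

```shell
#!/bin/bash
# Store the password in a file readable only by its owner.
# echo -n avoids a trailing newline being treated as part of the password.
echo -n "sqoop" > "$HOME/.sqoop.pwd"
chmod 400 "$HOME/.sqoop.pwd"

# Then replace "--password sqoop" in the sqoop import command with:
#   --password-file file://$HOME/.sqoop.pwd
# (Sqoop resolves the path through the Hadoop FileSystem API, so an
#  HDFS path also works; file:// forces a local file.)
```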
To cover all partitions 1 through 96 in one go:
ramisetty@HadoopVMbox:~/ramu$ cat script_for_all.sh
#!/bin/bash
for part_id in {1..96};
do
PART_ID=$part_id
TARGET_DIR_ID=$PART_ID
echo "PART_ID:" $PART_ID "TARGET_DIR_ID: "$TARGET_DIR_ID
sqoop import --connect jdbc:oracle:thin:@hostname:port/service --username sqoop --password sqoop --query "SELECT * FROM ORDERS WHERE orderdate = To_date('10/08/2013', 'mm/dd/yyyy') AND partitionid = '$PART_ID' AND rownum < 10001 AND \$CONDITIONS" --target-dir /test/$TARGET_DIR_ID --fields-terminated-by '\t'
done
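Since each partition's import is independent, the loop above can also be parallelized so that several Sqoop jobs run at once. A sketch using xargs -P, assuming the single-partition version is saved as script.sh (this is my variation, not part of the original answer; raise the -P value with care, since each job opens its own connections to Oracle):

```shell
#!/bin/bash
# Run ./script.sh once per partition id (1..96),
# keeping at most 4 sqoop jobs running in parallel.
seq 1 96 | xargs -n 1 -P 4 ./script.sh
```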
Regarding "shell - scheduling and automating Sqoop import/export tasks", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/30720646/