mysql - AWS Data Pipeline MySQL Nulls sed ShellCommandActivity MIGRAINE

Tags: mysql sql amazon-web-services amazon-s3 amazon-data-pipeline

I have the following scenario:

A SQL table needs to be transferred to a MySQL database every day. I tried using Data Pipeline with a CopyActivity, but the exported CSV contains empty strings instead of \N or NULL, so MySQL imports those fields as "", which is bad for our application.

So I tried a slightly different approach: export the table to S3 via CopyActivity, then have a ShellCommandActivity download the file, run the script below, and upload the result back to S3:

#!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/*.csv |cat ${INPUT1_STAGING_DIR}/*.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv

The script above works perfectly on my test Linux instance, but when it runs on the ephemeral EC2 resource nothing happens. I get no errors; the output S3 data node just ends up with the same useless CSV full of empty strings.

I don't know what I'm doing wrong, or why the script behaves differently than on my test Linux instance.

Pipeline logs:

18 Jul 2016 10:23:06,470 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.ShellCommandActivity@515aa023
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Local File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is 
18 Jul 2016 10:23:06,648 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Download just the root file to the local dir. Updated File Relative compared to Input Root Path:s3://s3-bucket/mysqlexport/sqltable.csv is sqltable.csv
18 Jul 2016 10:23:06,649 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Downloading S3 file s3://s3-bucket/mysqlexport/sqltable.csv to /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd/sqltable.csv
18 Jul 2016 10:23:06,824 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed Downloading files from S3 Path:s3://s3-bucket/mysqlexport/sqltable.csv to output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:06,862 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: #!/bin/bash
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' -e 's/,,/,\\N,/g' ${INPUT1_STAGING_DIR}/sqltable.csv |cat ${INPUT1_STAGING_DIR}/sqltable.csv > ${OUTPUT1_STAGING_DIR}/sqltable.csv
18 Jul 2016 10:23:06,865 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: configure ApplicationRunner with stdErr file: output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdError  and stdout file :output/logs/df-09799242T7UHHPMT072T/ShellCommandActivityId_18OqM/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38/@ShellCommandActivityId_18OqM_2016-07-18T10:18:38_Attempt=1/StdOutput
18 Jul 2016 10:23:06,866 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.objects.CommandRunner: Executing command: output/tmp/df-09799242T7UHHPMT072T-de05e7a112c440b4a42df69d554d8a9a/ShellCommandActivityId18OqM20160718T101838Attempt1_command.sh with env variables :{INPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd, OUTPUT1_STAGING_DIR=/media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e} with argument : null
18 Jul 2016 10:23:06,952 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Uploading local directory:output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to S3 s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:06,977 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin Upload single file to S3:s3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:06,978 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Begin upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to  S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of file /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e/sqltable.csv to  S3 paths3://s3-bucket/mysqlexport/sqltable.csv
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed uploading of all files
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.S3Helper: Completed upload of local dir output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e to s3://s3-bucket/mysqlexport/
18 Jul 2016 10:23:07,040 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageFromS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_input1_7c583c0755eb46f5b518feffa314fccd
18 Jul 2016 10:23:07,050 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.connector.staging.StageInS3Connector: cleaning up directory /media/ephemeral0/mnt/taskRunner/output/staging/df-09799242T7UHHPMT072T_output1_7c8b2db30c16473f844db5eb21cb000e
18 Jul 2016 10:23:07,051 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @DefaultShellCommandActivity1_2016-07-18T10:18:38_Attempt=1
18 Jul 2016 10:23:07,052 [INFO] (TaskRunnerService-resource:df-09799242T7UHHPMT072T_@ResourceId_x5OCd_2016-07-18T10:18:38-1) df-09799242T7UHHPMT072T amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 0:0 to complete

Best answer

I'm not sure exactly what the problem was: either the bash script or the shell command didn't like the pipe. Either way, thanks first of all to TenG for reminding me to analyze each step of the process individually rather than looking at it as a whole.
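The pipe is indeed suspect: `sed -i` rewrites its file in place and prints nothing to stdout, so piping it into `cat` contributes nothing and, worse, races with the in-place rewrite, since both sides of a pipeline start at the same time. A minimal repro (file names here are illustrative, not the pipeline's staging directories) showing the sequential form that avoids the race:

```shell
# `sed -i` edits in place and writes nothing to stdout, so
# `sed -i ... file | cat file > out` lets `cat` read the file
# before sed may have finished rewriting it. Run them sequentially.
tmp=$(mktemp -d)
printf '1,,3\n,2,\n' > "$tmp/in.csv"

# Same substitutions as the question's script (one ',,' pass is
# enough here; consecutive empty fields would need a second pass).
sed -i -e 's/^,/\\N,/' -e 's/,$/,\\N/' -e 's/,,/,\\N,/g' "$tmp/in.csv"
cat "$tmp/in.csv" > "$tmp/out.csv"

result=$(cat "$tmp/out.csv")
echo "$result"   # prints: 1,\N,3  then  \N,2,\N
rm -rf "$tmp"
```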

Basically, instead of having the activity download a script, I put the sed and copy commands directly into the "command" field:

sed -i -e 's/^,/\\\\N,/' -e 's/,,/,\\\\N,/g' -e 's/,$/,\\\\N/' ${INPUT1_STAGING_DIR}/sqltable.csv  
cp ${INPUT1_STAGING_DIR}/sqltable.csv ${OUTPUT1_STAGING_DIR}/sqltable.csv

In any case, I'm still disappointed with AWS Data Pipeline: it claims to support MySQL, yet it doesn't even recognize standard MySQL null markers like \N and NULL in a CSV file. If I could have used Data Pipeline for both the export and the import it would have been simple and efficient, but unfortunately, if your application distinguishes between empty fields and NULL values, it is not suitable for the latter.
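If you control the import side, the sed rewriting can be avoided entirely by mapping empty strings to NULL at load time. A sketch (table and column names are hypothetical):

```sql
-- Load empty CSV fields as NULL instead of '' by staging them in
-- user variables and converting with NULLIF.
LOAD DATA INFILE '/tmp/sqltable.csv'
INTO TABLE sqltable
FIELDS TERMINATED BY ','
(id, @col2, @col3)
SET col2 = NULLIF(@col2, ''),
    col3 = NULLIF(@col3, '');
```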

Regarding "mysql - AWS Data Pipeline MySQL Nulls sed ShellCommandActivity MIGRAINE", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/38435101/
