java - Gobblin error: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record:

Tags: java mysql hadoop hdfs gobblin

I am trying to extract data from a MySQL table into HDFS, but the job fails with the error below.

IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected exception:

java.lang.RuntimeException: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record: 
{"id":"1","name":"abc","password":"abc","derivedwatermarkcolumn":"abc"}
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:647)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    ... 22 more
IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582893709536_0] 567 - Task task_GobblinMySql_1582893709536_0 failed
java.lang.RuntimeException: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:505)
    at org.apache.gobblin.runtime.Task.run(Task.java:362)
    at org.apache.gobblin.runtime.TaskExecutor$TrackingTask.run(TaskExecutor.java:443)
    at org.apache.gobblin.util.executors.MDCPropagatingRunnable.run(MDCPropagatingRunnable.java:39)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Failed to parse the date
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$DateConverter.convertField(JsonElementConversionFactory.java:450)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$RecordConverter.convertField(JsonElementConversionFactory.java:639)
    at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:81)
    at org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter.convertRecord(JsonIntermediateToAvroConverter.java:50)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecordImpl(InstrumentedConverterDecorator.java:74)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterBase.convertRecord(InstrumentedConverterBase.java:125)
    at org.apache.gobblin.instrumented.converter.InstrumentedConverterDecorator.convertRecord(InstrumentedConverterDecorator.java:68)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator$ChainedConverterIterator.<init>(MultiConverter.java:174)
    at org.apache.gobblin.runtime.MultiConverter$MultiConverterIterator.<init>(MultiConverter.java:130)
    at org.apache.gobblin.runtime.MultiConverter$1.iterator(MultiConverter.java:95)
    at org.apache.gobblin.runtime.Task.runSynchronousModel(Task.java:499)
    ... 12 more



Below is the record schema:
IST INFO  [JobScheduler-0] org.apache.gobblin.source.jdbc.JdbcExtractor [demo_user_1582893709536_0] 361 - Schema:[

{"columnName":"id","dataType":{"type":"int"},"isWaterMark":false,"primaryKey":1,"length":0,"precision":10,"scale":0,"isNullable":false,"format":"","comment":"","isUnique":false},

{"columnName":"name","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false},

{"columnName":"password","dataType":{"type":"string"},"isWaterMark":false,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":true,"format":"","comment":"","isUnique":false},

{"columnName":"derivedwatermarkcolumn","dataType":{"type":"timestamp"},"isWaterMark":true,"primaryKey":0,"length":0,"precision":0,"scale":0,"isNullable":false,"comment":"Default watermark column","isUnique":false}]


The data type of the watermark column derivedwatermarkcolumn is timestamp, but the value in the record is the string 'abc'.
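
The mismatch can be reproduced outside Gobblin. The sketch below is only an illustration using java.time, not Gobblin's own converter code; the class name TimestampParseDemo and the pattern yyyy-MM-dd HH:mm:ss are assumptions chosen for the example. It shows that a value such as 'abc' cannot be parsed as a timestamp, while a well-formed value can.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimestampParseDemo {
    // Assumed pattern for illustration; it is not read from the Gobblin configuration.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns true if the value parses as a timestamp in the assumed pattern. */
    static boolean isTimestamp(String value) {
        try {
            LocalDateTime.parse(value, FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isTimestamp("abc"));                 // false
        System.out.println(isTimestamp("2020-02-28 12:21:58")); // true
    }
}
```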

The job and properties files are shown below.

mysql.pull
# Job properties
job.name=GobblinMySql
job.group=MySql
job.description=Data pull from MySql
job.lock.enabled=False


# Extract properties
extract.namespace=demo
extract.table.type=snapshot_only
extract.table.name=user
extract.delta.fields=name,password
extract.primary.key.fields=id

# Property to consider the extract as full dump
extract.is.full=true

# Source properties
source.querybased.schema=demo
source.entity=user
source.querybased.extract.type=snapshot

mysql.properties
# Source properties - source class to extract data from Mysql Source
source.class=org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource

# Source properties
source.max.number.of.partitions=1
source.querybased.partition.interval=1
source.querybased.is.compression=false
source.querybased.watermark.type=timestamp

# Source connection properties
source.conn.driver=com.mysql.jdbc.Driver
source.conn.username=root
source.conn.password=root
source.conn.host=localhost
source.conn.port=3306
source.conn.timeout=1500

# Converter properties - Record from mysql source will be processed by the below series of converters
converter.classes=org.apache.gobblin.converter.avro.JsonIntermediateToAvroConverter

# date columns format
converter.avro.timestamp.format=YYYY-MM-DD HH:MM:SS
converter.avro.date.format=yyyy-MM-dd
converter.avro.time.format=HH:mm:ss

# Qualitychecker properties
qualitychecker.task.policies=org.apache.gobblin.policies.count.RowCountPolicy,org.apache.gobblin.policies.schema.SchemaCompatibilityPolicy
qualitychecker.task.policy.types=OPTIONAL,OPTIONAL

# Publisher properties
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher

What in the configuration files is causing this error? If anyone knows, please help.

Best answer

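It looks like the name of the watermark column is taken from the extract.delta.fields property. In your example it is set to "name,password", so the name column is treated as the watermark. Try setting it to "derivedwatermarkcolumn".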
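
A quick way to sanity-check this reading before rerunning the job is to test whether the columns named in extract.delta.fields hold timestamp-like values at all. The following is a minimal sketch under stated assumptions: it does not call any Gobblin API, the class DeltaFieldCheck and its method are hypothetical helpers, the sample row is copied from the record in the error message, and the pattern yyyy-MM-dd HH:mm:ss is assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class DeltaFieldCheck {
    // Assumed timestamp pattern; adjust to match the real column format.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Returns the columns listed in extract.delta.fields whose sample value is not a timestamp. */
    static List<String> nonTimestampDeltaFields(Properties jobProps, Map<String, String> sampleRow) {
        List<String> bad = new ArrayList<>();
        String deltaFields = jobProps.getProperty("extract.delta.fields", "");
        for (String field : deltaFields.split(",")) {
            String column = field.trim();
            if (column.isEmpty()) {
                continue; // property missing or empty
            }
            try {
                // A missing column becomes the text "null", which also fails to parse.
                LocalDateTime.parse(String.valueOf(sampleRow.get(column)), FORMAT);
            } catch (DateTimeParseException e) {
                bad.add(column);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("extract.delta.fields=name,password\n"));

        // Sample row taken from the record shown in the error message.
        Map<String, String> row = new HashMap<>();
        row.put("id", "1");
        row.put("name", "abc");
        row.put("password", "abc");

        System.out.println(nonTimestampDeltaFields(props, row)); // [name, password]
    }
}
```

Both columns are reported because neither holds a timestamp, which is consistent with the "Failed to parse the date" cause in the stack trace.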

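How I found this: I browsed the code of the MysqlSource class looking for places that mention the watermark, then used IntelliJ's data flow analysis to trace where the value comes from. You can run it from the context menu -> Analyze -> Analyze Data Flow to Here.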

Source: Stack Overflow, https://stackoverflow.com/questions/60452758/
