I am running a merge operation on a Databricks Delta table, as shown below -
spark.sql(""" MERGE INTO <delta table name> deltatbl USING <temp view> source
ON deltatbl.col1 = source.col1
AND deltatbl.col2 = source.col2
WHEN NOT MATCHED THEN INSERT
(col1,col2) VALUES(source.Col1,source.Col2) """)
Despite matching on the unique key, the query above still inserts duplicate records. How can I make it insert only the records that do not match? All columns are part of the key.
Best Answer
If you want to update existing records:
MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
UPDATE SET events.data = updates.data
WHEN NOT MATCHED
THEN INSERT (date, eventId, data) VALUES (date, eventId, data)
If you only want to insert records that do not already exist, update the matched rows with their own values (a no-op update):
MERGE INTO events
USING updates
ON events.eventId = updates.eventId
WHEN MATCHED THEN
UPDATE SET events.data = events.data
WHEN NOT MATCHED
THEN INSERT (date, eventId, data) VALUES (date, eventId, data)
In your case:
MERGE INTO <delta table name> deltatbl USING <temp view> source
ON deltatbl.col1 = source.col1
AND deltatbl.col2 = source.col2
WHEN MATCHED THEN
UPDATE SET deltatbl.col1 = deltatbl.col1
WHEN NOT MATCHED THEN INSERT
(col1,col2) VALUES(source.Col1,source.Col2)
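To see why an insert-only merge can still produce duplicates, here is a minimal pure-Python sketch of the semantics (this is an illustration, not Delta Lake's actual engine): `NOT MATCHED` is evaluated against the target's state before the merge, so two source rows carrying the same new key both qualify for insert. The `merge_insert_only` helper and the sample rows are hypothetical.

```python
def merge_insert_only(target, source, key_cols):
    # Snapshot of target keys BEFORE the merge -- this mirrors how
    # MERGE evaluates the NOT MATCHED condition, and is why duplicate
    # keys within the source each get inserted.
    existing = {tuple(row[c] for c in key_cols) for row in target}
    inserted = [dict(row) for row in source
                if tuple(row[c] for c in key_cols) not in existing]
    return target + inserted

target = [{"col1": 1, "col2": "a"}]
source = [{"col1": 2, "col2": "b"},  # new key, appears twice in source
          {"col1": 2, "col2": "b"},
          {"col1": 1, "col2": "a"}]  # already in target -> skipped
result = merge_insert_only(target, source, ["col1", "col2"])
print(len(result))  # 3: the duplicated source row was inserted twice
```

So if the source itself contains repeated keys, deduplicating it first (e.g. with `dropDuplicates` on the key columns) avoids the duplicate inserts regardless of the merge clauses used.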
Regarding apache-spark-sql - Databricks Delta table merge is inserting records despite keys matching the "WHEN NOT MATCHED THEN INSERT" clause, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/69562007/