How to select * in a PySpark join
impression_rdd.join(
    click_rdd,
    impression_rdd.session_id == click_rdd.session_id,
    "left_outer"
).select(impression_rdd.*)  <------- pseudo code; how do you do this?
Basically, the SQL equivalent of:
SELECT impression.* FROM impression LEFT JOIN click on (impression.session_id = click.session_id)
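To make the target behavior concrete, here is a small sketch that runs the same query against an in-memory SQLite database using Python's standard `sqlite3` module. The `page` and `target` columns and the sample rows are made up for illustration; only `session_id` comes from the question.

```python
import sqlite3

# In-memory database; table contents are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE impression (session_id INTEGER, page TEXT);
    CREATE TABLE click (session_id INTEGER, target TEXT);
    INSERT INTO impression VALUES (1, 'home'), (2, 'search');
    INSERT INTO click VALUES (1, 'banner'), (1, 'footer');
""")

cursor = conn.execute(
    "SELECT impression.* FROM impression "
    "LEFT JOIN click ON (impression.session_id = click.session_id) "
    "ORDER BY impression.session_id"
)
# Only the left table's columns appear in the result.
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
print(rows)
```

Note that the result carries only the `impression` columns, keeps the impression with no click, and repeats the impression that matches two clicks. That is what the PySpark version should reproduce.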
Best answer
You can simply add an alias and a few quotes to your pseudocode:
(impressions.alias("impressions")
    .join(clicks, ["id"], "left_outer")
    .select("impressions.*"))
This is based on the Stack Overflow question "python - PySpark: How to select * from the left table during an RDD join": https://stackoverflow.com/questions/38034089/