java - Join DataFrames in Spark with Java

Tags: java apache-spark dataframe spark-dataframe

First of all, thank you for taking the time to read my question.

My question is the following: in Spark with Java, I load the data from two CSV files into two dataframes.

These dataframes contain the following information.

Dataframe Airport

Id | Name    | City
-----------------------
1  | Barajas | Madrid

Dataframe airport_city_state

City   | state
----------------
Madrid | España

I want to join these two dataframes so that it looks like this:

Dataframe result

Id | Name    | City   | state
--------------------------------
1  | Barajas | Madrid | España

The join condition is dfairport.city = dfairport_city_state.city.

But I cannot figure out the syntax to do the join correctly. Here is a little code showing how I created the variables:

// Load the CSV files; the loader has to be told that they have a header and which delimiter they use
Dataset<Row> dfairport = Load.Csv(sqlContext, data_airport);
Dataset<Row> dfairport_city_state = Load.Csv(sqlContext, data_airport_city_state);

// Rename the columns in the CSV dataframes to match the columns in the database;
// once the names match, the rows can be inserted.
// Note: withColumnRenamed returns a new Dataset and the result is not assigned here,
// so dfairport and dfairport_city_state keep their original column names.
dfairport
    .withColumnRenamed("leg_key", "id")
    .withColumnRenamed("leg_name", "name")
    .withColumnRenamed("leg_city", "city");

dfairport_city_state
    .withColumnRenamed("city", "ciudad")
    .withColumnRenamed("state", "estado");

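Best answer

First of all, thank you very much for your replies.

I tried both solutions, but neither worked. I get the following error: The method dfairport_city_state(String) is undefined for the type ETL_Airport

I cannot access a specific column of the dataframe I want to join.

EDIT: I have now got the join working. I am posting the solution here in case it helps someone else ;)

Thanks for everything and best regards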

// Join the two tables on the city they share
Dataset<Row> joined = dfairport.join(dfairport_city_state, dfairport.col("leg_city").equalTo(dfairport_city_state.col("city")));
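For anyone who wants to try the join end to end, here is a minimal self-contained sketch. It is an illustration under assumptions, not the code from the original project: the class name AirportJoinSketch and the helper joinOnCity are hypothetical, the two dataframes are built in memory from the sample rows in the question instead of being loaded from CSV, the CSV column names leg_key, leg_name and leg_city are taken from the rename calls in the question, and the integer type for the id is a guess. Note that in Java a column is accessed with df.col("name"). Writing dfairport_city_state("city"), as one would in Scala, is parsed as a method call, which is presumably where the "method is undefined" error came from.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Hypothetical class name, used only for this sketch.
public class AirportJoinSketch {

    // Inner join on the city, then drop the right-hand "city" column and
    // rename the airport columns so the result is id | name | city | state.
    static Dataset<Row> joinOnCity(Dataset<Row> airports, Dataset<Row> cityStates) {
        return airports
                .join(cityStates, airports.col("leg_city").equalTo(cityStates.col("city")))
                .drop(cityStates.col("city"))
                .withColumnRenamed("leg_key", "id")
                .withColumnRenamed("leg_name", "name")
                .withColumnRenamed("leg_city", "city");
    }

    public static void main(String[] args) {
        // Local session for experimentation only.
        SparkSession spark = SparkSession.builder()
                .appName("airport-join-sketch")
                .master("local[1]")
                .getOrCreate();

        // Assumed schemas, mirroring the column names used in the question.
        StructType airportSchema = new StructType()
                .add("leg_key", DataTypes.IntegerType)
                .add("leg_name", DataTypes.StringType)
                .add("leg_city", DataTypes.StringType);
        StructType cityStateSchema = new StructType()
                .add("city", DataTypes.StringType)
                .add("state", DataTypes.StringType);

        // Sample rows from the question, standing in for the CSV files.
        List<Row> airportRows = Arrays.asList(RowFactory.create(1, "Barajas", "Madrid"));
        List<Row> cityStateRows = Arrays.asList(RowFactory.create("Madrid", "España"));

        Dataset<Row> dfairport = spark.createDataFrame(airportRows, airportSchema);
        Dataset<Row> dfairport_city_state = spark.createDataFrame(cityStateRows, cityStateSchema);

        joinOnCity(dfairport, dfairport_city_state).show();

        spark.stop();
    }
}
```

The city column of the second dataframe is dropped after the join so that the result carries a single city column, matching the Id | Name | City | state layout asked for in the question. Because withColumnRenamed returns a new Dataset, the renames are chained onto the join result rather than called on their own.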

For this question, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43033835/
