I need to execute a PostgreSQL query in PySpark. I tried the following:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadFromPostgreSQL").getOrCreate()
url = "jdbc:postgresql://localhost:5432/database_example"
properties = {"user": "postgres", "password": "1234", "driver": "org.postgresql.Driver"}
query = "SELECT * FROM arpan.check_master_planning_family"
jdbcDF = spark.read.jdbc(url=url, table=query, properties=properties)
Here I replaced the url, username, and password with my own server's values.
But I got this error:

Py4JJavaError: org.postgresql.util.PSQLException: ERROR: syntax error at or near "SELECT"
This query runs perfectly in Postgres itself, but here it fails. Please help me fix this.
Best Answer
Change table=query to table="<table_name>". Spark's JDBC reader builds its own SELECT around whatever you pass as table, so passing a bare SELECT statement produces exactly the syntax error you saw.
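If you do want to push a query (rather than a whole table) through read.jdbc, a common workaround is to parenthesize the query and alias it as a derived table. A minimal sketch, reusing the question's query; the read.jdbc call is left commented out because it needs a live PostgreSQL server and the JDBC driver on the classpath:

```python
# The JDBC reader embeds the `table` argument inside its own SELECT,
# so a query must be wrapped and aliased like a subquery.
query = "SELECT * FROM arpan.check_master_planning_family"
table_arg = f"({query}) AS subq"  # "subq" is an arbitrary alias

# jdbcDF = spark.read.jdbc(url=url, table=table_arg, properties=properties)
print(table_arg)
```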
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ReadFromPostgreSQL").getOrCreate()
url = "jdbc:postgresql://localhost:5432/database_example"
properties = {"user": "postgres", "password": "1234", "driver": "org.postgresql.Driver"}
employee_df = spark.read.jdbc(url=url, table="employee", properties=properties)
department_df = spark.read.jdbc(url=url, table="department", properties=properties)
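As another option, Spark 2.4+ exposes a query option on the generic JDBC reader, which pushes a full SELECT down to PostgreSQL directly. A sketch using the question's connection details (assumed, not verified here); the load() call is commented out since it requires a reachable database:

```python
# Sketch: pushing a whole query down via the `query` option of
# spark.read.format("jdbc") (Spark 2.4+). Runs nothing against the
# database here; it only assembles the reader options.
options = {
    "url": "jdbc:postgresql://localhost:5432/database_example",
    "user": "postgres",
    "password": "1234",
    "driver": "org.postgresql.Driver",
    "query": "SELECT * FROM arpan.check_master_planning_family",
}
# df = spark.read.format("jdbc").options(**options).load()
print(sorted(options))
```

Note that `query` and `dbtable` are mutually exclusive on the JDBC source: pass one or the other, not both.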
1. Using DataFrames
employee_df.join(department_df, employee_df.dept_id == department_df.id, "inner").show()
2. Using Spark SQL
Create temporary views from these DataFrames:
employee_df.createOrReplaceTempView("employee")
department_df.createOrReplaceTempView("department")
Now use spark.sql and write the query in plain SQL:
spark.sql('''
SELECT * FROM employee AS emp
JOIN department AS dept
ON emp.dept_id = dept.id
''').show()
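Both variants express the same inner join: only employees whose dept_id matches a department id survive. A tiny pure-Python illustration of that semantics (the sample rows are made up for illustration, not from the question's tables):

```python
# Illustrative only: inner-join semantics on hypothetical sample rows.
employees = [{"name": "a", "dept_id": 1}, {"name": "b", "dept_id": 3}]
departments = [{"id": 1, "dept": "sales"}, {"id": 2, "dept": "hr"}]

joined = [
    {**emp, **dept}          # merge matching rows, like the joined columns
    for emp in employees
    for dept in departments
    if emp["dept_id"] == dept["id"]
]
print(joined)  # only employee "a" has a matching department
```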
On the topic of "postgresql - Py4JJava error in pyspark on the Azure Databricks platform", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75955891/