I am using Databricks and want to convert my PySpark DataFrame to a pandas DataFrame with df.toPandas().
However, I keep getting this error:
/databricks/spark/python/pyspark/sql/pandas/conversion.py:145: UserWarning: toPandas attempted Arrow optimization because 'spark.sql.execution.arrow.pyspark.enabled' is set to true, but has reached the error below and can not continue. Note that 'spark.sql.execution.arrow.pyspark.fallback.enabled' does not have an effect on failures in the middle of computation.
'DataFrame' object has no attribute 'dtype'
warnings.warn(msg)
AttributeError: 'DataFrame' object has no attribute 'dtype'
I have tried different things, including:
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
but nothing has worked so far (I also checked several other posts about this problem, but none of them helped).
Update: result of df.printSchema():
|-- flight_id: string (nullable = true)
|-- flight_direction: string (nullable = true)
|-- service_type: string (nullable = true)
|-- flight_designator: string (nullable = true)
|-- flight_number: string (nullable = true)
|-- callsign: string (nullable = true)
|-- scheduled_datetime: timestamp (nullable = true)
|-- connecting_flight_designator: string (nullable = true)
|-- airport_iata_codes: array (nullable = true)
| |-- element: string (containsNull = true)
|-- airline_name: string (nullable = true)
|-- airport_names: array (nullable = true)
| |-- element: string (containsNull = true)
|-- country_number: long (nullable = true)
|-- eu_category: string (nullable = true)
|-- safe_town_indicator: boolean (nullable = true)
|-- sibt: timestamp (nullable = true)
|-- aibt: timestamp (nullable = true)
|-- sobt: timestamp (nullable = true)
|-- aibt: timestamp (nullable = true)
|-- tsat: timestamp (nullable = true)
|-- aircraft_name: string (nullable = true)
|-- aircraft_registration: string (nullable = true)
|-- ramp: string (nullable = true)
|-- ramp_previous: string (nullable = true)
|-- seats: long (nullable = true)
|-- actual_total_pax: integer (nullable = true)
|-- handler_apron: string (nullable = true)
|-- occupancy_rate: double (nullable = false)
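Best answer
The problem was in my data filtering: it produced duplicate columns (note that aibt appears twice in the schema above). If anyone runs into a similar issue in the future, check for duplicate column names.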
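For anyone who wants to check this quickly, here is a minimal sketch that works on the list of column names (what df.columns returns). The likely mechanism is that, once column names repeat, selecting one of them by name on the pandas side yields a DataFrame rather than a Series, and a DataFrame has no dtype attribute. The helper names find_duplicate_columns and dedupe_column_names are my own and are not part of PySpark; the renaming scheme (appending _1, _2, ...) is only one possible choice.

```python
from collections import Counter


def find_duplicate_columns(columns):
    """Return the column names that occur more than once, in first-seen order."""
    counts = Counter(columns)
    return [name for name, n in counts.items() if n > 1]


def dedupe_column_names(columns):
    """Return a new list of names where repeats get a numeric suffix.

    The first occurrence keeps its name; later ones become name_1, name_2, ...
    (hypothetical naming scheme, chosen only for illustration).
    """
    used = set()
    result = []
    for name in columns:
        candidate, suffix = name, 0
        # Keep bumping the suffix until the candidate name is unused.
        while candidate in used:
            suffix += 1
            candidate = f"{name}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


# Column names taken from part of the schema in the question.
cols = ["sibt", "aibt", "sobt", "aibt", "tsat"]
print(find_duplicate_columns(cols))
print(dedupe_column_names(cols))

# With a live Spark session (not run here), the same helpers could be
# applied to the DataFrame before converting:
#   df = df.toDF(*dedupe_column_names(df.columns))
#   pdf = df.toPandas()
```

Renaming only hides the symptom. If the repeated column comes from a join or a select that lists the same column twice, it is usually cleaner to fix that step so the column is produced once.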
Source: a similar question on Stack Overflow, "python - Error converting a table to pandas with PySpark in Databricks": https://stackoverflow.com/questions/75602965/