I have a DataFrame (df) in pyspark, read from a Hive table:
df=spark.sql('select * from <table_name>')
+++++++++++++++++++++++++++++++++++++++++++
| Name | URL visited |
+++++++++++++++++++++++++++++++++++++++++++
| person1 | [google,msn,yahoo] |
| person2 | [fb.com,airbnb,wired.com] |
| person3 | [fb.com,google.com] |
+++++++++++++++++++++++++++++++++++++++++++
When I try the following, I get an error:
df_dict = dict(zip(df['name'],df['url']))
"TypeError: zip argument #1 must support iteration."
type(df.name) is of 'pyspark.sql.column.Column'
How can I create a dictionary like the one below, which I can iterate over later?
{'person1': ['google', 'msn', 'yahoo']}
{'person2': ['fb.com', 'airbnb', 'wired.com']}
{'person3': ['fb.com', 'google.com']}
Thanks for your thoughts and help.
Best answer
I think you can try row.asDict(). This code runs directly on the executors, so there is no need to collect the data on the driver. Something like:
df.rdd.map(lambda row: row.asDict())
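A minimal plain-Python sketch of how to go from that result to the dictionary the question asks for. The sample `rows` list below is a hand-written stand-in for what `df.rdd.map(lambda row: row.asDict()).collect()` would return on the driver (no Spark session is created here); the column names `name` and `url` follow the question's code:

```python
# Stand-in for df.rdd.map(lambda row: row.asDict()).collect();
# in a real Spark job, each dict would come from Row.asDict().
rows = [
    {"name": "person1", "url": ["google", "msn", "yahoo"]},
    {"name": "person2", "url": ["fb.com", "airbnb", "wired.com"]},
    {"name": "person3", "url": ["fb.com", "google.com"]},
]

# Collapse the per-row dicts into one name -> URL-list mapping.
df_dict = {row["name"]: row["url"] for row in rows}

# Now it can be iterated as requested.
for name, urls in df_dict.items():
    print(name, urls)
```

Note that building `df_dict` this way does pull all rows to the driver via `collect()`; the `map(lambda row: row.asDict())` step itself runs on the executors, but a single Python dict can only exist driver-side, so this only suits data small enough to fit in driver memory.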
Regarding "apache-spark - How to convert rows into a list of dictionaries in pyspark?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49432167/