I am new to pyspark but familiar with pandas. I have a pyspark DataFrame:
# instantiate Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# make some test data
columns = ['id', 'dogs', 'cats']
vals = [
(1, 2, 0),
(2, 0, 1)
]
# create DataFrame
df = spark.createDataFrame(vals, columns)
I want to append a new row (4, 5, 7) so that the output is:
df.show()
+---+----+----+
| id|dogs|cats|
+---+----+----+
| 1| 2| 0|
| 2| 0| 1|
| 4| 5| 7|
+---+----+----+
Best Answer
As thebluephantom already said, union is the way to go. I'm just answering your question to give you a pyspark example:
# if not already created automatically, instantiate the SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
columns = ['id', 'dogs', 'cats']
vals = [(1, 2, 0), (2, 0, 1)]
df = spark.createDataFrame(vals, columns)
newRow = spark.createDataFrame([(4, 5, 7)], columns)
appended = df.union(newRow)
appended.show()
Also see the Databricks FAQ: https://kb.databricks.com/data/append-a-row-to-rdd-or-dataframe.html
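Since the question mentions familiarity with pandas, here is what the same append might look like there for comparison (a sketch using pd.concat, since DataFrame.append was removed in pandas 2.0):

```python
import pandas as pd

# the same test data as the pyspark example
pdf = pd.DataFrame([(1, 2, 0), (2, 0, 1)], columns=['id', 'dogs', 'cats'])
new_row = pd.DataFrame([(4, 5, 7)], columns=['id', 'dogs', 'cats'])

# concatenate and renumber the index so it stays 0, 1, 2, ...
pdf = pd.concat([pdf, new_row], ignore_index=True)
```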
Regarding python - adding a new row to a pyspark Dataframe, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52685609/