I am trying to add a new row to a DataFrame but can't.
My code:
newRow = Row(id='ID123')
newDF= df.insertInto(newRow)
or
newDF= df.union(newRow)
errors:
AttributeError: _jdf
AttributeError: 'DataFrame' object has no attribute 'insertInto'
Try this (see the documentation):
from pyspark.sql import Row
# sc is the SparkContext (predefined in the pyspark shell)
newDF = sc.parallelize([Row(id='ID123')]).toDF()
newDF.show()
An operation like this is completely useless in practice. A Spark DataFrame is a data structure designed for bulk analytical jobs. It is not intended for fine-grained updates.
Although you can create a single-row DataFrame (as shown by i-n-n-m) and union it with the existing one, this won't scale and won't truly distribute the data: Spark will have to keep a local copy of the data, and the execution plan will grow linearly with the number of inserted objects.
Please consider using a proper database instead.
Use from pyspark.sql import Row, create a dictionary, and then update the dictionary. See stackoverflow.com/questions/39801691/…