I am trying to add a new row to a DataFrame but can't.

My code:

from pyspark.sql import Row

newRow = Row(id='ID123')
newDF = df.insertInto(newRow)
# or
newDF = df.union(newRow)

Errors (the first from the union call, the second from insertInto):

AttributeError: _jdf

AttributeError: 'DataFrame' object has no attribute 'insertInto'
This might be what you are looking for: try from pyspark.sql import Row, create a dictionary, and then update the dictionary. stackoverflow.com/questions/39801691/… Commented Nov 29, 2017 at 15:34

3 Answers

A simple way to add a row to a DataFrame using PySpark:

newRow = spark.createDataFrame([(15,'Alk','Dhl')])
df = df.union(newRow)
df.show()

Try (see the documentation):

from pyspark.sql import Row
# Build a one-row DataFrame from an RDD of Row objects
newDF = sc.parallelize([Row(id='ID123')]).toDF()
newDF.show()

4 Comments

It creates a new DataFrame rather than adding a new row.
DataFrames, like RDDs, are immutable, so a new one is always created by any transformation.
I'm confused. Where is the original df in this response? Not seeing how this answers the original question.
This is not a helpful answer. There is no indication that a DataFrame is being appended to. Alkesh Mahajan's answer is correct.
An operation like this is completely useless in practice. A Spark DataFrame is a data structure designed for bulk analytical jobs. It is not intended for fine-grained updates.

Although you can create a single-row DataFrame (as shown by i-n-n-m) and union it, this won't scale and won't truly distribute the data: Spark will have to keep a local copy of the data, and the execution plan will grow linearly with the number of inserted objects.

Please consider using a proper database instead.

1 Comment

I needed it just for testing.
