Saving Scala SQL Output as DataFrame

Question

I have the following script to run a SQL query:

val df_joined_sales_partyid = spark.sql("""
  SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
  FROM df_sales_tran a
  JOIN df_sales_tran_party_xref b
    ON a.sales_transaction_id = b.sales_transaction_id
  LIMIT 3
""")

I want to know how I can save the output of this query as a permanent DataFrame table. I noticed that every time I run display(df_joined_sales_partyid), the query seems to run again. How do I avoid running the query multiple times and save the results to a DataFrame table? I am new to Scala, so forgive me if this is an easy question, but I couldn't find a solution online.

Denis Makarenko · Accepted Answer · 2019-04-18 20:51:35Z

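import org.apache.spark.storage.StorageLevel

// Marks the DataFrame for caching. For DataFrames, cache() uses the default
// storage level (MEMORY_AND_DISK). Caching is lazy: the first action
// computes the result and stores it.
df_joined_sales_partyid.cache()

// or

// Chooses the storage level explicitly; see
// https://spark.apache.org/docs/2.4.0/api/java/index.html?org/apache/spark/storage/StorageLevel.html
// for other possible values
df_joined_sales_partyid.persist(StorageLevel.MEMORY_AND_DISK)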
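
To see this end to end, here is a minimal sketch rather than a definitive recipe. It assumes a local SparkSession and two tiny made-up tables standing in for df_sales_tran and df_sales_tran_party_xref. The sample rows, the app name, and the variable names `joined` and `rowCount` are hypothetical. Because persist() is lazy, the sketch runs an action (count()) once so that later actions can read the stored result instead of re-running the join.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical local session; in a notebook, `spark` usually already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("cache-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample rows standing in for the two tables in the question.
Seq((1, "2019-04-01"), (2, "2019-04-02"), (3, "2019-04-03"), (4, "2019-04-04"))
  .toDF("sales_transaction_id", "sales_tran_dt")
  .createOrReplaceTempView("df_sales_tran")

Seq((1, 101), (2, 102), (3, 103), (4, 104))
  .toDF("sales_transaction_id", "customer_party_id")
  .createOrReplaceTempView("df_sales_tran_party_xref")

val joined = spark.sql("""
  SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
  FROM df_sales_tran a
  JOIN df_sales_tran_party_xref b
    ON a.sales_transaction_id = b.sales_transaction_id
  LIMIT 3
""")

// Only marks the DataFrame for caching; nothing is computed yet.
joined.persist(StorageLevel.MEMORY_AND_DISK)

// The first action runs the query and stores the result.
val rowCount = joined.count()

// Later actions on `joined` can be served from the stored result.
joined.show()
```

The cached data lives only as long as the Spark application. Calling joined.unpersist() releases it earlier. If the result has to outlive the application, writing it out (for example with joined.write.saveAsTable, using a table name of your choosing) is a separate step that the answer above does not cover.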
