
I have the following script to run a SQL query:

val df_joined_sales_partyid = spark.sql("""
  SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
  FROM df_sales_tran a
  JOIN df_sales_tran_party_xref b
    ON a.sales_transaction_id = b.sales_transaction_id
  LIMIT 3""")

I want to know how I can save the output of this query as a permanent DataFrame table. I noticed that every time I run display(df_joined_sales_partyid), the query seems to run again. How do I avoid running the query multiple times and save the results to a DataFrame table? I am new to writing Scala, so forgive me if this is an easy question, but I couldn't find a solution online.

Answer

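import org.apache.spark.storage.StorageLevel

// caches the results with the default storage level
// (MEMORY_AND_DISK for DataFrames; only RDD.cache() defaults to MEMORY_ONLY)
df_joined_sales_partyid.cache()

// or

// set the storage level explicitly; see https://spark.apache.org/docs/2.4.0/api/java/index.html?org/apache/spark/storage/StorageLevel.html for other possible values
df_joined_sales_partyid.persist(StorageLevel.MEMORY_AND_DISK)

Both calls are lazy: the cache is filled by the next action (for example count() or display()), and later actions read from the cache instead of rerunning the query.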
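A minimal, self-contained sketch of the caching pattern, under a few assumptions: a local SparkSession, and small hypothetical sample rows registered under the view names from the question (the real tables are not shown). It persists the joined DataFrame, forces one count() to fill the cache, and lets later actions such as show() (or display() in a notebook) read from the cached data. The object and method names are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object CachedSalesJoin {
  // Registers hypothetical sample data under the view names used in the question
  def registerSampleViews(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq((1L, "2019-01-01"), (2L, "2019-01-02"), (3L, "2019-01-03"), (4L, "2019-01-04"))
      .toDF("sales_transaction_id", "sales_tran_dt")
      .createOrReplaceTempView("df_sales_tran")
    Seq((1L, 100L), (2L, 200L), (3L, 300L), (4L, 400L))
      .toDF("sales_transaction_id", "customer_party_id")
      .createOrReplaceTempView("df_sales_tran_party_xref")
  }

  // Runs the join once and keeps the result cached for later actions
  def cachedJoin(spark: SparkSession): DataFrame = {
    val df = spark.sql(
      """SELECT a.sales_transaction_id, b.customer_party_id, a.sales_tran_dt
        |FROM df_sales_tran a
        |JOIN df_sales_tran_party_xref b
        |  ON a.sales_transaction_id = b.sales_transaction_id
        |LIMIT 3""".stripMargin)
    df.persist(StorageLevel.MEMORY_AND_DISK) // lazy: only marks the plan for caching
    df.count()                               // first action computes the join and fills the cache
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cached-sales-join")
      .master("local[*]") // assumption: local run; omit on a managed cluster
      .getOrCreate()
    registerSampleViews(spark)
    val joined = cachedJoin(spark)
    joined.show()      // reads the cached rows instead of re-running the join
    joined.unpersist() // release the cache when it is no longer needed
    spark.stop()
  }
}
```

The cache only lives as long as the Spark application. If "permanent" means the result should survive across sessions, writing it out with df_joined_sales_partyid.write.mode("overwrite").saveAsTable("joined_sales_partyid") (table name hypothetical) stores it as a table in the configured catalog; whether that table outlives the session depends on the catalog being backed by a persistent metastore, such as a Hive metastore.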