
I have two dataframes in Scala, created using SQL queries via HiveContext. See the dataframes as images here:

demand_df

The other dataframe is:

skill_map

Please ignore the repeated headers in the second dataframe. I want to compare the Skill column present in both dataframes and pull the matching Role, Skill2, and Emerging values into df1, i.e. demand_df.

I tried this in pandas and was able to achieve it with the following code segment:

import pandas as pd

df1 = pd.DataFrame([
    ["INDIA", "XXX", "developer", "UNKNOWN", 121],
    ["INDIA", "XXXX", "software engineer", "UNKNOWN", 121],
    ["POLAND", "XX", "english", "KNOWN", 122]],
    columns=['country', 'level', 'Skill', 'r2d2', 'tax'])

df2 = pd.DataFrame([
    ["english", "NaN", "teacher", "NaN", "NaN"],
    [20000, "Unknown", "NaN", "NaN", "NaN"],
    ["microsoft", "Known", "Software Engineer", "Microsoft", "Enterprise"]],
    columns=['Skill', 'R2D2', 'Role', 'Skill2', 'Emerging'])

# Note: the column is 'Emerging' (capital E), matching df2's schema.
result = df1.merge(df2[['Skill', 'Role', 'Skill2', 'Emerging']], how='left', left_on='Skill', right_on='Skill')
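
With the sample frames above, only the english row finds a match in df2, so the merge should come out roughly as:

  country level              Skill     r2d2  tax     Role Skill2 Emerging
0   INDIA   XXX          developer  UNKNOWN  121      NaN    NaN      NaN
1   INDIA  XXXX  software engineer  UNKNOWN  121      NaN    NaN      NaN
2  POLAND    XX            english    KNOWN  122  teacher    NaN      NaN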

Please guide me, as I am new to Scala.

1 Answer

Since you have created two dataframes and want to join them on Skill to build a new dataframe containing df1 plus the Role, Skill2, and Emerging columns from df2, you can do it through an SQLContext:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Register both dataframes as temp tables with these commands:

df1.registerTempTable("df1")

df2.registerTempTable("df2")

After that, use a simple Hive-style query to join them and fetch the required columns:

val df3 = sqlContext.sql("select a.*, b.Role, b.Skill2, b.Emerging from df1 a left join df2 b on (a.Skill = b.Skill)")
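
For reference, the same join can also be written with the DataFrame API instead of SQL. This is a minimal, self-contained sketch, assuming a spark-shell with sc in scope and Spark 1.6+ (for the Seq-based join); the sample data just mirrors the pandas frames from the question:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Sample data mirroring the pandas frames above.
val demand_df = Seq(
  ("INDIA",  "XXX",  "developer",         "UNKNOWN", 121),
  ("INDIA",  "XXXX", "software engineer", "UNKNOWN", 121),
  ("POLAND", "XX",   "english",           "KNOWN",   122)
).toDF("country", "level", "Skill", "r2d2", "tax")

val skill_map = Seq(
  ("english",   null,      "teacher",           null,        null),
  ("20000",     "Unknown", null,                null,        null),
  ("microsoft", "Known",   "Software Engineer", "Microsoft", "Enterprise")
).toDF("Skill", "R2D2", "Role", "Skill2", "Emerging")

// Left join on Skill; passing the key as a Seq keeps a single Skill column
// in the result instead of one from each side.
val result = demand_df.join(
  skill_map.select("Skill", "Role", "Skill2", "Emerging"),
  Seq("Skill"),
  "left_outer"
)

result.show()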


1 Comment

Thanks @kishan singh, this served my purpose.
