I am trying to calculate the pairwise correlation between all columns of a Spark DataFrame using the code below.

import org.apache.spark.ml.linalg.{Matrix, Vectors}
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.VectorAssembler

val spark = SparkSession
  .builder
  .appName("SparkCorrelation")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val df = Seq(
  (0.1, 0.3, 0.5),
  (0.2, 0.4, 0.6),
).toDF("c1", "c2", "c3")

val assembler = new VectorAssembler()
  .setInputCols(Array("c1", "c2", "c3"))
  .setOutputCol("vectors")

val transformed = assembler.transform(df)

val corr = Correlation.corr(transformed, "vectors","pearson")

corr.show(100,false)

The output is a DataFrame with a single column and a single row, which holds the whole correlation matrix:

pearson(vectors)
1.0 1.0000000000000002 0.9999999999999998 \n1.0000000000000002 1.0 1.0000000000000002 \n0.9999999999999998 1.0000000000000002 1.0

But I want the output as a DataFrame in the following format, labeled with the column names. Can somebody please help?

Column c1 c2 c3
c1 1 0.97 0.92
c2 0.97 1 0.94
c3 0.92 0.94 1

1 Answer

The best you can do directly is this, which prints the matrix but without the column names:

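// .head returns a Row wrapping the Matrix; pattern-match to unwrap it (uses the Row and Matrix imports from the question)
val Row(corrMatrix: Matrix) = Correlation.corr(transformed, "vectors", "pearson").head
println(s"Pearson correlation matrix:\n$corrMatrix")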

4 Comments

Is there a way to add columns after generating the above dataframe?
I think not. The idea is that visual inspection should be enough, since the number of columns is normally low. Other tools do that as standard.
The above DataFrame was just an example. My original datasets will have 300 columns or more, so I need the output in the DataFrame format I described in the question. I will then use that DataFrame for further downstream processing.
All things can be done, but I have never seen that.
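
For what it's worth, here is a rough sketch of one way the labeled output might be built. It assumes the correlation matrix fits comfortably on the driver (a 300 x 300 matrix of doubles should), and it relies on the matrix rows coming back in the same order as the columns passed to setInputCols. The names corrMatrix, schema and corrDf, and the "Column" header, are made up for this example; none of this is a built-in Spark feature for labeled correlation output.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Matrix
import org.apache.spark.ml.stat.Correlation
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

val spark = SparkSession.builder
  .appName("LabeledCorrelation")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val df = Seq(
  (0.1, 0.3, 0.5),
  (0.2, 0.4, 0.6)
).toDF("c1", "c2", "c3")

// Use every column of df; this assumes all of them are numeric.
val cols: Array[String] = df.columns

val assembled = new VectorAssembler()
  .setInputCols(cols)
  .setOutputCol("vectors")
  .transform(df)

// The result has exactly one row holding one Matrix; unwrap it on the driver.
val Row(corrMatrix: Matrix) = Correlation.corr(assembled, "vectors", "pearson").head

// One output Row per matrix row: the column name followed by its coefficients.
val rows: List[Row] = corrMatrix.rowIter
  .zip(cols.iterator)
  .map { case (vec, name) => Row.fromSeq(name +: vec.toArray.toSeq) }
  .toList

// Hypothetical schema: a "Column" label plus one double column per input column.
val schema = StructType(
  StructField("Column", StringType, nullable = false) +:
    cols.map(c => StructField(c, DoubleType, nullable = false))
)

val corrDf: DataFrame =
  spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

corrDf.show(false)
```

The result is an ordinary DataFrame, so it can be joined, filtered or written out downstream like any other. With only two input rows, as in the example data, every coefficient comes out at roughly 1.0, so real data is needed to see values like those in the desired table.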
