PySpark Dataframe transform columns into rows

Question

I have the following Dataframe

REC_DATA = spark.createDataFrame(
    [
      ('exercise', 'fiber', 'rice', 'male'),
      ('exercise', 'rice', 'fiber', 'female'),
      ('exercise', 'water', 'fiber', 'male'),
      ('exercise', 'rice', 'exercise', 'female'),
    ], 
      StructType(
        [
            StructField("1_rec", StringType(), False),
            StructField("2_rec", StringType(), False),
            StructField("3_rec", StringType(), False),
            StructField("sex", StringType(), True),
        ]
    )
)

1_rec	2_rec	3_rec	sex
exercise	fiber	rice	male
exercise	rice	fiber	female
exercise	water	fiber	male
water	rice	exercise	female

And I'm trying to group these rows into a new column, transforming the columns 1_rec, 2_rec, 3_rec into rows, and add a new column with the quantity, the output should be like that:

Position	name	count
1_rec	exercise	3
1_rec	water	1
2_rec	water	1
2_rec	rice	2
2_rec	fiber	1
3_rec	rice	1
3_rec	fiber	2
3_rec	exercise	1

I had tried to do a Crosstab but it didn't work properly.

AdibP · Accepted Answer · 2021-07-23 04:51:09Z

3

Use stack to kind of unpivot columns then group them by position and name

import pyspark.sql.functions as F

REC_DATA = (REC_DATA
            .selectExpr("stack(3, '1_rec', 1_rec, '2_rec', 2_rec, '3_rec', 3_rec) (position, name)")
            .groupBy('position', 'name')
            .agg(F.count("*").alias('count')))
REC_DATA.show()

+--------+--------+-----+
|position|    name|count|
+--------+--------+-----+
|   1_rec|   water|    1|
|   2_rec|    rice|    2|
|   3_rec|exercise|    1|
|   3_rec|   fiber|    2|
|   2_rec|   water|    1|
|   3_rec|    rice|    1|
|   1_rec|exercise|    3|
|   2_rec|   fiber|    1|
+--------+--------+-----+

answered Jul 23, 2021 at 4:51

AdibP

2,9691 gold badge13 silver badges26 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

PySpark Dataframe transform columns into rows

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related