0

I have the following Dataframe

REC_DATA = spark.createDataFrame(
    [
      ('exercise', 'fiber', 'rice', 'male'),
      ('exercise', 'rice', 'fiber', 'female'),
      ('exercise', 'water', 'fiber', 'male'),
      ('exercise', 'rice', 'exercise', 'female'),
    ], 
      StructType(
        [
            StructField("1_rec", StringType(), False),
            StructField("2_rec", StringType(), False),
            StructField("3_rec", StringType(), False),
            StructField("sex", StringType(), True),
        ]
    )
)
1_rec 2_rec 3_rec sex
exercise fiber rice male
exercise rice fiber female
exercise water fiber male
water rice exercise female

And I'm trying to group these rows into a new column, transforming the columns 1_rec, 2_rec, 3_rec into rows, and add a new column with the quantity, the output should be like that:

Position name count
1_rec exercise 3
1_rec water 1
2_rec water 1
2_rec rice 2
2_rec fiber 1
3_rec rice 1
3_rec fiber 2
3_rec exercise 1

I had tried to do a Crosstab but it didn't work properly.

1 Answer 1

3

Use stack to kind of unpivot columns then group them by position and name

import pyspark.sql.functions as F

REC_DATA = (REC_DATA
            .selectExpr("stack(3, '1_rec', 1_rec, '2_rec', 2_rec, '3_rec', 3_rec) (position, name)")
            .groupBy('position', 'name')
            .agg(F.count("*").alias('count')))
REC_DATA.show()

+--------+--------+-----+
|position|    name|count|
+--------+--------+-----+
|   1_rec|   water|    1|
|   2_rec|    rice|    2|
|   3_rec|exercise|    1|
|   3_rec|   fiber|    2|
|   2_rec|   water|    1|
|   3_rec|    rice|    1|
|   1_rec|exercise|    3|
|   2_rec|   fiber|    1|
+--------+--------+-----+

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.