I have the following DataFrame:

    from pyspark.sql.types import StringType, StructField, StructType

    # `spark` is assumed to be an existing SparkSession
    REC_DATA = spark.createDataFrame(
        [
            ('exercise', 'fiber', 'rice', 'male'),
            ('exercise', 'rice', 'fiber', 'female'),
            ('exercise', 'water', 'fiber', 'male'),
            ('water', 'rice', 'exercise', 'female'),
        ],
        StructType(
            [
                StructField("1_rec", StringType(), False),
                StructField("2_rec", StringType(), False),
                StructField("3_rec", StringType(), False),
                StructField("sex", StringType(), True),
            ]
        )
    )

| 1_rec | 2_rec | 3_rec | sex |
|---|---|---|---|
| exercise | fiber | rice | male |
| exercise | rice | fiber | female |
| exercise | water | fiber | male |
| water | rice | exercise | female |
I'm trying to turn the columns `1_rec`, `2_rec`, and `3_rec` into rows, so that each row holds the column name (the position) and the value found there, and then add a column counting how many times each value occurs at each position. The output should look like this:
| position | name | count |
|---|---|---|
| 1_rec | exercise | 3 |
| 1_rec | water | 1 |
| 2_rec | water | 1 |
| 2_rec | rice | 2 |
| 2_rec | fiber | 1 |
| 3_rec | rice | 1 |
| 3_rec | fiber | 2 |
| 3_rec | exercise | 1 |
I tried using `crosstab`, but it did not give me this output.
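To make the expected result concrete, here is a minimal sketch of the same reshaping and counting in plain Python, without Spark. The names `columns`, `rows`, `rec_columns`, `pairs`, and `counts` are only illustrative and are not part of my actual code; the rows are copied from the table above.

```python
from collections import Counter

# Same four rows as the input table above (plain tuples, not the Spark DataFrame).
columns = ["1_rec", "2_rec", "3_rec", "sex"]
rows = [
    ("exercise", "fiber", "rice", "male"),
    ("exercise", "rice", "fiber", "female"),
    ("exercise", "water", "fiber", "male"),
    ("water", "rice", "exercise", "female"),
]

# Only the recommendation columns are reshaped; `sex` is ignored.
rec_columns = ["1_rec", "2_rec", "3_rec"]

# Unpivot: one (position, name) pair per recommendation cell.
pairs = [
    (position, row[columns.index(position)])
    for row in rows
    for position in rec_columns
]

# Count how often each name appears at each position.
counts = Counter(pairs)

for (position, name), count in sorted(counts.items()):
    print(position, name, count)
```

This gives the eight (position, name) combinations from the expected table, although the row order differs because the pairs are sorted here. What I'm looking for is the equivalent of this on the Spark DataFrame.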