1

I have here an example DF:

+---+---------+----+---------+---------+-------------+-------------+
|id | company |type|rev2016  | rev2017 | main2016    |  main2017   |
+---+---------+----+---------+---------+-------------+-------------+
| 1 | google  |web | 100     |  200    |  55         |     66      |
+---+---------+----+---------+---------+-------------+-------------+

And I want this output:

+---+---------+----+-------------+------+------+
|id | company |type| Metric      | 2016 | 2017 |   
+---+---------+----+-------------+------+------+
| 1 | google  |web | rev         | 100  |  200 |   
| 1 | google  |web | main        |  55  |  66  |  
+---+---------+----+-------------+------+------+

What I am trying to achieve is transposing revenue and maintenance columns to rows with new column 'Metric'. I am trying pivoting no luck so far.

0

1 Answer 1

1

You can construct an array of structs from the columns, and then explode the arrays and expand the structs to get the desired output.

import pyspark.sql.functions as F

struct_list = [
    F.struct(
        F.lit('rev').alias('Metric'),
        F.col('rev2016').alias('2016'),
        F.col('rev2017').alias('2017')
    ),
    F.struct(
        F.lit('main').alias('Metric'),
        F.col('main2016').alias('2016'),
        F.col('main2017').alias('2017')
    )
]

df2 = df.withColumn(
    'arr',
    F.explode(F.array(*struct_list))
).select('id', 'company', 'type', 'arr.*')

df2.show()
+---+-------+----+------+----+----+
| id|company|type|Metric|2016|2017|
+---+-------+----+------+----+----+
|  1| google| web|   rev| 100| 200|
|  1| google| web|  main|  55|  66|
+---+-------+----+------+----+----+

Or you can use stack:

df2 = df.selectExpr(
    'id', 'company', 'type',
    "stack(2, 'rev', rev2016, rev2017, 'main', main2016, main2017) as (Metric, `2016`, `2017`)"
)

df2.show()
+---+-------+----+------+----+----+
| id|company|type|Metric|2016|2017|
+---+-------+----+------+----+----+
|  1| google| web|   rev| 100| 200|
|  1| google| web|  main|  55|  66|
+---+-------+----+------+----+----+
Sign up to request clarification or add additional context in comments.

2 Comments

Thank for this, can i insert a for loop for multiple F.structs?
@JohnCarloVelasquez you can use a for loop to construct a list of structs, and pass them to F.array , e.g. F.array(*struct_list).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.