PySpark DataFrame to 3D NumPy Matrix

Question

My input Spark DataFrame is:

Client  Feature1    Feature2   
1       10          1
1       15          3
1       20          5
1       25          7
1       30          9
2       1           10
2       2           11
2       3           12
2       4           13
2       5           14
3       100         0
3       150         1
3       200         2
3       250         3
3       300         4

I want to convert the PySpark DataFrame to a 3D NumPy matrix, with one 2D block per client. The desired output for the data above is:

   [[[10, 1],
     [15, 3],
     [20, 5],
     [25, 7],
     [30, 9]],
    [[1, 10],
     [2, 11],
     [3, 12],
     [4, 13],
     [5, 14]],   
    [[100, 0],
     [150, 1],
     [200, 2],
     [250, 3],
     [300, 4]]]

Could you please help me with this?

mck · Accepted Answer · 2021-01-28 14:58:46Z


You can aggregate with collect_list before collecting the DataFrame to Python and converting the result to a NumPy array:

import numpy as np
import pyspark.sql.functions as F

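# Pack each row's feature columns into an array, gather those arrays per client,
# then collect to the driver and stack the per-client lists into one NumPy array.
a = np.array([
    row[1] for row in
    df.groupBy('Client')
      .agg(F.collect_list(F.array(*df.columns[1:])))
      .orderBy('Client')
      .collect()
])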

print(repr(a))
array([[[ 10,   1],
        [ 15,   3],
        [ 20,   5],
        [ 25,   7],
        [ 30,   9]],

       [[  1,  10],
        [  2,  11],
        [  3,  12],
        [  4,  13],
        [  5,  14]],

       [[100,   0],
        [150,   1],
        [200,   2],
        [250,   3],
        [300,   4]]])
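
One thing worth checking before relying on the np.array call above: it yields a regular 3D array only when every client contributes the same number of rows. The following is a minimal sketch of that grouping and stacking step in plain Python and NumPy, without Spark. The rows list is a hypothetical stand-in for what df.collect() would return, and to_3d is an illustrative helper, not part of any library.

```python
import numpy as np

# Hypothetical stand-in for df.collect(): (Client, Feature1, Feature2) tuples.
rows = [
    (1, 10, 1), (1, 15, 3), (1, 20, 5), (1, 25, 7), (1, 30, 9),
    (2, 1, 10), (2, 2, 11), (2, 3, 12), (2, 4, 13), (2, 5, 14),
    (3, 100, 0), (3, 150, 1), (3, 200, 2), (3, 250, 3), (3, 300, 4),
]


def to_3d(rows):
    """Group rows by their first field and stack the remaining fields into a 3D array."""
    groups = {}
    for client, *features in rows:
        groups.setdefault(client, []).append(features)

    # A regular 3D array needs the same number of rows for every client.
    lengths = {len(group) for group in groups.values()}
    if len(lengths) != 1:
        raise ValueError(f"clients have differing row counts: {sorted(lengths)}")

    # Sort by client key so the first axis has a stable order.
    return np.array([groups[client] for client in sorted(groups)])


arr = to_3d(rows)
print(arr.shape)
```

If the row counts can differ between clients, padding or truncating each group to a fixed length before stacking is one option. Also note that the order of rows inside each collect_list result may not be guaranteed after a shuffle, so an explicit ordering column may be needed if row order matters.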

edited Jan 28, 2021 at 14:58

answered Jan 28, 2021 at 14:49


2 Comments

Salih Over a year ago

Hey @mck, if there are too many features, should I write them all one by one?

mck Over a year ago

@Salih you can use *df.columns[1:] as in the edited answer.
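
As a small illustration of that comment: df.columns is an ordinary Python list of column names, so the feature columns can be selected by slicing, or by name if the key column is not guaranteed to come first. This sketch uses a plain list as a hypothetical stand-in for df.columns, and key_col is an illustrative variable name.

```python
# Hypothetical stand-in for df.columns on the question's data.
columns = ["Client", "Feature1", "Feature2"]
key_col = "Client"

# Positional selection, as in the answer: everything after the first column.
by_position = columns[1:]

# Name-based selection: everything except the key column, wherever it sits.
feature_cols = [c for c in columns if c != key_col]

print(by_position)
print(feature_cols)

# With a real DataFrame, either list could then be unpacked into F.array(*feature_cols).
```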
