I am trying to take an rdd that looks like:

[<1x24000 sparse matrix of type '' with 10 stored elements in Compressed Sparse Row format>, . . . ]

and ideally turn it into a dataframe that looks like:

<code>
   +-----------------+
   |  A  |  B  |   C |
   +-----------------+
   | 1.0 | 0.0 | 0.0 |
   +-----+-----+-----+
   | 1.0 | 1.0 | 0.0 |
   +-----+-----+-----+
</code>

However, I keep getting this:

<code>
+---------------+
|             _1|
+---------------+
|[1.0, 0.0, 0.0]|
+---------------+
|[1.0, 1.0, 0.0]|
+---------------+
</code>

I am having the darnedest time because each row is filled with numpy arrays.

I used this code to create the dataframe from the rdd:

<code>res.flatMap(lambda x: np.array(x.todense())).map(list).map(lambda l : Row([float(x) for x in l])).toDF()</code>

** Explode does not help (it puts everything into the same column).

** I tried using a UDF on the resulting dataframe, but I cannot seem to separate the numpy array into individual values.

Please help!

1 Answer

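Row([...]) builds a row with a single field that holds the whole list, which is why everything ends up in the one column _1. Unpack the list with * so that each value becomes its own field. Try this as the last map:

<code>.map(lambda l : Row(*[float(x) for x in l]))</code>

The resulting columns are named _1, _2, _3, and so on; pass a list of column names to toDF if you want A, B, C instead.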
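If you also want the named columns from the question, one possible variant is to skip Row and hand the column names to toDF directly. This is only a sketch: the helper names dense_row and rows_to_dataframe are made up for illustration, and it assumes an active SparkSession (toDF is only available on RDDs once one exists) and an RDD whose elements are already 1-D numeric sequences, such as the output of the flatMap step in the question.

```python
from typing import Iterable, List, Sequence


def dense_row(values: Iterable) -> List[float]:
    """Convert one dense row (e.g. a 1-D numpy array) to plain Python floats.

    Spark's schema inference does not understand numpy scalar types,
    so each value is cast with float() first.
    """
    return [float(v) for v in values]


def rows_to_dataframe(rdd, column_names: Sequence[str]):
    """Turn an RDD of 1-D numeric sequences into a DataFrame, one column per value.

    Hypothetical helper: `rdd` is assumed to be a PySpark RDD and
    `column_names` must have one name per value in each row.
    """
    # Each tuple becomes one DataFrame row; each tuple element becomes one column.
    return rdd.map(dense_row).map(tuple).toDF(list(column_names))
```

With the question's data this would be called roughly as rows_to_dataframe(res.flatMap(lambda x: np.array(x.todense())), names), where names is your own list with one entry per matrix column (24000 here).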