I'm using pyspark and imported a hive table into a dataframe.

df = sqlContext.sql("from hive_table select *") 

I need help converting this df to a NumPy array. You may assume hive_table has only one column.

Could you please suggest an approach? Thank you in advance.

1 Answer

You can:

sqlContext.range(0, 10).toPandas().values  # .reshape(-1) for 1d array
array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

but it is unlikely you really want to. The resulting array is local to the driver node, so it is rarely useful. If you're looking for some variant of a distributed array-like data structure, there are a number of possible choices in Apache Spark:

and independent of Apache Spark:
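
For the single-column case in the question, the driver-side conversion from the top of this answer could be wrapped roughly as follows. This is only a sketch: `single_column_to_numpy` is a hypothetical helper name, not part of PySpark, and it assumes the whole column fits in driver memory.

```python
def single_column_to_numpy(df):
    """Collect a one-column Spark DataFrame into a 1-D NumPy array on the driver.

    Hypothetical helper for illustration; every row is pulled to the driver,
    so this is only reasonable for data that fits in local memory.
    """
    if len(df.columns) != 1:
        raise ValueError("expected exactly one column, got %d" % len(df.columns))
    # toPandas() gives a local pandas DataFrame; .values is an (n, 1) array,
    # and reshape(-1) flattens it to shape (n,).
    return df.toPandas().values.reshape(-1)


# Usage, assuming the sqlContext and hive_table from the question:
# df = sqlContext.sql("from hive_table select *")
# arr = single_column_to_numpy(df)
```

The column-count check is there only to fail early; with more than one column, `reshape(-1)` would silently interleave the values of different columns.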
