
I want to convert my results1 NumPy array to a DataFrame. For reference, results1 looks like this:

array([(1.0, 0.1738578587770462), (1.0, 0.33307021689414978),
       (1.0, 0.21377330869436264), (1.0, 0.443511435389518738),
       (1.0, 0.3278091162443161), (1.0, 0.041347454154491425)]).

I want to convert this into a PySpark DataFrame (going through an RDD) with columns labeled "limit" (the first value in each tuple) and "probability" (the second value in each tuple).

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('YKP').getOrCreate()
sc = spark.sparkContext
# Convert the NumPy array to an RDD
rdd = sc.parallelize(results1)

# Create data frame
df = sc.createDataFrame(rdd)

When I run this, I keep getting the error

AttributeError: 'RemoteContext' object has no attribute 'createDataFrame'

Why does this happen, and how do I fix it?

createDataFrame is a method of SQLContext (and, since Spark 2.0, of SparkSession), not of SparkContext: sqlContext = SQLContext(sc) Commented Oct 29, 2020 at 18:08
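Following that comment, the question's code only needs the call moved from `sc` to the SparkSession (`spark`). A minimal sketch, assuming `spark` is the SparkSession built in the question; the names `COLUMNS`, `as_python_rows` and `to_spark_df` are hypothetical. It skips the RDD step entirely and lets `ndarray.tolist()` produce plain Python floats for Spark's schema inference.

```python
import numpy as np

results1 = np.array([(1.0, 0.1738578587770462), (1.0, 0.33307021689414978),
                     (1.0, 0.21377330869436264), (1.0, 0.443511435389518738),
                     (1.0, 0.3278091162443161), (1.0, 0.041347454154491425)])

# Hypothetical column list matching the names asked for in the question.
COLUMNS = ["limit", "probability"]


def as_python_rows(arr):
    # ndarray.tolist() converts every element to a built-in Python float,
    # giving a list of [limit, probability] lists.
    return arr.tolist()


def to_spark_df(spark, arr):
    # createDataFrame lives on SparkSession (or the older SQLContext),
    # not on the object returned by spark.sparkContext.
    return spark.createDataFrame(as_python_rows(arr), COLUMNS)
```

Usage, assuming the `spark` session from the question: `df = to_spark_df(spark, results1)` followed by `df.show()`.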

2 Answers


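Use map() and toDF() instead: map() turns each NumPy row into a list of plain Python floats, and toDF() builds the DataFrame with the column names you pass.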

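import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('YKP').getOrCreate()
sc = spark.sparkContext

results1 = np.array([(1.0, 0.1738578587770462), (1.0, 0.33307021689414978),
                     (1.0, 0.21377330869436264), (1.0, 0.443511435389518738),
                     (1.0, 0.3278091162443161), (1.0, 0.041347454154491425)])

# Each RDD element is a NumPy row; convert it to a list of Python floats,
# then let toDF() name the two columns.
df = sc.parallelize(results1).map(lambda x: [float(i) for i in x])\
        .toDF(["limit", "probability"])

df.show()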
+-----+--------------------+
|limit|         probability|
+-----+--------------------+
|  1.0|  0.1738578587770462|
|  1.0|  0.3330702168941498|
|  1.0| 0.21377330869436265|
|  1.0| 0.44351143538951876|
|  1.0|  0.3278091162443161|
|  1.0|0.041347454154491425|
+-----+--------------------+
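toDF() above infers both columns as doubles from the Python floats. If you would rather declare the types yourself, RDD.toDF() also accepts a StructType schema. A minimal sketch, assuming the `sc` from the question with an active SparkSession (toDF() is attached to RDDs when a SparkSession is created); `to_float_row`, `build_typed_df` and `sample` are hypothetical names.

```python
import numpy as np

# Hypothetical sample with the same shape as results1.
sample = np.array([(1.0, 0.1738578587770462), (1.0, 0.33307021689414978)])


def to_float_row(row):
    # sc.parallelize() on a 2-D array yields 1-D NumPy arrays as elements;
    # convert each one to a list of built-in Python floats.
    return [float(v) for v in row]


def build_typed_df(sc, arr):
    # Imported here so to_float_row can be used without PySpark installed.
    from pyspark.sql.types import DoubleType, StructField, StructType

    schema = StructType([
        StructField("limit", DoubleType()),
        StructField("probability", DoubleType()),
    ])
    return sc.parallelize(arr).map(to_float_row).toDF(schema)
```

Usage, assuming the question's `sc`: `build_typed_df(sc, results1).printSchema()` should list both columns as double.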

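If each element of the RDD is a single value, the simplest way is to wrap it in a one-element tuple so Spark can infer a schema: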

df = rdd.map(lambda x: (x, )).toDF()
df.show()

You can also refer to this post for more details: "Create Spark DataFrame. Can not infer schema for type: <type 'float'>"
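The `(x, )` wrapper fits the linked post, where each RDD element is one float. With the two-value rows of results1, wrapping the whole row would give a single column holding a NumPy array rather than two float columns. A pure-Python sketch contrasting the two mapping functions; `wrap_scalar`, `split_pair`, `scalars` and `pairs` are hypothetical names used only for illustration.

```python
import numpy as np

# One float per element, as in the linked post.
scalars = [0.1738578587770462, 0.33307021689414978]

# Two values per element, like results1.
pairs = np.array([(1.0, 0.1738578587770462), (1.0, 0.33307021689414978)])


def wrap_scalar(x):
    # One-field tuple: toDF() without names infers a single column "_1".
    return (x,)


def split_pair(row):
    # Two built-in Python floats: Spark infers two double columns.
    return tuple(float(v) for v in row)

# With the RDD from the question (assumed):
# rdd.map(split_pair).toDF(["limit", "probability"]).show()
```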
