I have a Spark dataframe that looks as follows:
+-----------+-------------------+
| ID | features |
+-----------+-------------------+
| 18156431|(5,[0,1,4],[1,1,1])|
| 20260831|(5,[0,4,5],[2,1,1])|
| 91859831|(5,[0,1],[1,3]) |
| 206186631|(5,[3,4,5],[1,5]) |
| 223134831|(5,[2,3,5],[1,1,1])|
+-----------+-------------------+
In this dataframe, the features column is a sparse vector. In my scripts I have to save this DF as a file on disk. When doing so, the features column is saved as a text column: for example "(5,[0,1,4],[1,1,1])".
When importing again into Spark the column stays a string, as you would expect. How can I convert the column back to (sparse) vector format?
(ML or MLlib?) How do you read this data?

DF = sqlContext.read.format('com.databricks.spark.csv').options(header='true', inferschema='true', delimiter=delimiter).load('file://' + path).drop('')
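One common approach (a sketch, not a definitive answer) is to parse the string representation back into its three parts with `ast.literal_eval`, since "(5,[0,1,4],[1,1,1])" happens to be a valid Python literal, and then wrap that in a UDF returning a `SparseVector`. The `parse_sparse` helper name is my own; the Spark wiring in the comments assumes Spark 2.x with `pyspark.ml` (for the older RDD-based API, substitute `pyspark.mllib.linalg`):

```python
import ast

def parse_sparse(s):
    """Parse a sparse-vector string like '(5,[0,1,4],[1,1,1])'
    into (size, indices, values). Helper name is hypothetical."""
    size, indices, values = ast.literal_eval(s.strip())
    return int(size), [int(i) for i in indices], [float(v) for v in values]

# In Spark, wrap the helper in a UDF (assumes Spark 2.x / pyspark.ml):
# from pyspark.sql.functions import udf
# from pyspark.ml.linalg import SparseVector, VectorUDT
# parse_udf = udf(lambda s: SparseVector(*parse_sparse(s)), VectorUDT())
# DF = DF.withColumn('features', parse_udf('features'))
```

The string round-trips cleanly because `SparseVector.__repr__` emits exactly the `(size,[indices],[values])` form shown in the dataframe above.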