I have a Spark DataFrame that looks like the one below: each row contains one list, from which I want to extract an element.
I am pretty green at Spark, so I convert it into a pandas DataFrame and then extract the desired elements with map functions. The problem is that the data is huge, so this approach does not scale; the toPandas() call is what takes the time (a sketch of what I do now is below the sample output).
Is there a way to access the values inside the list of each row directly in Spark?
Thanks!
+--------------------+
| sentence|
+--------------------+
|[{document, 0, 23...|
|[{document, 0, 68...|
|[{document, 0, 65...|
|[{document, 0, 23...|
|[{document, 0, 23...|
|[{document, 0, 23...|
+--------------------+
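Roughly what I do today (the lambda is only a placeholder, my real extraction logic is a bit more involved):

pdf = df.toPandas()                                             # this call is the bottleneck
pdf["first_element"] = pdf["sentence"].map(lambda lst: lst[0])  # extract element from the list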
Use getItem (spark.apache.org/docs/latest/api/python/reference/api/…) to get an element from the list by index. If you are looking for a more specialized case, share the logic you apply after converting to pandas.
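A minimal sketch, assuming sentence is an array column and (hypothetically) that its elements are structs with a result field; check df.printSchema() for your actual field names:

from pyspark.sql import functions as F

# Take the element at index 0 of the array column `sentence`.
df2 = df.withColumn("first_sentence", F.col("sentence").getItem(0))

# If the elements are structs, pull out one field (the field name `result` is an assumption).
df2 = df2.withColumn("first_text", F.col("first_sentence").getField("result"))

df2.select("first_sentence", "first_text").show(truncate=False)

If you need every element rather than one fixed index, F.explode(F.col("sentence")) gives you one row per list element, and everything stays distributed instead of being pulled to the driver like toPandas() does.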