I am working in PySpark with the flatMap function, and I call split inside it. I get this error: AttributeError: 'NoneType' object has no attribute 'split'

I am following a video and replicating exactly what it shows. It works in the video, but I keep getting this error. Below is my code:

 datasetfor2019.map(lambda col: col[Conditions])\
.filter(lambda x: x!='')\
.flatMap(lambda x: x.split(','))\
.map(lambda x: (x, 1))\
.reduceByKey(add)\
.sortBy(lambda x: x[1], ascending=False)\
.take(5)

I would like to know what I am doing wrong, or whether I need to import something into my PySpark environment and, if so, what.

Thank you in advance.

  • Are some values of your column null? The error seems to point in that direction. Commented May 6, 2022 at 12:44
  • You can easily debug the problem by printing the result (e.g. rdd.take(5)) after each step. It is difficult to judge in which step the problem arises, as no description of the input data is given. Commented May 6, 2022 at 14:10
  • I can't tell where the null is. I have tried to flag it in my condition, but it does not seem to work. Commented May 6, 2022 at 14:57
  • @XYZ I will check that and see how it works out. Commented May 6, 2022 at 14:58
  • .filter(lambda x: x!='')\ won't filter null values... Commented May 6, 2022 at 15:12
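
To make the point in the comments concrete, here is a minimal sketch of a None-safe split. It assumes the column holds either a comma-separated string, an empty string, or None. The helper name split_conditions and the sample rows are hypothetical, not taken from the question, and the counting step uses plain Python rather than Spark so it runs without a cluster.

```python
from collections import Counter
from itertools import chain


def split_conditions(value):
    """Split a comma-separated string; return [] for None or ''."""
    # `not value` is True for both None and the empty string, so
    # .split() is never called on a NoneType object.
    if not value:
        return []
    return value.split(',')


# Hypothetical rows standing in for the column read by the first map().
rows = ['Rain,Fog', None, '', 'Rain', 'Snow,Rain']

# Plain-Python stand-in for flatMap -> map -> reduceByKey -> sortBy -> take(5).
counts = Counter(chain.from_iterable(split_conditions(r) for r in rows))
top = counts.most_common(5)
print(top)
```

In the RDD pipeline itself, the same idea would probably be either tightening the filter to `.filter(lambda x: x is not None and x != '')` or passing the helper directly as `.flatMap(split_conditions)`. Which one fits depends on how the nulls got into the data.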
