
I have data stored in a MongoDB collection, and the timestamp column is not being read correctly by Apache Spark. I'm running Apache Spark on GCP Dataproc.

Here is sample data:

In Mongo:

+----------+----------------------+
|timeslot  |timeslot_date         |
+----------+----------------------+
|1683527400|{2023-05-08T06:30:00Z}|
+----------+----------------------+


When I read this with PySpark (selecting only specific columns):

+----------+-------------------+
|timeslot  |timeslot_date      |
+----------+-------------------+
|1683527400|2023-05-07 23:30:00|
+----------+-------------------+

My understanding is that the data in Mongo is in UTC, i.e. 2023-05-08T06:30:00Z is a UTC timestamp. I'm in the PST timezone. I'm not clear why Spark is reading it in a different timezone (neither PST nor UTC). Note that it is not reading it as PST: if it were, it would advance the time by 7 hours, but instead it is doing the opposite.
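For what it's worth, the value Spark shows is exactly what you get when the stored UTC instant is rendered in US Pacific local time (which in May is PDT, UTC-7). A small stdlib-only sketch of that conversion, assuming the `America/Los_Angeles` zone:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# The UTC instant as stored in Mongo: 2023-05-08T06:30:00Z
utc_ts = datetime(2023, 5, 8, 6, 30, tzinfo=timezone.utc)

# The same instant rendered in US Pacific time (PDT in May, UTC-7)
local_ts = utc_ts.astimezone(ZoneInfo("America/Los_Angeles"))
print(local_ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2023-05-07 23:30:00
```

The rendered value matches the PySpark output above, which suggests Spark is displaying the timestamp in the session's local zone rather than in UTC.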

Where does the default timezone come from when Spark reads data from MongoDB?

Any ideas on this ?

tia!

  • Your time zone is PDT, not PST. The returned time is 23:30:00 PDT which is 06:30:00 GMT. Commented Jun 8, 2023 at 23:15
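If the goal is to have Spark render timestamps in UTC rather than the session-local zone, Spark SQL's `spark.sql.session.timeZone` setting controls this. A minimal config sketch, assuming an existing `SparkSession` named `spark`:

```python
# Assumes an existing SparkSession `spark` on the Dataproc cluster.
# Render timestamps in UTC instead of the driver's local time zone.
spark.conf.set("spark.sql.session.timeZone", "UTC")
```

With this set, the value stored as 2023-05-08T06:30:00Z should display as 2023-05-08 06:30:00.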
