I am trying to do fast enrichment in Spark where the lookup is conditional: an event matches on a timestamp falling inside a range, not on an exact key.
I have two key/value data sets: "Event Data" and "Session Map". The Session Map records who was using a given IP between two timestamps. The Event Data is a large collection of events, each with an IP and a timestamp, which need to be correlated against the Session Map so that each event is enriched with a username.
Is there an efficient way to enrich the Event Data against the Session Map in Spark, or with some other tool?
Session map:
(IP, start_time, end_time) -> Name
(192.168.0.1, 2016-01-01 10:00:00, 2016-01-01 22:00:00) -> John
(192.168.0.1, 2016-01-01 22:00:01, 2016-01-02 04:35:00) -> Dana
(10.0.0.12, 2016-01-02 06:00:13, 2016-01-02 09:23:24) -> John
...
Event data:
IP -> timestamp
192.168.0.1, 2016-01-01 10:00:00
192.168.0.1, 2016-01-01 10:00:01
192.168.0.1, 2016-01-01 10:00:02
192.168.0.1, 2016-01-01 10:05:23
...
192.168.0.1, 2016-01-01 22:00:01
192.168.0.1, 2016-01-01 22:12:35
192.168.0.1, 2016-01-01 04:12:00
...
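
To make the intended correlation concrete, here is a minimal sketch of the lookup in plain Python (not Spark). It assumes that the sessions for a single IP never overlap and that both endpoints of a session are inclusive. The names `build_index` and `lookup` are hypothetical helpers made up for this illustration. The idea is to group sessions by IP, sort each group by start time, and binary-search for the latest session that starts at or before the event timestamp.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def parse(ts):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(ts, FMT)


def build_index(sessions):
    """Group sessions by IP and sort each group by start time.

    Assumption: sessions for the same IP do not overlap.
    """
    grouped = defaultdict(list)
    for ip, start, end, name in sessions:
        grouped[ip].append((parse(start), parse(end), name))
    index = {}
    for ip, intervals in grouped.items():
        intervals.sort(key=lambda s: s[0])
        # Keep start times in a parallel list so bisect can search them.
        index[ip] = ([s[0] for s in intervals], intervals)
    return index


def lookup(index, ip, ts):
    """Return the user for (ip, ts), or None if no session covers it."""
    if ip not in index:
        return None
    starts, intervals = index[ip]
    # Position of the last session whose start time is <= ts.
    pos = bisect_right(starts, ts) - 1
    if pos < 0:
        return None
    _start, end, name = intervals[pos]
    return name if ts <= end else None


# Sample rows taken from the data shown above.
sessions = [
    ("192.168.0.1", "2016-01-01 10:00:00", "2016-01-01 22:00:00", "John"),
    ("192.168.0.1", "2016-01-01 22:00:01", "2016-01-02 04:35:00", "Dana"),
    ("10.0.0.12", "2016-01-02 06:00:13", "2016-01-02 09:23:24", "John"),
]
events = [
    ("192.168.0.1", "2016-01-01 10:00:00"),
    ("192.168.0.1", "2016-01-01 22:12:35"),
    ("192.168.0.1", "2016-01-01 04:12:00"),
]

index = build_index(sessions)
enriched = [(ip, ts, lookup(index, ip, parse(ts))) for ip, ts in events]
for row in enriched:
    print(row)
```

With this sample, the first event falls in John's session, the second in Dana's, and the third precedes every session for that IP, so it gets no username. Each lookup costs O(log n) in the number of sessions for that IP. Whether the same structure could be shipped to Spark executors depends on the Session Map being small enough to fit in memory, which is an assumption and not something established above.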