
I have the following JSON (printed here as a Ruby hash):

{"metadata"=>{"result_type"=>"recent", "iso_language_code"=>"en"},
 "created_at"=>"Thu Feb 28 10:45:15 +0000 2013",
 "id"=>307079006698745857,
 "id_str"=>"307079006698745857",
 "text"=>
  "@borkdude @Rebel_Labs there are 7500+ people on the mailing list, too: http://t.co/pswvhvqJPE",
 "source"=>
  "<a href=\"http://tapbots.com/software/tweetbot/mac\" rel=\"nofollow\">Tweetbot for Mac</a>",
 "truncated"=>false,
 "in_reply_to_status_id"=>307049603952414720,
 "in_reply_to_status_id_str"=>"307049603952414720",
 "in_reply_to_user_id"=>15446348,
 "in_reply_to_user_id_str"=>"15446348",
 "in_reply_to_screen_name"=>"borkdude",
 "user"=>
  {"id"=>13033522,
   "id_str"=>"13033522",
   "name"=>"Michael Klishin",
   "screen_name"=>"michaelklishin",
   "location"=>"",
   "description"=>
    "Multilingual. Curious about how things work. Software, concurrency, OSS. Data, urbanism. Trance, dubstep, lolgifs. @ClojureWerkz mastermind, ex-@travisci core.",
   "url"=>"http://bit.ly/nTTvfC",
   "entities"=>
    {"url"=>
      {"urls"=>
        [{"url"=>"http://bit.ly/nTTvfC",
          "expanded_url"=>nil,
          "indices"=>[0, 20]}]},
     "description"=>{"urls"=>[]}},
   "protected"=>false,
   "followers_count"=>805,
   "friends_count"=>215,
   "listed_count"=>39,
   "created_at"=>"Mon Feb 04 04:11:13 +0000 2008",
   "favourites_count"=>61,
   "utc_offset"=>14400,
   "time_zone"=>"Moscow",
   "geo_enabled"=>false,
   "verified"=>false,
   "statuses_count"=>5833,
   "lang"=>"es",
   "contributors_enabled"=>false,
   "is_translator"=>false,
   "profile_background_color"=>"C0DEED",
   "profile_background_image_url"=>
    "http://a0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_image_url_https"=>
    "https://si0.twimg.com/images/themes/theme1/bg.png",
   "profile_background_tile"=>false,
   "profile_image_url"=>
    "http://a0.twimg.com/profile_images/3190382095/8485cc3e3534ffd2eef41854204d34e4_normal.jpeg",
   "profile_image_url_https"=>
    "https://si0.twimg.com/profile_images/3190382095/8485cc3e3534ffd2eef41854204d34e4_normal.jpeg",
   "profile_link_color"=>"0084B4",
   "profile_sidebar_border_color"=>"C0DEED",
   "profile_sidebar_fill_color"=>"DDEEF6",
   "profile_text_color"=>"333333",
   "profile_use_background_image"=>true,
   "default_profile"=>true,
   "default_profile_image"=>false,
   "following"=>nil,
   "follow_request_sent"=>nil,
   "notifications"=>nil},
 "geo"=>nil,
 "coordinates"=>nil,
 "place"=>nil,
 "contributors"=>nil,
 "retweet_count"=>0,
 "entities"=>
  {"hashtags"=>[],
   "urls"=>
    [{"url"=>"http://t.co/pswvhvqJPE",
      "expanded_url"=>"http://groups.google.com/group/clojure",
      "display_url"=>"groups.google.com/group/clojure",
      "indices"=>[71, 93]}],
   "user_mentions"=>
    [{"screen_name"=>"borkdude",
      "name"=>"Michiel Borkent",
      "id"=>15446348,
      "id_str"=>"15446348",
      "indices"=>[0, 9]},
     {"screen_name"=>"Rebel_Labs",
      "name"=>"Rebel Labs",
      "id"=>904047793,
      "id_str"=>"904047793",
      "indices"=>[10, 21]}]},
 "favorited"=>false,
 "retweeted"=>false,
 "possibly_sensitive"=>false}

This is stored in a PostgreSQL table created with:

create table tweets ( id bigint, tweet json, constraint id primary key(id) );

What is the most efficient way to find all rows where the array at tweet->'entities'->'user_mentions' contains an object whose 'screen_name' equals 'SOME_VALUE'?

1 Answer


I found some inspiration in "Index for finding an element in a JSON array".

What you need to do is:

  • Create an immutable function that extracts the values to index (functions used in an index expression must be IMMUTABLE):

    mf=# CREATE OR REPLACE FUNCTION json_val_arr(_j json, _key text)
    mf-#   RETURNS text[] AS
    mf-# $$
    mf$#   SELECT array_agg(elem->>_key)
    mf$#   FROM json_array_elements(_j) AS x(elem)
    mf$# $$
    mf-#   LANGUAGE sql IMMUTABLE;
    CREATE FUNCTION

  • Create a GIN index using the function:

    mf=# CREATE INDEX entities_user_mentions_screen_name
    mf-#   ON "1".tweets
    mf-#   USING GIN (json_val_arr(tweet->'entities'->'user_mentions', 'screen_name'));

  • Query, using exactly the same expression as in the index definition:

    mf=# select id from "1".tweets
    mf-#   where '{"Rebel_Labs"}'::text[] <@ (json_val_arr(tweet->'entities'->'user_mentions', 'screen_name'));
             id
    --------------------
     307079006698745857
     307063068662321152
     307049603952414720
     306869345110351872
     306436498360774656
     308672668985593856
     308645862236643328
     309979789794619392
    (8 rows)

    Time: 8,356 ms
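
To check whether the planner actually uses the expression index, the query can be prefixed with EXPLAIN. The following is a minimal sketch, assuming the json_val_arr function and the entities_user_mentions_screen_name index from the steps above already exist; turning off enable_seqscan is an optional test-session setting (not part of the original answer) that only makes index use visible on small tables.

```sql
-- Assumes json_val_arr() and the GIN index from the steps above exist.
-- Test-session only: discourage sequential scans so that index use
-- shows up in the plan even when the table is small.
SET enable_seqscan = off;

-- The WHERE expression must match the indexed expression exactly,
-- otherwise the planner cannot consider the index.
EXPLAIN
SELECT id
FROM "1".tweets
WHERE '{"Rebel_Labs"}'::text[]
      <@ json_val_arr(tweet->'entities'->'user_mentions', 'screen_name');

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

If the index is used, the plan should contain a Bitmap Index Scan on entities_user_mentions_screen_name, since GIN indexes are only scanned through bitmap scans. If it shows a Seq Scan instead, the usual causes are a mismatch between the query expression and the index expression, or a table small enough that a sequential scan is estimated to be cheaper.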
