I am trying to extract web analytics data from a nested JSON schema using Excel's Power Query feature for getting data from the web. The schema has the following structure:

// Contents of "~/sites"
{
  "id": 1852274,
  "url": "http://link-to-a-site.com",
  "pages": 10,
  "visits": 1356,
  "_links": {
    "site": {
      "href": "~/sites/1852274"
    }
  },
  (200 entries)
}

// Contents of "~/sites/1852274"
{
  "id": 1852274,
  "url": "http://link-to-a-site.com",
  "_links": {
    "analytics": {
      "overview": {
        "summary": {
          "href": "~/sites/1852274/analytics/overview/summary"
        },
        "groups": {
          "href": "~/sites/1852274/analytics/overview/groups"
        }
      },
      "behavior": {
        "visit_depth": {
          "href": "~/sites/1852274/analytics/behavior/visit_depth"
        },
        "visit_length": {
          "href": "~/sites/1852274/analytics/behavior/visit_length"
        }
      },
      (50 entries)
    }
  }
}

// Full contents of "~/sites/1852274/analytics/overview/summary"
{
  "bounce_rate": 36.36,
  "new_visitors": 6,
  "page_views": 31,
  "returning_visitors": 5,
  "unique_visitors": 11,
  "visits": 11
}

The first response (~/sites) provides a link to site-specific data, which in turn links to the individual analytics reports. So my question is: how do I access the data referenced by these links, starting from ~/sites?

There are too many entries to connect to each referenced URL manually.

  • What do you mean by "There are too many entries to manually connect to the site it references." - the responseText is too long for things like HTTP? Commented Apr 2, 2018 at 15:54
  • I just updated the original post, that should make it clearer. Each JSON item on the top level contains 200 entries, each of which contains 50 links to some relevant data that I would like to access. Commented Apr 2, 2018 at 16:00
  • I guess what confuses me is why not extract the required hrefs from the JSON and store that in a collection which you loop over to get the additional info via http? Commented Apr 2, 2018 at 16:03
  • That sounds clever. Could you elaborate on how this might be achieved? Commented Apr 2, 2018 at 16:06
  • Well, if you can use the API to pull in the data, you can either parse the string, using Split for example, to extract the hrefs, or use something like github.com/VBA-tools/VBA-JSON to work with the JSON object and again target the href parts. Then there are lots of examples on SO of using HTTP GET requests. Not sure about "~/sites..." - assuming you have shorthanded this? Commented Apr 2, 2018 at 16:37
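
The comments above suggest collecting the hrefs with VBA. As a rough illustration of the same idea in Power Query M instead, the sketch below walks a nested record and gathers every "href" value into a list. It is not from the original thread: CollectHrefs is a hypothetical helper name, and the inline Site record simply repeats the sample values from the question so that the query runs without any web access.

```powerquery
let
    // Sample data copied from the question's "~/sites/1852274" response
    Site = [
        id = 1852274,
        url = "http://link-to-a-site.com",
        _links = [
            analytics = [
                overview = [
                    summary = [href = "~/sites/1852274/analytics/overview/summary"],
                    groups = [href = "~/sites/1852274/analytics/overview/groups"]
                ],
                behavior = [
                    visit_depth = [href = "~/sites/1852274/analytics/behavior/visit_depth"],
                    visit_length = [href = "~/sites/1852274/analytics/behavior/visit_length"]
                ]
            ]
        ]
    ],

    // Hypothetical helper: recursively collect every text field named "href".
    // The @ prefix lets the function refer to itself inside the let expression.
    CollectHrefs = (value as any) as list =>
        if value is record then
            List.Combine(
                List.Transform(
                    Record.FieldNames(value),
                    (name) =>
                        if name = "href" and Record.Field(value, name) is text
                        then {Record.Field(value, name)}
                        else @CollectHrefs(Record.Field(value, name))))
        else if value is list then
            List.Combine(List.Transform(value, @CollectHrefs))
        else
            {},

    // List of all hrefs found under _links
    Hrefs = CollectHrefs(Site[_links])
in
    Hrefs
```

The resulting list can be converted to a table and used to drive one request per href, which avoids hard-coding the 50 link names.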

1 Answer

To demonstrate the extraction, I added a second entry to the JSON file.

JSON:

{
"ids": [
{
  "id": 1852274,
  "url": "http://link-to-a-site.com",
  "pages": 10,
  "visits": 1356,
  "_links": {
    "site": {
      "href": "~/sites/1852274"
    }
  }
},
{
  "id": 1852274,
  "url": "http://link-to-a-site.com",
  "pages": 10,
  "visits": 1356,
  "_links": {
    "site": {
      "href": "~/sites/1852274"
    }
  }
}
]
}

The Power Query code that converts it to a table with the href as a column is as follows:

let
    // Load and parse the JSON file (adjust the path for your machine)
    Source = Json.Document(File.Contents("C:\Users\XXX\Desktop\test.json")),
    // Turn the top-level record into a Name/Value table; "Value" holds the list under "ids"
    #"Converted to Table" = Record.ToTable(Source),
    // Produce one row per list entry
    #"Expanded Value" = Table.ExpandListColumn(#"Converted to Table", "Value"),
    // Expand the fields of each entry into columns
    #"Expanded Value1" = Table.ExpandRecordColumn(#"Expanded Value", "Value", {"id", "url", "pages", "visits", "_links"}, {"id", "url", "pages", "visits", "_links"}),
    // Drill down through _links -> site -> href
    #"Expanded _links" = Table.ExpandRecordColumn(#"Expanded Value1", "_links", {"site"}, {"site"}),
    #"Expanded site" = Table.ExpandRecordColumn(#"Expanded _links", "site", {"href"}, {"href"})
in
    #"Expanded site"
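
One possible way to go a step further and follow each href from within Power Query is to add a custom column that requests the linked resource. The sketch below is not part of the original answer and rests on several assumptions: that "~" abbreviates the API root (BaseUrl is a placeholder), that the endpoints return JSON without authentication, and that every site record exposes the summary link under _links, analytics, overview, summary, href as in the question. FetchJson is a hypothetical helper name.

```powerquery
// Function query: takes a table with an "href" column (such as the one
// produced above) and appends the fields of each site's overview summary.
(sites as table) as table =>
let
    // ASSUMPTION: "~" in the hrefs stands for the API root; placeholder URL.
    BaseUrl = "https://api.example.com",

    // Hypothetical helper: resolve an href like "~/sites/1852274" and parse the JSON reply.
    FetchJson = (href as text) as any =>
        Json.Document(Web.Contents(Text.Replace(href, "~", BaseUrl))),

    // For each row: fetch the site record, read the summary link, fetch the summary.
    WithSummary = Table.AddColumn(sites, "summary", each
        let
            site = FetchJson([href]),
            summaryHref = site[_links][analytics][overview][summary][href]
        in
            FetchJson(summaryHref)),

    // Expand the summary record; "visits" is renamed so it does not clash
    // with the "visits" column that already exists in the input table.
    Expanded = Table.ExpandRecordColumn(
        WithSummary,
        "summary",
        {"bounce_rate", "new_visitors", "page_views", "returning_visitors", "unique_visitors", "visits"},
        {"bounce_rate", "new_visitors", "page_views", "returning_visitors", "unique_visitors", "summary_visits"})
in
    Expanded
```

Saved as a function query, it could be invoked with #"Expanded site" as its argument. Note that this makes two web requests per row, so a table of 200 sites would trigger about 400 requests on every refresh.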

9 Comments

Thanks for your comment, but this is not what I am after. I have updated the original post to make it more clear.
What is the output form you are aiming for? Are you looking to join the two? e.g. url | visits | new_visitors?
Yes, joining the data from the two sources into a single table.
Ok. So I assume you have two JSON files. Use the above procedure to transform both into tables. Then remove the "~/sites/" from the href column. Then use the Merge Queries function to join the Detail data to the General data (join on href = id). If you upload two example files, I can produce an example.
Unfortunately, I simplified the example massively. There are 200+ entries on the first level, each related to one specific site. Each of these contains 50 links to a data table such as the one posted above, where most of the relevant data can be found. If such a structure is not supported by Power Query directly, I might have to write some VBA code to extract it instead.
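
The merge described in the comments (strip "~/sites/" from href, then join on href = id) might look roughly like the following in M. This is only a sketch with inline sample tables: the values are taken from the question, but the Detail table carrying an id column is an assumption, since the summary response shown in the question does not include one. The column name site_id is also made up for illustration.

```powerquery
let
    // General table: one row per site, as produced from "~/sites"
    General = #table(
        {"url", "visits", "href"},
        {{"http://link-to-a-site.com", 1356, "~/sites/1852274"}}),

    // Detail table: ASSUMED to carry the site id alongside the summary fields
    Detail = #table(
        {"id", "new_visitors", "page_views"},
        {{1852274, 6, 31}}),

    // Strip the "~/sites/" prefix and convert the remainder to a number
    WithId = Table.AddColumn(
        General,
        "site_id",
        each Number.FromText(Text.Replace([href], "~/sites/", "")),
        Int64.Type),

    // Left outer join keeps every General row, even without matching detail
    Joined = Table.NestedJoin(WithId, {"site_id"}, Detail, {"id"}, "detail", JoinKind.LeftOuter),

    // Bring the detail fields up as regular columns
    Result = Table.ExpandTableColumn(Joined, "detail", {"new_visitors", "page_views"})
in
    Result
```

This is the M equivalent of the Merge Queries dialog; the join key is converted to a number so that it has the same type as the id column.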