I am trying to parse the quote table at data.cnbc.com/quotes/sdrl and write the innerHTML of a cell into a column next to a ticker that I specify.

So, I would read the symbol from A2, write its yield data into C2, and then move on to the next symbol.

The HTML looks like:

<table id="fundamentalsTableOne">
  <tbody>
    <tr scope="row">
        <th scope="row">EPS</th>
        <td>8.06</td>
    </tr>
    <tr scope="row">
        <th scope="row">Market Cap</th>
        <td>5.3B</td>
    </tr>
    <tr scope="row">
        <th scope="row">Shares Out</th>
        <td>492.8M</td>
    </tr>
    <tr scope="row">
        <th scope="row">Price/Earnings</th>
        <td>1.3x</td>
    </tr>
</tbody>
</table>
<table id="fundamentalsTableTwo">
  <tbody>
    <tr scope="row">
        <th scope="row">Revenue (TTM)</th>
        <td>5.0B</td>   
    </tr>
    <tr scope="row">
        <th scope="row">Beta</th>
        <td>1.84</td>
    </tr>
    <tr scope="row">
        <th scope="row">Dividend</th>
        <td>--</td>
    </tr>
    <tr scope="row">
        <th scope="row">Yield</th>
        <td><span class="pos">0.00%</span></td>
    </tr>
  </tbody>
</table>

Currently, I have:

Sub getInfoWeb()

Dim cell As Integer
Dim xhr As MSXML2.XMLHTTP60
Dim doc As MSHTML.HTMLDocument
Dim table As MSHTML.HTMLTable
Dim tableCells As MSHTML.IHTMLElementCollection

Set xhr = New MSXML2.XMLHTTP60

For cell = 2 To 5

ticker = Cells(cell, 1).Value

    With xhr

        .Open "GET", "http://data.cnbc.com/quotes/" & ticker, False
        .send

        If .readyState = 4 And .Status = 200 Then
            Set doc = New MSHTML.HTMLDocument
            doc.body.innerHTML = .responseText
        Else
            MsgBox "Error" & vbNewLine & "Ready state: " & .readyState & _
            vbNewLine & "HTTP request status: " & .Status
        End If

    End With

    Set table = doc.getElementById("fundamentalsTableOne")
    Set tableCells = table.getElementsByTagName("td")

    For Each tableCell In tableCells

            Cells(cell, 2).Value = tableCell.NextSibling.innerHTML

    Next tableCell

Next cell

End Sub

However, I am getting an "access is denied" error, as well as run-time error 91 on my Set tableCells line. Is this because there is only one element in each row while tableCells is declared as a collection? Also, is the "access is denied" error caused by the HTML being generated by JavaScript? I wouldn't have thought that would be a problem.

If anyone knows how to get this working that'd be greatly appreciated. Thanks.

  • If the content is dynamically generated on the client side, then your approach isn't going to work, and you'd instead need to (e.g.) automate IE to load the page and read the content from there. Commented Apr 9, 2015 at 18:40
  • Thanks, Tim. I'll revise and try a browser route. Commented Apr 9, 2015 at 20:26

2 Answers

Here is an example showing how you can get the data you need:

GetData "sdrl"

Sub GetData(sSymbol)
    Dim sRespText, arrName, oDict, sResult, sItem
    XmlHttpRequest "GET", "http://data.cnbc.com/quotes/" & sSymbol, "", "", "", sRespText
    ParseToNestedArr "<span data-field=""name"">([\s\S]*?)</span>", sRespText, arrName
    XmlHttpRequest "GET", "http://apps.cnbc.com/company/quote/newindex.asp?symbol=" & sSymbol, "", "", "", sRespText
    ParseToDict "<tr[\s\S]*?><th[\s\S]*?>([\s\S]*?)</th><td>(?:<span[\s\S]*?>)*([\s\S]*?)(?:</span>)*</td></tr>", sRespText, oDict
    sResult = arrName(0)(0) & vbCrLf & vbCrLf
    For Each sItem in oDict.Keys
        sResult = sResult & sItem & " = " & oDict(sItem) & vbCrLf
    Next
    MsgBox sResult
End Sub

' Builds a dictionary that maps submatch 0 (the label) to submatch 1 (the value) for every match.
Sub ParseToDict(sPattern, sResponse, oList)
    Dim oMatch
    Set oList = CreateObject("Scripting.Dictionary")
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        For Each oMatch In .Execute(sResponse)
            oList(oMatch.SubMatches(0)) = oMatch.SubMatches(1)
        Next
    End With
End Sub

Sub ParseToNestedArr(sPattern, sResponse, arrMatches)
    Dim oMatch, arrSMatches, sSubMatch
    arrMatches = Array()
    With CreateObject("VBScript.RegExp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = sPattern
        For Each oMatch In .Execute(sResponse)
            arrSMatches = Array()
            For Each sSubMatch in oMatch.SubMatches
                PushItem arrSMatches, sSubMatch
            Next
            PushItem arrMatches, arrSMatches
        Next
    End With
End Sub

Sub PushItem(arrList, varItem)
    ReDim Preserve arrList(UBound(arrList) + 1)
    arrList(UBound(arrList)) = varItem
End Sub

Sub XmlHttpRequest(sMethod, sUrl, arrSetHeaders, sFormData, sRespHeaders, sRespText)
    Dim arrHeader
    With CreateObject("Msxml2.ServerXMLHTTP.3.0")
        .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
        .Open sMethod, sUrl, False
        If IsArray(arrSetHeaders) Then
            For Each arrHeader In arrSetHeaders
                .SetRequestHeader arrHeader(0), arrHeader(1)
            Next
        End If
        .Send sFormData
        sRespHeaders = .GetAllResponseHeaders
        sRespText = .ResponseText
    End With
End Sub

The code uses late binding because it was originally written for VBScript, but it is not hard to switch to early binding if you prefer. The second URL, http://apps.cnbc.com/company/quote/newindex.asp?symbol=SDRL, can be found in the page source as the src of an iframe.
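To write the result into the worksheet instead of a message box, the same regex idea can be reduced to one lookup per ticker. The following is only a minimal sketch under stated assumptions: FetchText, ExtractField, and FillYieldColumn are hypothetical names that are not part of the code above, the tickers are assumed to sit in A2:A5 as in the question, and the pattern assumes the th/td layout shown in the question's HTML.

```vba
Option Explicit

' Hypothetical helper: synchronous GET; returns "" on any non-200 status.
Function FetchText(ByVal sUrl As String) As String
    With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", sUrl, False
        .send
        If .Status = 200 Then FetchText = .responseText
    End With
End Function

' Hypothetical helper: returns the text of the <td> that follows <th>sLabel</th>,
' or "" if the label is not found.
Function ExtractField(ByVal sHtml As String, ByVal sLabel As String) As String
    Dim i As Long, ch As String, sEscaped As String
    Dim oMatches As Object
    ' Escape regex metacharacters so labels such as "Revenue (TTM)" match literally.
    For i = 1 To Len(sLabel)
        ch = Mid$(sLabel, i, 1)
        If InStr("\^$.|?*+()[]{}", ch) > 0 Then sEscaped = sEscaped & "\"
        sEscaped = sEscaped & ch
    Next i
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        .Pattern = "<th[^>]*>\s*" & sEscaped & "\s*</th>\s*<td[^>]*>\s*(?:<span[^>]*>)?\s*([^<]*)"
        Set oMatches = .Execute(sHtml)
    End With
    If oMatches.Count > 0 Then ExtractField = Trim$(oMatches(0).SubMatches(0))
End Function

' Reads tickers from A2:A5 (assumed layout) and writes each Yield value into column C.
Sub FillYieldColumn()
    Dim r As Long, sHtml As String
    For r = 2 To 5
        sHtml = FetchText("http://apps.cnbc.com/company/quote/newindex.asp?symbol=" & Cells(r, 1).Value)
        Cells(r, 3).Value = ExtractField(sHtml, "Yield")
    Next r
End Sub
```

If a label is missing from the response, the cell is simply left empty, which makes it easy to spot tickers that need a second look.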


I just had a brief look at the site, and I think you can do this without a browser object.

The problem is that these sites generally use something like Ajax to dynamically update a smaller div without refreshing the entire page. The new data generally still arrives as HTML (though possibly compressed), so it can still be parsed in an HTMLDocument; it just comes from a call to a different URL.

For this site in particular, you initially GET from quotes.cnbc.com; then, quietly in the background, your browser makes another request to data.cnbc.com, and finally it fetches the table you want from apps.cnbc.com. You can still make all of these calls with an HTTP request object if they are all necessary, and you may even be able to skip the first two if cookies aren't required and the POST data isn't built by JavaScript in those first two responses.
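As a rough illustration of requesting the final URL directly and parsing it as a document, here is a minimal sketch. It assumes the apps.cnbc.com address quoted in the other answer still serves the table and that the table keeps the id shown in the question; ReadFundamentalsTable is a hypothetical name. Checking for Nothing before using the table avoids run-time error 91 when the element is not in the response.

```vba
Option Explicit

Sub ReadFundamentalsTable()
    Dim oHttp As Object, oDoc As Object, oTable As Object, oRow As Object

    ' ServerXMLHTTP is used here, as in the other answer.
    Set oHttp = CreateObject("MSXML2.ServerXMLHTTP.6.0")
    oHttp.Open "GET", "http://apps.cnbc.com/company/quote/newindex.asp?symbol=SDRL", False
    oHttp.send
    If oHttp.Status <> 200 Then
        MsgBox "HTTP request status: " & oHttp.Status
        Exit Sub
    End If

    ' Late-bound MSHTML document; no reference to the HTML Object Library is needed.
    Set oDoc = CreateObject("htmlfile")
    oDoc.body.innerHTML = oHttp.responseText

    ' Assumption: this response contains the table id shown in the question.
    Set oTable = oDoc.getElementById("fundamentalsTableTwo")
    If oTable Is Nothing Then
        MsgBox "Table not found in this response; the data may come from another URL."
        Exit Sub
    End If

    ' Each row holds a <th> label followed by a <td> value.
    For Each oRow In oTable.Rows
        Debug.Print oRow.Cells(0).innerText & " = " & oRow.Cells(1).innerText
    Next oRow
End Sub
```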

I suggest you download a network traffic monitor like Fiddler 4. It's free, and indispensable during projects like this.

It can be a little confusing the first time, so here is a quick primer. After you have opened it and made your first call to CNBC, locate that request in the panel on the left and highlight it. Then, in the upper-right panel, click the "Inspectors" tab, then "Raw". This shows the headers and POST data your browser sent to CNBC, which is what you want to duplicate in your HTTP request. In the lower-right panel, click "Raw" to see the response headers and body, as well as status codes, HTML syntax, rendered HTML (without CSS), and so on. You can use these to figure out which request returns the data you actually want and to see exactly how it arrives.
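Once you know which headers the browser sent, duplicating them might look like the sketch below. The header values are placeholders rather than values captured from CNBC, and GetWithHeaders is a hypothetical name; substitute whatever Fiddler shows for the request you want to reproduce.

```vba
Option Explicit

' Hypothetical helper: GET a URL while sending headers copied from a captured request.
Function GetWithHeaders(ByVal sUrl As String) As String
    With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", sUrl, False
        ' Placeholder values: replace with the headers seen in Fiddler's Raw inspector.
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Referer", "http://data.cnbc.com/quotes/SDRL"
        .send
        ' Print the status and response headers for comparison with Fiddler's lower panel.
        Debug.Print .Status
        Debug.Print .getAllResponseHeaders
        GetWithHeaders = .responseText
    End With
End Function
```

Comparing the status and response headers printed in the Immediate window with what Fiddler recorded for the browser is a quick way to tell whether the request was duplicated closely enough.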

I think you'll be surprised by exactly how close you are.
