I'm not a developer and have very limited XML knowledge but for what I've learned the past 3-4 days researching on the web. So apologies in advance for the basic level of this question. I'm trying to wrap up this one time task.
I have some VBA Excel knowledge and currently I'm trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filing website. As an example, this is the site for Walmart
In the blue bar at the top you can see 'SIC: 5331' it's the 5331 I'm trying to return to a VBA variable so I can populate a spreadsheet. When I right click in IE and clich View Source the part of the page that is relevant reads in XML as:
<div id="contentDiv">
<!-- START FILER DIV -->
<div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
<div class="mailer">Mailing Address
<span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
<span class="mailerAddress"> BENTONVILLE AR 72716 </span>
</div>
<div class="mailer">Business Address
<span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
<span class="mailerAddress">BENTONVILLE AR 72716 </span>
<span class="mailerAddress">5012734000</span>
</div>
<div class="companyInfo">
<span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40">0000104169 (see all company filings)</a></span>
<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&SIC=5331&owner=exclude&count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&State=AR&owner=exclude&count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
<br />Get <a href="/cgi-bin/own-disp?action=getowner&CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
</p>
</div>
</div>
</div>
In trying to understand how VBA might be used to extract the SIC, I found the following post on your site:
Query and parse xml attribute value into XLS using VBA
I tried to apply barrowc's answer by copy/paste into an Excel module and inserted the path to the Wal Mart filings however when I step through I get the Debug.Print "*****" but I'm not getting anything for n.Text.
Sub test4()
Dim d As MSXML2.DOMDocument60
Dim i As IXMLDOMNodeList
Dim n As IXMLDOMNode
Set d = New MSXML2.DOMDocument60
d.async = False
d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")
Debug.Print "*****"
Set i = d.SelectNodes("//div[@id='contentDiv']")
For Each n In i
Debug.Print n.Text
Next n
Debug.Print "*****"
Set d = Nothing
End Sub
I've tried various strings in d.SelectNodes(), but I don't know enough about this topic to understand where I'm going wrong. So either a comment on my syntax or a pointer to a resource would be enormously helpful.
acronymnode and then select its second proceeding sibling. Although you may have trouble because that is HTML and does not appear to be valid xml. You may have better luck parsing it as a string instead of XML. Especially if the structure won't change.