I am trying to extract the name of the first actor from the page at the URL I am passing. For my URL, I need to extract "Will Smith" from the HTML page.

I know how to extract elements from an HTML page using tag name, class name, etc.

The problem is that when I pass the URL "https://ssl.ofdb.de/film/138627,I-Am-Legend", the response text does not contain the full HTML page, so I am not able to extract the content "Will Smith".

I also tried other methods such as MSXML2.XMLHTTP60; both return only a partial HTML page.

I have attached my code here. Can anyone please help?

Sub Fetch_Info()
    ' Requires a reference to Microsoft Internet Controls and Microsoft HTML Object Library
    Dim ie As InternetExplorer
    Set ie = CreateObject("InternetExplorer.Application")

    ' Position and size the browser window, hiding its bars
    ie.Visible = True
    ie.Top = 0
    ie.Left = 700
    ie.Width = 1000
    ie.Height = 750
    ie.AddressBar = 0
    ie.StatusBar = 0
    ie.Toolbar = 0
    ie.navigate "https://ssl.ofdb.de/film/138627,I-Am-Legend"

    ' Wait until the page reports it has finished loading
    Do
        DoEvents
    Loop Until ie.readyState = READYSTATE_COMPLETE

    Application.Wait Now + TimeValue("00:00:04")

    Dim doc As HTMLDocument
    Set doc = ie.document
    doc.Focus

    Debug.Print doc.DocumentElement.innerHTML
End Sub

  • What is it you wanna scrape from that webpage? Commented Apr 19, 2020 at 16:44

1 Answer

You can use the following CSS selector. querySelector returns the first node matching the CSS pattern. The pattern is [itemprop=actor] span, which looks for a span that is a descendant of an element whose itemprop attribute has the value actor. Note that I am working off the ie.document node.

Debug.Print ie.document.querySelector("[itemprop=actor] span").innerText

That content is static, so you could use XHR and avoid the overhead of a browser. The response header does not specify a charset, so you need to read the response body and convert it yourself.

Option Explicit

Public Sub GetActor()
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ;  Microsoft XML, v6 (your version may vary)

    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", "https://ssl.ofdb.de/film/138627,I-Am-Legend", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    ActiveSheet.Cells(1, 1) = html.querySelector("[itemprop=actor] span").innerText
End Sub
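
If more than the first match is needed, querySelectorAll returns every node matching the pattern, and each one can be read by zero-based index, so index 1 is the second match. The following is only a sketch under assumptions: it supposes each actor on the page is marked up the same way as the first, and ListActors is a hypothetical procedure name. It uses the same two references as the code above.

```vba
Option Explicit

Public Sub ListActors()
    ' Assumes references to Microsoft HTML Object Library and Microsoft XML, v6.0
    Dim xhr As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    Dim actors As Object, i As Long

    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument

    With xhr
        .Open "GET", "https://ssl.ofdb.de/film/138627,I-Am-Legend", False
        .send
        html.body.innerHTML = StrConv(.responseBody, vbUnicode)
    End With

    ' Collect every span matching the same pattern used for the first actor
    Set actors = html.querySelectorAll("[itemprop=actor] span")

    ' Write each match to column A, one row per node (the list is zero-based)
    For i = 0 To actors.Length - 1
        ActiveSheet.Cells(i + 1, 1) = actors.Item(i).innerText
    Next i
End Sub
```

Looping by index with .Length and .Item(i) is a deliberately conservative choice for walking the returned node list.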

1 Comment

Thanks a ton, that really worked. What modification do I need if I want the second actor's name from that page (Alice Braga)? Also, when I Debug.Print the entire HTML page, why am I not able to find the name "Will Smith"?
