I just started to use PHP Simple HTML DOM Parser.

Now I'm trying to extract all elements wrapped in <b> tags (including the <b> and </b> tags themselves) from an existing HTML document. This works fine with:

foreach($html->find('b') as $q)
    echo $q;

How can I show only the elements wrapped in <b>...</b> tags that are followed by a <span class="marked">?

Update: I've used Firebug to get the CSS path for the elements. Now it looks like this:

foreach ($html->find('html body div#wrapper table.desc tbody tr td div span.marked') as $x)
    foreach ($x->find('html body div#wrapper table.desc tbody tr td table.split tbody tr td b') as $d)
        echo $d;

But it doesn't work. Any ideas?

Update:

To clarify my question, here is a sample tr from the document, including the opening and closing table tags.

<table width="100%" border="0" cellspacing="0" cellpadding="0" class="desc">
    <tr>
        <th width="25%" scope="col"><div align="center">1</div></th>
        <th width="50" scope="col"><div align="center">2</div></th>
        <th width="10%" scope="col"><div align="center">3</div></th>
        <th width="15%" scope="col"><div align="center">4</div></th>
    </tr>
    <tr>
        <td valign="top" bgcolor="#E9E9E9"><div style="text-align: center; font-weight: bold; margin-top: 2px"> 1 </div></td>
        <td>
            <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                    <td>
                        <b> element to extract</b></td>
                </tr>
                <tr>
                    <td>
                        <table width="100%" border="0" cellspacing="0" cellpadding="0" class="split">  <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        1
                                    </div>
                                </td>
                                <td>
                                    abed
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        2
                                    </div>
                                </td>
                                <td>
                                    ddee
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        3
                                    </div>
                                </td>
                                <td>
                                    xdef
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        4
                                    </div>
                                </td>
                                <td>
                                    abbcc
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        5
                                    </div>
                                </td>
                                <td>
                                    ab
                                </td>
                            </tr>
                            <tr>
                                <td width="15px" valign="top">&nbsp;</td>
                                <td width="15px" valign="top">  
                                    <div style="background-color:green ;color:#FFFFFF; text-align:center;padding-bottom: 1px">
                                        6
                                    </div>
                                </td>
                                <td>
                                    e1
                                </td>
                            </tr>
                        </table>
                    </td>
                </tr>
            </table>
        </td>
        <td valign="top"><div style="text-align: center"> <span class="marked">marked</span> </div></td>
        <td valign="top"><div style="text-align: center">  </div></td>
    </tr>
</table>
1
  • Do you mean <b><elements/></b><span class="marked"> or <b><elements/><span class="marked"></b>? Commented Jan 27, 2011 at 8:12

2 Answers

3

Try the following CSS selector

b > span.marked

That would return the span though, so you probably have to do $e->parent() to get to the b element.

Also see Best Methods to parse HTML for alternatives to SimpleHtmlDom


Edit after update:

Your browser will modify the DOM. If you look at your markup, you will see that there are no tbody elements. Yet Firebug gives you

html body div#wrapper table.desc tbody tr td div span.marked
html body div#wrapper table.desc tbody tr td table.split tbody tr td b
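
One way to check this is to parse the raw markup on the server side and count the tbody elements. The following is only a sketch: it uses PHP's bundled DOM extension instead of SimpleHtmlDom (my assumption, so that it runs without extra files), and the cut-down markup and variable names are made up for illustration.

```php
<?php
// Cut-down, hypothetical version of the table from the question.
// Note that the source contains no <tbody>.
$markup = '<div id="wrapper"><table class="desc">'
        . '<tr><td><div><span class="marked">marked</span></div></td></tr>'
        . '</table></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// How many <tbody> elements does the parser see in the raw markup?
$tbodyCount = $xpath->query('//tbody')->length;

// Firebug-style path, including the tbody step.
$withTbody = $xpath->query(
    '//table[@class="desc"]/tbody/tr/td/div/span[@class="marked"]'
)->length;

// Same path with the tbody step dropped.
$withoutTbody = $xpath->query(
    '//table[@class="desc"]/tr/td/div/span[@class="marked"]'
)->length;

echo "tbody elements: $tbodyCount\n";
echo "matches with tbody step: $withTbody\n";
echo "matches without tbody step: $withoutTbody\n";
```

As far as I know, libxml does not add tbody the way browsers do, so only the path without the tbody step should match here.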

Also, your question does not match the queries. You asked how to find

elements surrounded with the <b>,</b>-tags followed by a <span class="marked">

That can be read to either mean

<b><span class="marked">foo</span></b>

or

<b><element>foo</element></b><span class="marked">foo</span>

For the first, use the child combinator I showed earlier. For the second, use the adjacent sibling combinator

b + span.marked

to get the span and then use $e->prev_sibling() to return the previous sibling of element (or null if not found).
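
To make the two readings concrete, here is a rough sketch using PHP's bundled DOM extension and XPath rather than SimpleHtmlDom (an assumption on my part, so it runs without extra files). The sample markup, the ids and the variable names are invented for illustration, and the class test assumes the attribute is exactly "marked".

```php
<?php
// Hypothetical markup covering both readings of the question.
$markup = '<div>'
        . '<b id="b1"><span class="marked">foo</span></b>'            // reading 1: span inside the b
        . '<b id="b2"><i>foo</i></b><span class="marked">foo</span>'  // reading 2: span right after the b
        . '<b id="b3">plain</b>'                                      // neither reading
        . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Reading 1, roughly "b > span.marked" followed by parent():
// select every b that has a span.marked as a direct child.
$childIds = [];
foreach ($xpath->query('//b[span[@class="marked"]]') as $b) {
    $childIds[] = $b->getAttribute('id');
}

// Reading 2, roughly "b + span.marked" followed by prev_sibling():
// from each span.marked, step to the nearest preceding element sibling
// and keep it only if it is a b.
$siblingIds = [];
foreach ($xpath->query('//span[@class="marked"]/preceding-sibling::*[1][self::b]') as $b) {
    $siblingIds[] = $b->getAttribute('id');
}

echo 'child reading: ', implode(', ', $childIds), "\n";
echo 'sibling reading: ', implode(', ', $siblingIds), "\n";
```

Selecting the b directly in the XPath expression avoids the extra step back from the span.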

However, the markup you have shown contains neither. There is only a DIV with a SPAN child that has the marked class:

<div style="text-align: center"> <span class="marked">marked</span>

If that is what you want to match, it's the child combinator again. Of course, you then have to change the b to a div.
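
If what you are really after is the b elements of every row that has a span.marked in one of its cells (my guess, based on how the sample markup is laid out), neither combinator expresses that directly, because the b and the span sit in different cells. A possible sketch with PHP's DOM extension and XPath follows. The trimmed-down markup and the names $markup and $found are hypothetical.

```php
<?php
// Trimmed-down, hypothetical version of the table from the question:
// the first row has a span.marked cell, the second row does not.
$markup = '<table class="desc">'
        . '<tr>'
        .   '<td><table class="split"><tr><td><b>first</b></td></tr></table></td>'
        .   '<td><div><span class="marked">marked</span></div></td>'
        . '</tr>'
        . '<tr>'
        .   '<td><table class="split"><tr><td><b>second</b></td></tr></table></td>'
        .   '<td><div></div></td>'
        . '</tr>'
        . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($markup);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Pick the rows that have a td > div > span.marked cell, then collect
// every b anywhere below such a row (including inside nested tables).
// There is no tbody step, so the query does not depend on the parser adding one.
$query = '//table[@class="desc"]//tr[td/div/span[@class="marked"]]//b';

$found = [];
foreach ($xpath->query($query) as $b) {
    $found[] = trim($b->textContent);
}

echo implode(', ', $found), "\n";
```

The row filter lives in the predicate on tr, so rows of the nested split tables are skipped unless they carry a marked span themselves.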

2 Comments

@Eray I don't know up to what level SimpleHtmlDom implements them though. And tbh, I don't see why I would need them (or SimpleHtmlDom) when I can use DOM and XPath :)
FTR, simplehtmldom doesn't support sibling selectors, but some alternatives do.
-1

A simpler way, taken from the manual:

foreach($html->find('b') as $q)
    echo $q->plaintext;
