
First off, I'm certain it's something obvious that I should have caught hours ago, but I just can't seem to see it.

Situation

So, the situation is that I'm trying to set up a reusable (not recursive) function that parses a block of HTML into a multidimensional array split by the header elements. The end result should have no more than 7 levels (H1-H6 plus the children of the H6). There's also a catch: any elements before the first H1 should be placed into a "special" section labeled "Top".
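Since the goal is a non-recursive function, here is a minimal sketch of one common alternative technique (not the approach used anywhere in this post): walk the elements once and keep an explicit stack of the currently open sections, closing every section at the same or a deeper level whenever a new heading arrives. The function name `groupByHeadings()` and its output shape (each section carries an extra `level` key, and the heading element is not repeated inside its own `children`) are assumptions for illustration. Anything before the first `<h1>` lands in a "Top" section, which the sketch treats as an implicit `<h1>`.

```php
<?php
// Non-recursive sketch: group a flat list of top-level DOMElements into nested
// sections by heading level, using an explicit stack of open sections.
// Assumed output shape: ['title' => string, 'level' => int, 'children' => [...]],
// where children hold non-heading DOMElements and nested section arrays.
function groupByHeadings(iterable $elements): array
{
    // Virtual root at level 0; its children become the returned top-level list.
    $root = ['level' => 0, 'children' => []];
    // "Top" acts like an implicit <h1> collecting anything before the first real one.
    $root['children'][] = ['title' => 'Top', 'level' => 1, 'children' => []];

    // The stack holds references to open sections, outermost first.
    $stack = [];
    $stack[] = &$root;
    $stack[] = &$root['children'][0];

    foreach ($elements as $element) {
        // 1-6 for <h1>..<h6>, 0 for anything else (note that "hr" does not match).
        $level = preg_match('/^h([1-6])$/i', $element->tagName, $m) ? (int) $m[1] : 0;

        if ($level === 0) {
            // Ordinary content belongs to the innermost open section.
            $stack[array_key_last($stack)]['children'][] = $element;
            continue;
        }

        // Close every open section at the same or a deeper level.
        while ($stack[array_key_last($stack)]['level'] >= $level) {
            array_pop($stack);
        }

        // Open a new section under the innermost remaining one and push it.
        $parent = &$stack[array_key_last($stack)];
        $parent['children'][] = ['title' => $element->textContent, 'level' => $level, 'children' => []];
        $child = &$parent['children'][array_key_last($parent['children'])];
        $stack[] = &$child;
        unset($parent, $child);
    }

    return $root['children'];
}

$doc = new DOMDocument();
@$doc->loadHTML('<p>intro</p><h1>A</h1><p>a</p><h2>A.1</h2><p>a1</p><h1>B</h1><p>b</p>');
$body = $doc->getElementsByTagName('body')->item(0);

// Keep only element children of <body>, similar to the filter in breakupEntry().
$elements = [];
foreach ($body->childNodes as $node) {
    if ($node instanceof DOMElement) {
        $elements[] = $node;
    }
}

$sections = groupByHeadings($elements);
```

Because headings only go up to `<h6>` and the levels on the stack are strictly increasing, the stack never holds more than seven entries (the virtual root plus one per heading level), which gives the 7-level limit without an explicit depth check.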

Code

<?php
    function sortEntrySections($section, $level = 1) {
        if(is_array($section)) {
            $i = 0;
            $ele = 'h' . $level;
            $sectionStructure = $level === 1 ? array(array('title' => 'Top', 'children' => array())) : array();
            foreach($section as $element) {
                if($element->tagName != $ele && isset($sectionStructure[$i]) && is_array($sectionStructure[$i])) {
                    array_push($sectionStructure[$i]['children'], $element);
                } else {
                    $i++;
                    if($element->tagName == $ele) {
                        $sectionStructure[$i] = array('title' => $element->textContent, 'children' => array($element));
                    } else {
                        $sectionStructure[$i] = $element;
                    }
                }
            }
            return $sectionStructure;
        }
        return $section;
    }

    function breakupEntry() {
        $body = new DOMDocument();
        @$body->loadHTML(mb_convert_encoding(html_entity_decode($GLOBALS['libraryEntry']['body']), 'HTML-ENTITIES', 'UTF-8'));
        $formattedBody = new DOMDocument();

        /* Build Multidimensional Array of Sections */
        $i = 0;
        $elements = array();
        foreach($body->getElementsByTagName('*') as $child) {
            if($child->tagName !== 'html' && $child->tagName !== 'body' && $child->parentNode->tagName === 'body') {
                array_push($elements, $formattedBody->importNode($child, true));
            }
        }
        $sections = sortEntrySections($elements, 1);
        for($i = 1; $i < sizeof($sections); $i++) {
            $childrenH1 = sortEntrySections($sections[$i]['children'], 2);
            if(isset($childrenH1['children'])) {
                foreach($childrenH1['children'] as $j => $childH1) {
                    $childrenH2 = sortEntrySections($childH1, 3);
                    if(isset($childrenH2['children'])) {
                        foreach($childrenH2['children'] as $k => $childH2) {
                            $childrenH3 = sortEntrySections($childH2, 4);
                            if(isset($childrenH3['children'])) {
                                foreach($childrenH3['children'] as $l => $childH3) {
                                    $childrenH4 = sortEntrySections($childH3, 5);
                                    if(isset($childrenH4['children'])) {
                                        foreach($childrenH4['children'] as $m => $childH4) {
                                            $childrenH4[$m]['children'] = sortEntrySections($childH4, 6);
                                        }
                                    }
                                    $childrenH3['children'][$l] = $childrenH4;
                                }
                            }
                            $childrenH2['children'][$k] = $childrenH3;
                        }
                    }
                    $childrenH1['children'][$j] = $childrenH2;
                }
            }
            $sections[$i]['children'] = $childrenH1;
        }
        return $sections;
    }

    $body = <<<EOD
<p>Pre Header Section Content 1</p>
<p>Pre Header Section Content 2</p>
<p>Pre Header Section Content 3</p>
<h1>Header 1</h1>
<p>Header 1 Section Content 1</p>
<p>Header 1 Section Content 2</p>
<p>Header 1 Section Content 3</p>
<h2>Header 1.1</h2>
<p>Header 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Section Content 3</p>
<h3>Header 1.1.1</h3>
<p>Header 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Section Content 3</p>
<h4>Header 1.1.1.1</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h5>Header 1.1.1.1.1</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h6>Header 1.1.1.1.1.1</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h6>Header 1.1.1.1.1.2</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h6>Header 1.1.1.1.1.3</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h5>Header 1.1.1.1.2</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h5>Header 1.1.1.1.3</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h4>Header 1.1.1.2</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h4>Header 1.1.1.3</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h3>Header 1.1.2</h3>
<p>Header 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 2 Section Content 3</p>
<h3>Header 1.1.3</h3>
<p>Header 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 3 Section Content 3</p>
<h2>Header 1.2</h2>
<p>Header 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 2 Section Content 3</p>
<h2>Header 1.3</h2>
<p>Header 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 3 Section Content 3</p>
<h1>Header 2</h1>
<p>Header 2 Section Content 1</p>
<p>Header 2 Section Content 2</p>
<p>Header 2 Section Content 3</p>
<h1>Header 3</h1>
<p>Header 3 Section Content 1</p>
<p>Header 3 Section Content 2</p>
<p>Header 3 Section Content 3</p>
EOD;
    $libraryEntry = array('body' => $body);

    $results = breakupEntry();

    echo '<textarea>'; var_dump($results); echo '</textarea>';
?>

Results

https://pastebin.com/JLftvXdB

Expected

https://pastebin.com/tzqxu8q4

Answer


I rewrote this half a dozen times, and each version ran into a different problem that I kept getting stuck on. In the end, I rewrote it as a limited recursive function, clamping the $level variable to the range 1-6 so the recursion can never go deeper than the H6 level.
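For anyone comparing the two versions: in the question's `breakupEntry()`, `sortEntrySections($sections[$i]['children'], 2)` returns a list of entries keyed by integers, so `isset($childrenH1['children'])` is never true and none of the H2-H6 loops run; only the first two levels ever get grouped. The recursive rewrite sidesteps this by descending into each section's own `children` key. A tiny illustration, with strings standing in (hypothetically) for the `DOMElement` objects:

```php
<?php
// Hypothetical stand-in for what sortEntrySections($children, 2) returns:
// a list keyed from 1, where each entry is either a bare element (strings
// here instead of DOMElements) or a ['title', 'children'] section array.
$childrenH1 = [
    1 => '<h1>Header 1</h1>',
    2 => '<p>Header 1 Section Content 1</p>',
    3 => ['title' => 'Header 1.1', 'children' => ['<h2>Header 1.1</h2>']],
];

// The question's check looks for 'children' on the list itself...
var_dump(isset($childrenH1['children']));    // bool(false), so the inner loops are skipped
// ...but 'children' only exists on the individual section entries.
var_dump(isset($childrenH1[3]['children'])); // bool(true)
```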

<?php
    function sortEntrySections($section, $level = 1) {
        if(is_array($section)) {
            $i = 0;
            // Clamp $level to 1..6 so the recursion can never go past <h6>.
            $level = intval($level);
            $level = $level > 6 ? 6 : ($level < 1 ? 1 : $level);
            $ele = 'h' . $level;
            // At level 1, index 0 is the "Top" bucket for anything before the first <h1>.
            $sectionStructure = $level === 1 ? array(array('title' => 'Top', 'children' => array())) : array();
            foreach($section as $element) {
                if($element->tagName != $ele && isset($sectionStructure[$i]) && is_array($sectionStructure[$i])) {
                    // Not a heading at this level: append it to the currently open section.
                    array_push($sectionStructure[$i]['children'], $element);
                } else {
                    $i++;
                    if($element->tagName == $ele) {
                        // A heading at this level opens a new section (the heading is its first child).
                        $sectionStructure[$i] = array('title' => $element->textContent, 'children' => array($element));
                    } else {
                        // No section is open at this level yet, so keep the element as-is.
                        $sectionStructure[$i] = $element;
                    }
                }
            }
            // Regroup each section's children by the next heading level; level-6 sections are left alone.
            foreach($sectionStructure as $i => $subsection) {
                if(is_array($subsection) && isset($subsection['children']) && $level < 6) {
                    $sectionStructure[$i]['children'] = sortEntrySections($subsection['children'], $level + 1);
                }
            }
            return $sectionStructure;
        }
        return $section;
    }

    function breakupEntry() {
        $body = new DOMDocument();
        @$body->loadHTML(mb_convert_encoding(html_entity_decode($GLOBALS['libraryEntry']['body']), 'HTML-ENTITIES', 'UTF-8'));
        $formattedBody = new DOMDocument();

        /* Build Multidimensional Array of Sections */
        $i = 0;
        $elements = array();
        foreach($body->getElementsByTagName('*') as $child) {
            if($child->tagName !== 'html' && $child->tagName !== 'body' && $child->parentNode->tagName === 'body') {
                array_push($elements, $formattedBody->importNode($child, true));
            }
        }
        $sections = sortEntrySections($elements);
        return $sections;
    }

    $body = <<<EOD
<p>Pre Header Section Content 1</p>
<p>Pre Header Section Content 2</p>
<p>Pre Header Section Content 3</p>
<h1>Header 1</h1>
<p>Header 1 Section Content 1</p>
<p>Header 1 Section Content 2</p>
<p>Header 1 Section Content 3</p>
<h2>Header 1.1</h2>
<p>Header 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Section Content 3</p>
<h3>Header 1.1.1</h3>
<p>Header 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Section Content 3</p>
<h4>Header 1.1.1.1</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h5>Header 1.1.1.1.1</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h6>Header 1.1.1.1.1.1</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Section Content 3</p>
<h6>Header 1.1.1.1.1.2</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h6>Header 1.1.1.1.1.3</h6>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h5>Header 1.1.1.1.2</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h5>Header 1.1.1.1.3</h5>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h4>Header 1.1.1.2</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 2 Section Content 3</p>
<h4>Header 1.1.1.3</h4>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 1 Subheader 3 Section Content 3</p>
<h3>Header 1.1.2</h3>
<p>Header 1 Subheader 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 2 Section Content 3</p>
<h3>Header 1.1.3</h3>
<p>Header 1 Subheader 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 1 Subheader 3 Section Content 3</p>
<h2>Header 1.2</h2>
<p>Header 1 Subheader 2 Section Content 1</p>
<p>Header 1 Subheader 2 Section Content 2</p>
<p>Header 1 Subheader 2 Section Content 3</p>
<h2>Header 1.3</h2>
<p>Header 1 Subheader 3 Section Content 1</p>
<p>Header 1 Subheader 3 Section Content 2</p>
<p>Header 1 Subheader 3 Section Content 3</p>
<h1>Header 2</h1>
<p>Header 2 Section Content 1</p>
<p>Header 2 Section Content 2</p>
<p>Header 2 Section Content 3</p>
<h1>Header 3</h1>
<p>Header 3 Section Content 1</p>
<p>Header 3 Section Content 2</p>
<p>Header 3 Section Content 3</p>
EOD;
    $libraryEntry = array('body' => $body);

    $results = breakupEntry();

    echo '<textarea>'; var_dump($results); echo '</textarea>';
?>
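Because the returned tree mixes section arrays with bare `DOMElement` leaves, anything that consumes it has to tell the two apart. A minimal sketch that builds an indented outline of the section titles; `outlineTitles()` is a hypothetical helper, and the small `$tree` is built by hand to mimic the shape the answer's `sortEntrySections()` produces (sections are `array('title' => ..., 'children' => ...)`, lower levels keyed from 1, and each section's heading element kept as its first child):

```php
<?php
// Hypothetical helper: collect section titles as an indented outline,
// skipping the DOMElement leaves that sit alongside the section arrays.
function outlineTitles(array $sections, int $depth = 0): string
{
    $out = '';
    foreach ($sections as $entry) {
        if (is_array($entry) && isset($entry['title'])) {
            $out .= str_repeat('  ', $depth) . $entry['title'] . "\n";
            $out .= outlineTitles($entry['children'], $depth + 1);
        }
    }
    return $out;
}

// Hand-built tree shaped like the answer's output (heavily shortened).
$doc = new DOMDocument();
$tree = [
    0 => ['title' => 'Top', 'children' => [
        1 => $doc->createElement('p', 'Pre Header Section Content 1'),
    ]],
    1 => ['title' => 'Header 1', 'children' => [
        1 => $doc->createElement('h1', 'Header 1'),
        2 => ['title' => 'Header 1.1', 'children' => [
            1 => $doc->createElement('h2', 'Header 1.1'),
        ]],
    ]],
];

echo outlineTitles($tree);
// prints:
// Top
// Header 1
//   Header 1.1
```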