PHP Based HTML Validator

Question

I need to find a PHP-based HTML validator (W3C-like) that can detect invalid HTML or XHTML. I've searched Google a little, but I'm curious whether anyone has used one they particularly liked.

I have the HTML in a string:

$html = "<html><head>.....</body></html>";

I would like to be able to test the page and have the validator return the errors, without echoing or printing anything.

I've seen:
- http://www.bermi.org/xhtml_validator
- http://twineproject.sourceforge.net/doc/phphtml.html

The background for this is that I'd like to have a function/class that I run on every page. It would check whether the file has been modified since the last access date (or something similar), and if it has, run the validator so that I am immediately notified of invalid HTML while coding.
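
One way the change check described above could be sketched, assuming the last-seen modification time is kept in a small side file. The function name `changed_since_last_check()` and the stamp-file approach are illustrative assumptions, not something from the question:

```php
<?php
/**
 * Hypothetical helper: returns true when $file has a newer modification
 * time than the one recorded in $stampFile, and then records the new time.
 * Returns false when the file is missing or unchanged.
 */
function changed_since_last_check(string $file, string $stampFile): bool
{
    if (!is_file($file)) {
        return false;
    }
    clearstatcache(true, $file);          // avoid a stale cached mtime
    $mtime = filemtime($file);
    if ($mtime === false) {
        return false;
    }
    // No stamp file yet means the page has never been checked.
    $last = is_file($stampFile) ? (int) file_get_contents($stampFile) : 0;
    if ($mtime <= $last) {
        return false;                     // unchanged since the last check
    }
    file_put_contents($stampFile, (string) $mtime);
    return true;
}
```

A page could call this with its own path and run whichever validator is chosen only when it returns true.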

Robert Elwell · Accepted Answer · 2009-08-28 21:29:58Z

6

There's no need to reinvent the wheel on this one. There's already a PEAR library that interfaces with the W3C HTML Validator API. They're willing to do the work for you, so why not let them? :)

2 Comments

Byron Whitlock Over a year ago

Pretty cool, but you have to rely on their web service, which means you must be connected to the public internet. Very neat though.

Tyler Carter Over a year ago

This is definitely an option.

Byron Whitlock · Accepted Answer · 2009-08-28 21:30:20Z

2

While it isn't strictly PHP (it is an executable), one I really like is W3C's HTML Tidy. It shows what is wrong with the HTML and fixes it if you want it to. It also beautifies the HTML so it doesn't look like a mess. It runs from the command line and is easy to integrate into PHP.

Check it out: http://www.w3.org/People/Raggett/tidy/
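
For the case in the question (HTML held in a string, errors returned rather than printed), a minimal sketch using PHP's tidy extension instead of the command-line executable might look like this. It assumes the extension is enabled on the host; `html_tidy_messages()` is a hypothetical helper name, and the exact wording of Tidy's messages varies by version:

```php
<?php
/**
 * Hypothetical helper: returns Tidy's error and warning lines for an HTML
 * string, or null when the tidy extension is not available.
 */
function html_tidy_messages(string $html): ?array
{
    if (!class_exists('tidy')) {
        return null;                      // extension not enabled on this host
    }
    $tidy = new tidy();
    $tidy->parseString($html, array(), 'utf8');
    // errorBuffer holds the diagnostics collected while parsing.
    $buffer = trim((string) $tidy->errorBuffer);
    return $buffer === '' ? array() : explode("\n", $buffer);
}

$html = "<html><head><title>t</title></head><body><p>unclosed</body></html>";
$messages = html_tidy_messages($html);   // nothing is echoed here
```

An empty array means Tidy reported nothing; null means the extension is missing and another approach is needed.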

Pons · Accepted Answer · 2010-02-04 13:40:23Z

0

If you can't use Tidy (hosting services sometimes do not enable this PHP module), you can use this PHP class: http://www.barattalo.it/html-fixer/

Nikos M. · Accepted Answer · 2022-01-21 05:28:37Z

I had a case where I needed to check partial HTML for unmatched and malformed tags (mostly things like </br>, a common error in my samples), and the various heavy-duty validators were more than I needed. So I ended up writing my own validation routine in PHP, pasted below. Note that you may need to use mb_substr instead of index-based character retrieval if you have text in different languages, and that it does not parse CDATA or script/style tags, although it can be extended easily:

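// Returns true when all tags in $html are well formed and properly nested,
// and false as soon as the first problem is found.
function check_html( $html )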
{
    $stack = array();
    $autoclosed = array('br', 'hr', 'input', 'embed', 'img', 'meta', 'link', 'param', 'source', 'track', 'area', 'base', 'col', 'wbr');
    $l = strlen($html); $i = 0;
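    // Note: $instring is declared but never used. Quoted attribute values are
    // not tracked, so a '<' or '>' inside an attribute value fails the check.
    $incomment = false; $intag = false; $instring = false;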
    $closetag = false; $tag = '';
    while($i<$l)
    {
        while($i<$l && preg_match('#\\s#', $c=$html[$i])) $i++;
        if ( $i >= $l ) break;
        if ( $incomment && ('-->' === substr($html, $i, 3)) )
        {
            // close comment
            $incomment = false;
            $i += 3;
            continue;
        }
        $c = $html[$i++];
        if ( '<' === $c )
        {
            if ( $incomment ) continue;
            if ( $intag )  return false;
            if ( '!--' === substr($html, $i, 3) )
            {
                // open comment
                $incomment = true;
                $i += 3;
                continue;
            }

            // open tag
            $intag = true;
            // bounds check: the input may end right after '<'
            if ( $i<$l && '/' === $html[$i] )
            {
                $i++;
                $closetag = true;
            }
            else
            {
                $closetag = false;
            }
            $tag = '';
            while($i<$l && preg_match('#[a-z0-9\\-]#i', $c=$html[$i]) )
            {
                $tag .= $c;
                $i++;
            }
            if ( !strlen($tag) ) return false;
            $tag = strtolower($tag);
            if ( $i<$l && !preg_match('#[\\s/>]#', $html[$i]) ) return false;
            if ( $i<$l && $closetag && preg_match('#^\\s*/>#sim', substr($html, $i)) ) return false;
            if ( $closetag )
            {
                if ( in_array($tag, $autoclosed) || (array_pop($stack) !== $tag) )
                    return false;
            }
            else if ( !in_array($tag, $autoclosed) )
            {
                $stack[] = $tag;
            }
        }
        else if ( '>' ===$c )
        {
            if ( $incomment ) continue;
            
            // close tag
            if ( !$intag ) return false;
            $intag = false;
        }
    }
    return !$incomment && !$intag && empty($stack);
}
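
As a rough usage sketch for the routine above, assuming `check_html()` is already defined in the same script. The `run_samples()` helper and the sample strings are illustrative and not part of the original answer:

```php
<?php
// Hypothetical helper: applies a validator callable to each sample string
// and returns an array of sample => bool.
function run_samples(callable $validator, array $samples): array
{
    $results = array();
    foreach ($samples as $sample) {
        $results[$sample] = (bool) $validator($sample);
    }
    return $results;
}

$samples = array(
    '<p>Hello</p>',        // balanced tags: expected true
    '<p>Hello</br></p>',   // closing tag on a void element: expected false
    '<div><p>text</div>',  // <p> is never closed: expected false
    '<br/>',               // self-closed void element: expected true
);

// Only runs when check_html() from the answer has been loaded.
if (function_exists('check_html')) {
    var_export(run_samples('check_html', $samples));
}
```

Because the function returns only true or false, it tells you that something is wrong but not where; reporting a position would require returning `$i` alongside the result.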

2 Comments

It is a very bad idea to write your own HTML parser, especially if your code will be used on untrusted input. HTML is very complex: the HTML5 parsing rules are complicated and handle many nuanced edge cases. For some common misconceptions about HTML that may trip up "roll your own" parsers, see: alanhogan.com/html-myths#close-tags

There are cases (like mine) where a very simple custom parser is all that is needed and no such simple one can be found elsewhere. This is offered for such cases; otherwise I totally agree with you.
