Consider the following code, with which I'm trying to parse only the first phpDoc-style comment in a file without using any other libraries (the file contents are placed in the $data variable for testing purposes):

$data = "
/**
 * @file    A lot of info about this file
 *          Could even continue on the next line
 * @author  [email protected]
 * @version 2010-05-01
 * @todo    do stuff...
 */

/**
 * Comment bij functie bar()
 * @param Array met dingen
 */
function bar($baz) {
  echo $baz;
}
";

$data =  trim(preg_replace('/\r?\n *\* */', ' ', $data));
preg_match_all('/@([a-z]+)\s+(.*?)\s*(?=$|@[a-z]+\s)/s', $data, $matches);
$info = array_combine($matches[1], $matches[2]);
print_r($info);

This almost works, except for the fact that everything after @todo (including the bar() comment block and code) is considered the value of @todo:

Array (
    [file] => A lot of info about this file Could even continue on the next line
    [author] => [email protected]
    [version] => 2010-05-01
    [todo] => do stuff... /

    /** Comment bij functie bar()
    [param] => Array met dingen /
    function bar() {
      echo ;
    }
)

How does my code need to be altered so that only the first comment block is parsed (in other words, so that parsing stops after the first "*/" encountered)?

    Consider the case where a string like $s = '/** not a phpDoc @file ... */'; is placed before the first phpDoc. In other words: using regex, you will never get a 100% reliable solution. Commented May 1, 2010 at 9:01
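
To make the comment's point concrete, here is a small sketch. The input is made up for illustration: a string literal that only looks like a doc block sits in front of the real one. The regex is an assumed, simple "first /** ... */" pattern, not the asker's exact expression.

```php
<?php
// Hypothetical input: a look-alike string literal precedes the real doc block.
$source = <<<'PHP'
<?php
$s = '/** not a phpDoc @file ... */';

/**
 * @file The real file comment
 */
PHP;

// A plain regex cannot tell a string literal from a comment,
// so it returns the contents of the literal.
preg_match('~/\*\*.*?\*/~s', $source, $m);
$regexHit = $m[0];

// The tokenizer classifies the literal as a string token, so the loop
// skips it and stops at the first real T_DOC_COMMENT.
$tokenizerHit = null;
foreach (token_get_all($source) as $token) {
    if (is_array($token) && $token[0] === T_DOC_COMMENT) {
        $tokenizerHit = $token[1];
        break;
    }
}
```

Here $regexHit holds the text inside the string literal, while $tokenizerHit holds the actual file comment.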

2 Answers

Writing a parser with PCRE will get you into trouble. I would suggest relying on the tokenizer or reflection first to locate the doc block. After that, it is safer to implement a real parser for the doc block, one that can handle every situation supported by the phpdoc format (which is what all the libraries ended up doing as well).
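
As a rough illustration of the tokenizer route, the sketch below returns the first doc comment of a source string. The function name firstDocComment is hypothetical, and the sketch assumes the source starts with a <?php open tag; without one, token_get_all() treats everything as inline HTML and finds no comment.

```php
<?php
// Hypothetical helper (not from the original post): returns the text of the
// first doc comment in a PHP source string, or null if there is none.
function firstDocComment(string $source): ?string
{
    foreach (token_get_all($source) as $token) {
        // Tokens are either single-character strings or [id, text, line] arrays.
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            return $token[1]; // stop at the first doc comment
        }
    }
    return null;
}
```

Note that token_get_all() still tokenizes the whole string before the loop runs; returning early only saves the iteration over the remaining tokens. The returned text still includes the /** and */ delimiters and needs to be parsed for tags separately.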

2 Comments

Thanks for the quick reply. In reality, I have to loop through many files, collecting the FIRST comment block of every file (only the one describing the file; I don't need the other comment blocks describing functions, methods, etc.). The downside of using the tokenizer is that I can't tell token_get_all() to stop looking for comment blocks after the first one has been found. This results in a huge array which takes about 20-30 seconds to compile, which is too long since I have to recompile on every page request (don't ask...).
The advantage of regex is that one could instruct it to stop looking after the first comment block of a file has been found, resulting in better performance. Or is there a workaround (see my code below using the tokenizer)?

foreach ($files as $file) {
  // token_get_all() expects a string, so read the file with file_get_contents()
  $data = file_get_contents("$file.inc.php");
  $tokens = token_get_all($data);
  foreach ($tokens as $token) {
    list($id, $text) = $token;
    switch ($id) {
      case T_DOC_COMMENT:
        $return[] = $token;
        break;
      default:
        break;
    }
  }
  print_r($return);
}
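
For the regex route described in this comment, one possible sketch is to isolate the first /** ... */ block with a non-greedy match and only then apply the tag-parsing expressions from the question. The function name parseFirstDocBlock is hypothetical. The sketch inherits the limitation raised in the comment on the question: a look-alike string literal earlier in the file would be matched first.

```php
<?php
// Hypothetical helper: parse the @tags of the first /** ... */ block only.
function parseFirstDocBlock(string $source): array
{
    // Non-greedy match, so it ends at the first "*/" after the first "/**".
    if (!preg_match('~/\*\*(.*?)\*/~s', $source, $m)) {
        return [];
    }
    // Join continuation lines: replace each newline plus its " * " decoration
    // with a single space, then trim what is left around the block body.
    $body = trim(preg_replace('/\r?\n[ \t]*\*[ \t]*/', ' ', $m[1]));
    // Same tag expression as in the question, applied to the first block only.
    preg_match_all('/@([a-z]+)\s+(.*?)\s*(?=$|@[a-z]+\s)/s', $body, $tags);
    // Repeated tags (e.g. several @param lines) would overwrite each other here.
    return array_combine($tags[1], $tags[2]);
}
```

If the file comment always sits near the top of the file, reading only a prefix, for example file_get_contents("$file.inc.php", false, null, 0, 4096), should reduce the work per file further; the 4096-byte limit is an assumption and would need to be larger than the longest file comment.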

The Php Comment Manager script can parse the DocBlock comments of methods. It supports parsing the method description and the @param and @return tags, and it can be extended to support custom DocBlock tags.

1 Comment

Your link isn't working anymore. Can you update it please?
