How to parse a phpDoc style comment block with PHP?

Question

Please consider the following code with which I'm trying to parse only the first phpDoc style comment (not using any other libraries) in a file (file contents put in $data variable for testing purposes):

$data = "
/**
 * @file    A lot of info about this file
 *          Could even continue on the next line
 * @author  [email protected]
 * @version 2010-05-01
 * @todo    do stuff...
 */

/**
 * Comment bij functie bar()
 * @param Array met dingen
 */
function bar($baz) {
  echo $baz;
}
";

$data =  trim(preg_replace('/\r?\n *\* */', ' ', $data));
preg_match_all('/@([a-z]+)\s+(.*?)\s*(?=$|@[a-z]+\s)/s', $data, $matches);
$info = array_combine($matches[1], $matches[2]);
print_r($info);

This almost works, except for the fact that everything after @todo (including the bar() comment block and code) is considered the value of @todo:

Array (
    [file] => A lot of info about this file Could even continue on the next line
    [author] => [email protected]
    [version] => 2010-05-01
    [todo] => do stuff... /

    /** Comment bij functie bar()
    [param] => Array met dingen /
    function bar() {
      echo ;
    }
)

How does my code need to be altered so that only the first comment block is parsed (in other words: parsing should stop after the first "*/" encountered)?

Consider the case where a string like $s = '/** not a phpDoc @file ... */'; is placed before the first phpDoc. In other words: using regex, you will never get a 100% reliable solution.
– Bart Kiers, Commented May 1, 2010 at 9:01
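
For reference, the narrowest change to the code in the question is probably to isolate the first comment block before extracting tags. The sketch below assumes a hypothetical helper name, parseFirstDocBlock(), and a placeholder author value; the two tag-related patterns are the ones from the question. As the comment above points out, a string literal containing "/**" earlier in the file would still fool it, and a tag that occurs twice would be overwritten by array_combine().

```php
<?php
// Hypothetical helper: returns the tags of the first "/** ... */" block in $source.
function parseFirstDocBlock(string $source): array
{
    // Lazy match: from the first "/**" to the first "*/" that follows it.
    if (!preg_match('~/\*\*(.*?)\*/~s', $source, $m)) {
        return [];
    }
    // Join continuation lines: a newline plus the leading " * " becomes one space.
    $body = trim(preg_replace('/\r?\n[ \t]*\*[ \t]*/', ' ', $m[1]));
    // Same tag pattern as in the question, now applied to one block only.
    preg_match_all('/@([a-z]+)\s+(.*?)\s*(?=$|@[a-z]+\s)/s', $body, $tags);
    return array_combine($tags[1], $tags[2]);
}

// Nowdoc (single-quoted heredoc), so "$baz" is not interpolated.
$data = <<<'SRC'
/**
 * @file    A lot of info about this file
 *          Could even continue on the next line
 * @author  Jane Doe
 * @version 2010-05-01
 * @todo    do stuff...
 */

/**
 * Comment for function bar()
 * @param Array with things
 */
function bar($baz) {
  echo $baz;
}
SRC;

$info = parseFirstDocBlock($data);
print_r($info);
```

Using a nowdoc for the test data also avoids the interpolation of $baz that is visible in the output shown in the question.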

Dmitriy.Net · Accepted Answer · 2014-07-28 06:46:34Z

6

Writing a parser using PCRE will lead to trouble. I would suggest relying on the tokenizer or reflection first. It is then safer to implement an actual parser for the doc block, one that can handle all situations supported by the phpdoc format (which is what all the libraries ended up doing as well).
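
As a rough illustration of the second half of that advice, here is a minimal line-based sketch of a doc block parser. It assumes the comment text has already been obtained elsewhere (for example from a T_DOC_COMMENT token or from ReflectionFunction::getDocComment()). The function name parseDocComment() and the result layout are hypothetical, and it covers only a description plus simple @tag lines, not the full phpdoc format.

```php
<?php
// Hypothetical helper: turns the text of one doc comment into
// ['description' => string, 'tags' => [name => [value, ...]]].
function parseDocComment(string $comment): array
{
    $result = ['description' => '', 'tags' => []];
    $tag = null; // tag currently being filled; null while still in the description

    foreach (preg_split('/\R/', $comment) as $line) {
        // Strip the decoration: opening "/**", closing "*/", leading "*".
        $line = preg_replace('~^/\*\*|\*/$~', '', trim($line));
        $line = trim(ltrim($line, '*'));
        if ($line === '') {
            continue;
        }
        if (preg_match('/^@([A-Za-z][\w-]*)\s*(.*)$/', $line, $m)) {
            // A new tag starts; repeated tags (e.g. @param) are all kept.
            $tag = $m[1];
            $result['tags'][$tag][] = $m[2];
        } elseif ($tag === null) {
            $result['description'] = ltrim($result['description'] . ' ' . $line);
        } else {
            // Continuation line: append to the most recent value of the current tag.
            $last = count($result['tags'][$tag]) - 1;
            $result['tags'][$tag][$last] .= ' ' . $line;
        }
    }
    return $result;
}

$comment = <<<'DOC'
/**
 * Comment for function bar()
 * @param array $things Things to print,
 *                      continued on a second line
 * @param int $count How many
 */
DOC;

$parsed = parseDocComment($comment);
print_r($parsed);
```

Keeping a list per tag name is a deliberate choice here: a flat name => value map would silently drop all but the last @param.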

edited Jul 28, 2014 at 6:46

Dmitriy.Net

answered May 1, 2010 at 9:08

Pierre

2 Comments

Pr0no Over a year ago

Thanks for the quick reply. In reality, I have to loop through many files, collecting the FIRST comment block of every file (only the one describing the file; I don't need to collect the other comment blocks describing functions, methods, etc.). The downside of using the tokenizer is that I can't tell token_get_all() to stop looking for comment blocks after the first one has been found. This results in a huge array which takes about 20-30 seconds to compile, which is too long since I have to recompile on every page request (don't ask...).

Pr0no Over a year ago

The advantage of regex is that one could instruct it to stop looking after the first comment block of a file has been found, resulting in better performance. Or is there a workaround (see my code below using tokenizer)?

foreach ($files as $file) {
    // token_get_all() expects the source as a string, not the array file() returns
    $data = file_get_contents("$file.inc.php");
    $tokens = token_get_all($data);
    foreach ($tokens as $token) {
        // single-character tokens are plain strings, so only inspect array tokens
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            $return[] = $token;
        }
    }
    print_r($return);
}
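
One possible workaround, sketched under the assumption that only the first doc comment of each file is wanted: return as soon as the first T_DOC_COMMENT token is seen instead of collecting every match. The helper name firstDocComment() is hypothetical. Note that token_get_all() still tokenizes the whole source before the loop starts, so the early return saves only the iteration and the collected array, not the tokenizing itself.

```php
<?php
// Hypothetical helper: returns the text of the first doc comment in $source,
// or null if the source contains none.
function firstDocComment(string $source): ?string
{
    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or one-character strings.
        if (is_array($token) && $token[0] === T_DOC_COMMENT) {
            return $token[1]; // stop at the first match
        }
    }
    return null;
}

// The tokenizer only recognises PHP code after an opening tag.
$source = <<<'SRC'
<?php
$s = '/** not a phpDoc @file ... */';

/**
 * @file Real file-level block
 */

/**
 * Comment for function bar()
 */
function bar($baz) {
  echo $baz;
}
SRC;

$doc = firstDocComment($source);
echo $doc, "\n";
```

Because the tokenizer treats the quoted string as a string token, the "/**" inside $s is not mistaken for a doc comment, which is the case a plain regex cannot distinguish. If tokenizing whole files is still too slow, reading only the first few kilobytes (the offset and length arguments of file_get_contents() allow this) might help, assuming the file-level block sits near the top; whether truncated input tokenizes cleanly would need checking.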

Nadir Latif · Accepted Answer · 2020-09-15 05:00:15Z

0

The Php Comment Manager script allows parsing the DocBlock comments of methods. It supports parsing the method description and the @param and @return tags. It can be extended to support custom DocBlock tags.

edited Sep 15, 2020 at 5:00

answered Mar 27, 2019 at 7:29

Nadir Latif

1 Comment

k00ni Over a year ago

Your link isn't working anymore. Can you update it please?
