Parse docblocks of function calls, not function definitions?

I'm trying to extract the docblocks that precede certain function calls in a PHP file. Unlike the usual approach, I'm not trying to parse the docblocks of function definitions.

Example file:

<?php
$data = get_data($id);

if ( empty( $data->random ) ) {
  /**
  * Brief description
  *
  * @since 1.0
  * @param int $var Variable
  */
  do_function( 'identifier', $var );
  exit;
}

// random comment
$status = get_random_function($post);
?>

do_function appears in various places across the files I'm going to parse. What I'm trying to get and parse is the preceding docblock together with the function call.

A Reflection class is not an option because the files don't contain classes, so I'm stuck with the following regular expression, which returns an empty array:

preg_match_all('/(\/\*.+\*\/)[\s]{0,}do_function/m', $filecontent_as_string, $results);

What am I doing wrong here? Thanks!

There are 2 answers

hwnd On

Check out the Tokenizer or Reflection for this case. You could also look at file(), which returns the file as an array of lines that you can then search for those comment lines.
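As a rough sketch of the Tokenizer route, something like the following could work. The helper name find_call_docblocks is made up for illustration. It only checks that a docblock is followed (ignoring whitespace) by the bare function name, and it does not verify that an opening parenthesis comes next, so treat it as a starting point rather than a finished parser.

```php
<?php
// Hypothetical helper (not part of PHP): returns every docblock that is
// immediately followed, ignoring whitespace, by the name $name.
function find_call_docblocks(string $source, string $name): array
{
    $found = [];
    $docblock = null; // last docblock seen with nothing but whitespace after it

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ';' or '{' break the adjacency.
            $docblock = null;
            continue;
        }

        [$id, $text] = $token;

        if ($id === T_DOC_COMMENT) {
            $docblock = $text;
        } elseif ($id === T_WHITESPACE) {
            continue; // whitespace between docblock and call is allowed
        } else {
            if ($docblock !== null && $id === T_STRING && $text === $name) {
                $found[] = $docblock;
            }
            // Any other token (including "function" in a definition) resets.
            $docblock = null;
        }
    }

    return $found;
}
```

Calling find_call_docblocks(file_get_contents($path), 'do_function'), where $path is a placeholder for one of your files, should return an array of the docblock strings. Because a definition puts the function keyword between the docblock and the name, docblocks of definitions are skipped.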

If you would rather use a regular expression here, this should do what you want.

/(\/\*(?:[^*]|\n|(?:\*(?:[^\/]|\n)))*\*\/)\s+do_function/

See a demo in action here

Regular expression:

(                     group and capture to \1:
  \/                  match '/'
  \*                  match '*'
  (?:                 group, but do not capture (0 or more times):
    [^*]   |          any character except: '*' OR
    \n     |          '\n' (newline) OR
    (?:               group, but do not capture:
      \*              match '*'
      (?:             group, but do not capture:
        [^\/] |       any character except: '/' OR
        \n            '\n' (newline)
      )               end of grouping
    )                 end of grouping
  )*                  end of grouping
  \*                  match '*'
  \/                  match '/'
)                     end of \1
\s+                   whitespace (\n, \r, \t, \f, and " ") (1 or more times)
do_function           'do_function'
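
For reference, here is a minimal sketch of how the pattern might be applied with preg_match_all. The $source string is a shortened copy of the example from the question, and the variable names are only illustrative.

```php
<?php
// Example input, adapted from the question (a nowdoc, so nothing is interpolated).
$source = <<<'PHP'
<?php
if ( empty( $data->random ) ) {
  /**
  * Brief description
  *
  * @since 1.0
  * @param int $var Variable
  */
  do_function( 'identifier', $var );
  exit;
}
PHP;

// The pattern from this answer, unchanged.
$pattern = '/(\/\*(?:[^*]|\n|(?:\*(?:[^\/]|\n)))*\*\/)\s+do_function/';

$count = preg_match_all($pattern, $source, $matches);
if ($count === false) {
    // preg_match_all returns false on errors such as hitting the backtrack limit.
    throw new RuntimeException('PCRE error code: ' . preg_last_error());
}

// $matches[1] holds capture group 1, i.e. the docblocks themselves.
$docblocks = $matches[1];
```

Since negated character classes such as [^*] and [^\/] already match newlines, the two \n alternatives are redundant and could probably be dropped without changing what the pattern matches.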
Jerry On

You can have a much simpler regex with the following:

#(?s)(/\*(?:(?!\*/).)+\*/)\s+do_function#

regex101 demo

(?s) can instead be set as a trailing flag (#(/\*(?:(?!\*/).)+\*/)\s+do_function#s); it makes the . match newlines.

/\* matches the beginning of the docblock.

(?:(?!\*/).)+ matches one character at a time for as long as the next two characters are not */, so it stops right before the end of the docblock.

\*/ matches the end of the docblock.

\s+do_function matches the whitespace (spaces and newlines) after the docblock, followed by do_function.
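
To illustrate what went wrong in the question, the sketch below runs both patterns against a shortened copy of the example file. The variable names are only for illustration. The m flag in the original pattern only changes how ^ and $ behave, so the . still stops at every newline and a multi-line docblock can never match.

```php
<?php
// Example input, adapted from the question (a nowdoc, so nothing is interpolated).
$source = <<<'PHP'
<?php
if ( empty( $data->random ) ) {
  /**
  * Brief description
  *
  * @since 1.0
  * @param int $var Variable
  */
  do_function( 'identifier', $var );
  exit;
}
PHP;

// Pattern from the question: no s flag, so . cannot cross a newline.
$original = '/(\/\*.+\*\/)[\s]{0,}do_function/m';

// Pattern from this answer, with s given as a trailing flag.
$fixed = '#(/\*(?:(?!\*/).)+\*/)\s+do_function#s';

$originalCount = preg_match_all($original, $source, $originalMatches);
$fixedCount    = preg_match_all($fixed, $source, $fixedMatches);

// Group 1 of each match is the docblock text.
$docblocks = $fixedMatches[1];
```

With this input, $originalCount is 0 and $fixedCount is 1, and $docblocks[0] contains the whole comment from /** through */.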