регулярные-выражения – Calculating the resource consumption of a regular expression

Question:

Regular expressions are quite resource intensive compared to regular string operations. Moreover, their complexity is hidden behind a compact syntax – by eye, it is almost impossible to judge which of several possible options will be faster. The problem is complicated by different dialects (POSIX, Perl) and implementations (NFA, DFA) of regular expressions.

Is there a way/tool ​​to calculate the number of operations a regular expression would need to find the answer in a given string? Primarily interested in PCRE regular expressions?

The only reliable means of evaluating the performance of multiple regular expressions is to actually measure the search time, and necessarily on different text samples, because different text samples will be processed in different times.

The number of steps that regex101 shows makes sense (my subjective opinion) only when building recursive regular expressions – in them it will allow you to more or less evaluate the effectiveness of the recursion.

Let's compare three regular expressions:

1. `[az]` – 33 steps https://regex101.com/r/vC5lG7/1
2. `^[^az]*[az]` – 4 steps https://regex101.com/r/vC5lG7/2
3. `^[^az]*?[az]` – 4 steps https://regex101.com/r/vC5lG7/3

Let's perform a simple measurement of the execution time of these expressions:

``````function test( \$re, \$text, \$iters ) {
\$t1 = microtime( true );
for ( \$i=0; \$i<\$iters; \$i++ ) {
\$res = preg_match( \$re, \$text );
};
\$t2 = microtime( true );
echo (\$t2-\$t1)."<br/>";
};

\$re1 = "/[a-z]/";
\$re2 = "/^[^a-z]*[a-z]/";
\$re3 = "/^[^a-z]*?[a-z]/";
\$text = "                               a";
\$iters = 50000000;

test( \$re1, \$text, \$iters );
test( \$re2, \$text, \$iters );
test( \$re3, \$text, \$iters );
``````

Result:

``````9.5385589599609
9.1712720394135
9.6358540058136
``````

The first regular expression is only 4% slower than the second, although the number of steps was 33 versus 4.
The first regular expression is 1% faster than the third, even though the number of steps was 33 versus 4.