Question:
Regular expressions are quite resource intensive compared to regular string operations. Moreover, their complexity is hidden behind a compact syntax – by eye, it is almost impossible to judge which of several possible options will be faster. The problem is complicated by different dialects (POSIX, Perl) and implementations (NFA, DFA) of regular expressions.
Is there a way or tool to calculate the number of operations a regular expression will need to find a match in a given string? I am primarily interested in PCRE regular expressions.
Answer:
The only reliable way to evaluate the performance of several regular expressions is to actually measure the search time, and necessarily on different text samples, because different texts take different amounts of time to process.
The number of steps that regex101 shows is useful (in my subjective opinion) only when building recursive regular expressions, where it lets you more or less judge how effective the recursion is.
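(For reference, a "recursive regular expression" here means a PCRE pattern that calls itself with (?R). The balanced-parentheses pattern below is my own illustrative sketch, not part of the original answer.)
// Illustrative example: (?R) recurses into the whole pattern,
// so this regex matches an arbitrarily nested pair of parentheses.
$re = '/\((?:[^()]++|(?R))*\)/';
preg_match( $re, 'f(a(b)c)', $m );
echo $m[0]; // prints "(a(b)c)"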
Let's compare three regular expressions:
- [a-z] – 33 steps https://regex101.com/r/vC5lG7/1
- ^[^a-z]*[a-z] – 4 steps https://regex101.com/r/vC5lG7/2
- ^[^a-z]*?[a-z] – 4 steps https://regex101.com/r/vC5lG7/3
Let's perform a simple measurement of the execution time of these expressions:
function test( $re, $text, $iters ) {
    // Run the same match many times and report the total elapsed time
    $t1 = microtime( true );
    for ( $i = 0; $i < $iters; $i++ ) {
        $res = preg_match( $re, $text );
    }
    $t2 = microtime( true );
    echo ( $t2 - $t1 ) . "<br/>";
}

$re1 = "/[a-z]/";
$re2 = "/^[^a-z]*[a-z]/";
$re3 = "/^[^a-z]*?[a-z]/";
$text = " a";
$iters = 50000000;

test( $re1, $text, $iters );
test( $re2, $text, $iters );
test( $re3, $text, $iters );
Result:
9.5385589599609
9.1712720394135
9.6358540058136
The first regular expression is only 4% slower than the second, although the number of steps was 33 versus 4.
The first regular expression is 1% faster than the third, even though the number of steps was 33 versus 4.
Draw your own conclusions.
PS
"necessarily on different text samples, because different texts take different amounts of time to process"
I will briefly explain.
Example
Take a large piece of text where the only match is near the end. In such a text, a greedy quantifier will find the match faster than a lazy (minimal) one; but if the match is near the beginning of a large text, the lazy quantifier will find it faster. The sketch below illustrates this.
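A minimal sketch of this effect (the helper function, patterns, and text sizes are my own illustrative choices, not from the answer above): a greedy and a lazy pattern are timed against a long text whose only match sits at the end, and against one where it sits at the beginning.
// Assumption: patterns of the form /.*X/ vs /.*?X/ make the effect visible.
function time_match( $re, $text, $iters = 500 ) {
    $t1 = microtime( true );
    for ( $i = 0; $i < $iters; $i++ ) {
        preg_match( $re, $text );
    }
    return microtime( true ) - $t1;
}

$greedy = "/.*!/";   // greedy: runs to the end of the text, then backtracks
$lazy   = "/.*?!/";  // lazy: advances one character at a time from the start

$matchAtEnd   = str_repeat( "x", 100000 ) . "!";  // match at the very end
$matchAtStart = "!" . str_repeat( "x", 100000 );  // match at the very beginning

echo "match at end:   greedy " . time_match( $greedy, $matchAtEnd )
   . " vs lazy " . time_match( $lazy, $matchAtEnd ) . "\n";
echo "match at start: greedy " . time_match( $greedy, $matchAtStart )
   . " vs lazy " . time_match( $lazy, $matchAtStart ) . "\n";
With the match at the end, the greedy version wins; with the match at the start, the lazy version wins, which is why a single text sample is not enough for a fair comparison.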