# алгоритм – Algorithm to convert number range to regular expression

## Question:

I don’t know how correct the title itself looks, but I’ll describe it in more detail here.

So, the essence is this: there is, for example, a certain range and a couple of conditions:

• `\$start` and `\$end` are always the same character length (although this is not important);

• And if there is a range that includes the element `555123456` (length `9` characters), then there is no other range that would include a number that is longer than the already existing one and starts with `555` – i.e. number 555123456 7 (length – 10 characters) does not exist by definition.

• The last digits in `\$start` and `\$end` are `0` and `9` respectively, which also makes things easier.

Those. (hereinafter I write conditionally, without reference to any programming language or `regexp` standard)

``````555000000 - 555999999 // Т.е. 555*
666000000 - 666999999 // Т.е. 666*

5550000000 - 5559999999 // Не существует в списке диапазонов по определению задачи, игнорируем.
6660000000 - 6669999999 // Не существует в списке диапазонов по определению задачи, игнорируем.
``````

In other words: if 555000000 – 555999999 is given, then no 555000000 0 – 555999999 9

Take for example

``````555000000 - 555999999
``````

Task: convert to `555*` . This example is easy to understand – you need to subtract `\$start` from `\$end` to get some variable `\$result` , which is equal to `999999` . Then use the value of the result of the `\$result.length()` operation against, say, `\$start` to get the string `555*`

``````\$shortrec = substr(\$start,0,-strlen(\$result)) . "*";
``````

But if the range looks like

``````777120000 - 777259999
``````

then the difference will be `139999` . Here it is more complicated, but also understandable – in addition to the partly indicated above, we compare the positions of `0` and `9` in both variables, “go out” to `12` and additionally process `12` , increasing it `13` times and getting the following sequence in the loop:

``````77712*
77713*
77714*
...
77725*
``````

But what if there is a range

``````888125120 - 888959599
``````

(here we recall condition number `3` , which facilitates logic, so as not to operate with an expression like `[5-9]` if `\$start` would be equal to 88812512 5 ) for which it is necessary to obtain

``````88812512*
8882*
8883*
...
8888*

88890*
88891*
88892*
...
88894*

888950*
888951*
888952*
...
888958*

8889590*
8889591*
8889592*
...
8889595*
``````
• I ask you to judge – do I follow the correct logic from the very beginning or not?
• And is there any universal solution (algorithm) for such tasks?
• What would you do?

PS The programming language is not important – the approach itself and the algorithm are important. I hope I didn’t break the reader’s brain, but the task is not one of educational ones, but the most real one. So if you have any ideas I would be very grateful.

We will write the generation algorithm for a particular case – when the number of characters at the lower and upper boundaries are the same. If the number of characters is different, then the original range will be divided into 3 ranges:

1. from the minimum limit to the maximum number with the same number of digits
2. all complete ranges of numbers that were not affected by bounds
3. from the minimum number with the same number of digits as the maximum limit to the maximum limit

An example makes it clearer what all this means. The initial range `14-123456` is divided into three ranges `14-99, 100-99999, 100000-123456` .
If there is a second range, then it is specified by a simple expression `\d{n,m}` , the first and second ranges must be generated separately and within the boundaries of these ranges there is always the same number of characters, which means we apply the main generation algorithm, and then combine these 3 ranges with an alternative.

Basic algorithm for ranges with the same number of digits

1. Let's try to find a common part for the upper and lower bounds. For example, in the range `1234-1278` , `12` can be taken out as ordinary characters.
2. Let's start viewing the borders from left to right. For example, the range `1234-5678`
3. The first digits of the lower and upper limits are `1` and `5` , so we divide into 3 ranges: `1234-1999, 2000-4999, 5000-5678` , that is, as if we select the middle, which can be represented as `[2-4]\d{3}`
4. The remaining two ranges will be processed using the same algorithm. For example, for `1234-1999` , we first take out the number 1 as a general one. We will divide the range `234-999` into 2 ranges `234-299, 300-999`
5. And so on for all the ranges that will appear during the splitting into composite ranges.

There is a lot of code, but you can run and test it.

``````\$( document ).ready( function() {
\$( "#rangeLeft, #rangeRight" ).keydown( function() {
clearDisplay();
} );
\$( "#run" ).click( function() {
clearDisplay();
var rangeLeft = \$( "#rangeLeft" ).val();
var rangeRight = \$( "#rangeRight" ).val();
if ( ! checkRanges( rangeLeft,  rangeRight ) ) return;
\$( "#result" ).append( generateFullRegExp(rangeLeft, rangeRight)+"<BR/>" );
\$( "#test" ).show();
} );
\$( "#test" ).click( function() {
var rangeLeft = \$( "#rangeLeft" ).val();
var rangeRight = \$( "#rangeRight" ).val();
var re = new RegExp( "^"+generateFullRegExp(rangeLeft, rangeRight)+"\$" );
for( var i=Math.pow( 10, rangeLeft.length-1 ); i<Math.pow( 10, rangeRight.length); i++ ) {
if ( re.test( i+"" ) && ( i<parseInt(rangeLeft) || i>parseInt(rangeRight) ) ) \$( "#result" ).append( "Тест провален на: " + i+"<BR/>" );
if ( !re.test( i+"" ) && ( i>parseInt(rangeLeft) && i<parseInt(rangeRight) ) ) \$( "#result" ).append( "Тест провален на: " + i+"<BR/>" );
};
\$( "#result" ).append( "Тест пройден от "+Math.pow( 10, rangeLeft.length-1 )+" до "+i+"<BR/>" );
} );
} );

function checkRanges( rangeLeft, rangeRight ) {
if ( /\D/.test( rangeLeft ) || /\D/.test( rangeRight ) ) {
\$( "#result" ).append( "Введите два числа<BR/>" );
return false;
};
rangeLeft = parseInt( rangeLeft );
rangeRight = parseInt( rangeRight );
if ( isNaN( rangeLeft ) || isNaN( rangeRight ) ) \$( "#result" ).append( "Не указаны границы диапазонов<BR/>" );
if ( rangeLeft < 0 ) \$( "#result" ).append( "Левая граница меньше 0<BR/>" );
if ( rangeRight < 0 ) \$( "#result" ).append( "Правая граница меньше 0<BR/>" );
if ( rangeLeft > rangeRight ) \$( "#result" ).append( "Левая граница больше правой границы<BR/>" );
return( !(
rangeLeft < 0 ||
rangeRight < 0 ||
rangeLeft > rangeRight ||
isNaN( rangeLeft ) ||
isNaN( rangeRight )
) );
};

function maxBeginStr( str1, str2 ) {
var res = /^(.*)[^-]*\-\1/.exec( str1 + "-" + str2 );
return res ? res[1] : "";
};

function midDiap( start, end ){
var st0int = parseInt( start[0] );
var en0int = parseInt( end[0] );
if ( st0int+1 == en0int ) {
var res = end[0];
} else {
if ( st0int == en0int-2 ) {
res=( st0int+1 )+"";
} else {
res="["+( st0int+1 )+"-"+(en0int-1)+"]";
};
};
if ( start.length == 1 ) return res;
return res+"\\d{"+(start.length-1)+"}";
};

function lowDiap( num, pos ) {
var res = num.substr(0, pos);
var highRange = parseInt( num[pos] )-1;
if ( highRange == -1 && pos == num.length-1 ) return num;
if ( highRange == -1 ) return null; // выражение можно не включать
if ( pos == num.length-1 ) highRange++;
res += "[0-"+highRange+"]";
if ( num.length != pos+1 ){
res += "\\d{"+(num.length-pos-1)+"}";
}
return res;
};

function highDiap( num, pos ) {
var res = num.substr(0, pos);
var lowRange = parseInt( num[pos] )+1;
if ( lowRange==10 && pos == num.length-1 ) return num;
if ( lowRange==10 ) return null; // выражение можно не включать
if ( pos == num.length-1 ) lowRange--;
res += "["+lowRange+"-9]";
if ( num.length != pos+1 ){
res += "\\d{"+( num.length -pos-1)+"}";
}
return res;
};

function getRegExp( start, end ) {
if ( start.length != end.length ) return "Invalid input";
var res= maxBeginStr( start, end );
start= start.substr( res.length );
end  = end.substr( res.length );
if ( start.length  == 0 ) return res;
var st0int = parseInt( start[0] );
var en0int = parseInt( end[0] );
var resArr= Array();
if ( start.length > 1 ) {
if ( st0int != en0int-1 ) {
resArr.push( midDiap( start, end) );
}
for ( var i=1; i<end.length; i++ ) {
var miniRe = lowDiap( end, i );
if ( miniRe != null ) {
resArr.push( miniRe );
}
miniRe = highDiap( start, i );
if ( miniRe != null) {
resArr.push( miniRe );
}
}
} else {
resArr.push( "["+start+"-"+end+"]" );
};
return res+"(?:"+resArr.join("|")+")";
};

function generateFullRegExp( rangeLeft, rangeRight ) {
if ( rangeLeft.length == rangeRight.length ) {
return getRegExp( rangeLeft, rangeRight );
};
var resArr = Array();
resArr.push( getRegExp( rangeLeft, "9".repeat( rangeLeft.length ) ) );
if ( rangeRight.length - rangeLeft.length > 1 ) resArr.push( ("\\d{"+(rangeLeft.length+1)+","+(rangeRight.length-1)+"}").replace(/(\d+),\1/, "\$1") );
resArr.push( getRegExp( "1"+"0".repeat( rangeRight.length-1 ), rangeRight ) );
return "(?:"+resArr.join("|")+")";
};

function clearDisplay() {
\$( "#result" ).html( "" );
\$( "#test" ).hide();
};``````
``#test { display:none }``
``````<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<BODY>
<INPUT id="rangeLeft" value=1234 /> - <INPUT id="rangeRight" value=4567 />
<BR/>
<BUTTON id="run">Генерировать</BUTTON>
<PRE id="result" ></PRE>
<BUTTON id="test">Проверить</BUTTON>
</BODY>``````

Existing deficiency
For the range `10-29` etc. redundant expression will be generated `(?:2[0-9]|1[0-9])`

Scroll to Top