Question:
I don’t know how correct the title itself looks, but I’ll describe it in more detail here.
So, the essence is this: there is, for example, a certain range and a couple of conditions:
-
$start
and$end
are always the same character length (although this is not important); -
And if there is a range that includes the element
555123456
(length9
characters), then there is no other range that would include a number that is longer than the already existing one and starts with555
– i.e. number 555123456 7 (length – 10 characters) does not exist by definition. -
The last digits in
$start
and$end
are0
and9
respectively, which also makes things easier.
Those. (hereinafter I write conditionally, without reference to any programming language or regexp
standard)
555000000 - 555999999 // Т.е. 555*
666000000 - 666999999 // Т.е. 666*
5550000000 - 5559999999 // Не существует в списке диапазонов по определению задачи, игнорируем.
6660000000 - 6669999999 // Не существует в списке диапазонов по определению задачи, игнорируем.
In other words: if 555000000 – 555999999 is given, then no 555000000 0 – 555999999 9
Take for example
555000000 - 555999999
Task: convert to 555*
. This example is easy to understand – you need to subtract $start
from $end
to get some variable $result
, which is equal to 999999
. Then use the value of the result of the $result.length()
operation against, say, $start
to get the string 555*
$shortrec = substr($start,0,-strlen($result)) . "*";
But if the range looks like
777120000 - 777259999
then the difference will be 139999
. Here it is more complicated, but also understandable – in addition to the partly indicated above, we compare the positions of 0
and 9
in both variables, “go out” to 12
and additionally process 12
, increasing it 13
times and getting the following sequence in the loop:
77712*
77713*
77714*
...
77725*
But what if there is a range
888125120 - 888959599
(here we recall condition number 3
, which facilitates logic, so as not to operate with an expression like [5-9]
if $start
would be equal to 88812512 5 ) for which it is necessary to obtain
88812512*
8882*
8883*
...
8888*
88890*
88891*
88892*
...
88894*
888950*
888951*
888952*
...
888958*
8889590*
8889591*
8889592*
...
8889595*
- I ask you to judge – do I follow the correct logic from the very beginning or not?
- And is there any universal solution (algorithm) for such tasks?
- What would you do?
PS The programming language is not important – the approach itself and the algorithm are important. I hope I didn’t break the reader’s brain, but the task is not one of educational ones, but the most real one. So if you have any ideas I would be very grateful.
Answer:
We will write the generation algorithm for a particular case – when the number of characters at the lower and upper boundaries are the same. If the number of characters is different, then the original range will be divided into 3 ranges:
- from the minimum limit to the maximum number with the same number of digits
- all complete ranges of numbers that were not affected by bounds
- from the minimum number with the same number of digits as the maximum limit to the maximum limit
An example makes it clearer what all this means. The initial range 14-123456
is divided into three ranges 14-99, 100-99999, 100000-123456
.
If there is a second range, then it is specified by a simple expression \d{n,m}
, the first and second ranges must be generated separately and within the boundaries of these ranges there is always the same number of characters, which means we apply the main generation algorithm, and then combine these 3 ranges with an alternative.
Basic algorithm for ranges with the same number of digits
- Let's try to find a common part for the upper and lower bounds. For example, in the range
1234-1278
,12
can be taken out as ordinary characters. - Let's start viewing the borders from left to right. For example, the range
1234-5678
- The first digits of the lower and upper limits are
1
and5
, so we divide into 3 ranges:1234-1999, 2000-4999, 5000-5678
, that is, as if we select the middle, which can be represented as[2-4]\d{3}
- The remaining two ranges will be processed using the same algorithm. For example, for
1234-1999
, we first take out the number 1 as a general one. We will divide the range234-999
into 2 ranges234-299, 300-999
- And so on for all the ranges that will appear during the splitting into composite ranges.
There is a lot of code, but you can run and test it.
$( document ).ready( function() {
$( "#rangeLeft, #rangeRight" ).keydown( function() {
clearDisplay();
} );
$( "#run" ).click( function() {
clearDisplay();
var rangeLeft = $( "#rangeLeft" ).val();
var rangeRight = $( "#rangeRight" ).val();
if ( ! checkRanges( rangeLeft, rangeRight ) ) return;
$( "#result" ).append( generateFullRegExp(rangeLeft, rangeRight)+"<BR/>" );
$( "#test" ).show();
} );
$( "#test" ).click( function() {
var rangeLeft = $( "#rangeLeft" ).val();
var rangeRight = $( "#rangeRight" ).val();
var re = new RegExp( "^"+generateFullRegExp(rangeLeft, rangeRight)+"$" );
for( var i=Math.pow( 10, rangeLeft.length-1 ); i<Math.pow( 10, rangeRight.length); i++ ) {
if ( re.test( i+"" ) && ( i<parseInt(rangeLeft) || i>parseInt(rangeRight) ) ) $( "#result" ).append( "Тест провален на: " + i+"<BR/>" );
if ( !re.test( i+"" ) && ( i>parseInt(rangeLeft) && i<parseInt(rangeRight) ) ) $( "#result" ).append( "Тест провален на: " + i+"<BR/>" );
};
$( "#result" ).append( "Тест пройден от "+Math.pow( 10, rangeLeft.length-1 )+" до "+i+"<BR/>" );
} );
} );
function checkRanges( rangeLeft, rangeRight ) {
if ( /\D/.test( rangeLeft ) || /\D/.test( rangeRight ) ) {
$( "#result" ).append( "Введите два числа<BR/>" );
return false;
};
rangeLeft = parseInt( rangeLeft );
rangeRight = parseInt( rangeRight );
if ( isNaN( rangeLeft ) || isNaN( rangeRight ) ) $( "#result" ).append( "Не указаны границы диапазонов<BR/>" );
if ( rangeLeft < 0 ) $( "#result" ).append( "Левая граница меньше 0<BR/>" );
if ( rangeRight < 0 ) $( "#result" ).append( "Правая граница меньше 0<BR/>" );
if ( rangeLeft > rangeRight ) $( "#result" ).append( "Левая граница больше правой границы<BR/>" );
return( !(
rangeLeft < 0 ||
rangeRight < 0 ||
rangeLeft > rangeRight ||
isNaN( rangeLeft ) ||
isNaN( rangeRight )
) );
};
function maxBeginStr( str1, str2 ) {
var res = /^(.*)[^-]*\-\1/.exec( str1 + "-" + str2 );
return res ? res[1] : "";
};
function midDiap( start, end ){
var st0int = parseInt( start[0] );
var en0int = parseInt( end[0] );
if ( st0int+1 == en0int ) {
var res = end[0];
} else {
if ( st0int == en0int-2 ) {
res=( st0int+1 )+"";
} else {
res="["+( st0int+1 )+"-"+(en0int-1)+"]";
};
};
if ( start.length == 1 ) return res;
return res+"\\d{"+(start.length-1)+"}";
};
function lowDiap( num, pos ) {
var res = num.substr(0, pos);
var highRange = parseInt( num[pos] )-1;
if ( highRange == -1 && pos == num.length-1 ) return num;
if ( highRange == -1 ) return null; // выражение можно не включать
if ( pos == num.length-1 ) highRange++;
res += "[0-"+highRange+"]";
if ( num.length != pos+1 ){
res += "\\d{"+(num.length-pos-1)+"}";
}
return res;
};
function highDiap( num, pos ) {
var res = num.substr(0, pos);
var lowRange = parseInt( num[pos] )+1;
if ( lowRange==10 && pos == num.length-1 ) return num;
if ( lowRange==10 ) return null; // выражение можно не включать
if ( pos == num.length-1 ) lowRange--;
res += "["+lowRange+"-9]";
if ( num.length != pos+1 ){
res += "\\d{"+( num.length -pos-1)+"}";
}
return res;
};
function getRegExp( start, end ) {
if ( start.length != end.length ) return "Invalid input";
var res= maxBeginStr( start, end );
start= start.substr( res.length );
end = end.substr( res.length );
if ( start.length == 0 ) return res;
var st0int = parseInt( start[0] );
var en0int = parseInt( end[0] );
var resArr= Array();
if ( start.length > 1 ) {
if ( st0int != en0int-1 ) {
resArr.push( midDiap( start, end) );
}
for ( var i=1; i<end.length; i++ ) {
var miniRe = lowDiap( end, i );
if ( miniRe != null ) {
resArr.push( miniRe );
}
miniRe = highDiap( start, i );
if ( miniRe != null) {
resArr.push( miniRe );
}
}
} else {
resArr.push( "["+start+"-"+end+"]" );
};
return res+"(?:"+resArr.join("|")+")";
};
function generateFullRegExp( rangeLeft, rangeRight ) {
if ( rangeLeft.length == rangeRight.length ) {
return getRegExp( rangeLeft, rangeRight );
};
var resArr = Array();
resArr.push( getRegExp( rangeLeft, "9".repeat( rangeLeft.length ) ) );
if ( rangeRight.length - rangeLeft.length > 1 ) resArr.push( ("\\d{"+(rangeLeft.length+1)+","+(rangeRight.length-1)+"}").replace(/(\d+),\1/, "$1") );
resArr.push( getRegExp( "1"+"0".repeat( rangeRight.length-1 ), rangeRight ) );
return "(?:"+resArr.join("|")+")";
};
function clearDisplay() {
$( "#result" ).html( "" );
$( "#test" ).hide();
};
#test { display:none }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<BODY>
<INPUT id="rangeLeft" value=1234 /> - <INPUT id="rangeRight" value=4567 />
<BR/>
<BUTTON id="run">Генерировать</BUTTON>
<PRE id="result" ></PRE>
<BUTTON id="test">Проверить</BUTTON>
</BODY>
Existing deficiency
For the range 10-29
etc. redundant expression will be generated (?:2[0-9]|1[0-9])