Question:
I need to write a terminal command using grep or sed. It should output only the matching fragments from a text file (whether they come out on one line or in a column does not matter). Perl cannot be used.
Right now I have a regular expression like this:
a?\s*b?\s*c?\s*d?\s*e?\s*f?\s*g?\s*h?\s*i?\s*j?\s*k?\s*l?\s*m?\s*n?\s*o?\s*p?\s*q?\s*r?\s*s?\s*t?\s*u?\s*v?\s*w?\s*x?\s*y?\s*z?\s*
Not surprisingly, sequences such as "ace" or "bpxz" also match it. How can I make the expression accept only sequences with no missing letters, such as "abcd", "opqr", "xy"?
UPD: I forgot to add that spaces are ignored (this is what I use \s* for). The regular expression must find an alphabetic sequence anywhere in the text. For example, from the Russian phrase meaning "roll call of duty officers" the result should be "kl" and "dej" (transliterated here; the example only works in Russian, where it was easier to come up with one).
Answer:
Unfortunately, you did not specify which dialect of regular expressions can be used or what this is for. There may be simpler solutions based on special features of a particular regex engine, or simpler means that do not use regular expressions at all.
For a PCRE-compatible dialect, an expression like this works (shown up to the letter d; continue by analogy and add whitespace handling to taste):
(?:a(?=b))?(?:b(?=c))?(?:c(?=d))?(?:d(?=e))?
Test on regex101.com
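To see how the lookaheads behave, here is a small sketch that runs the four-letter expression through grep. It assumes GNU grep built with PCRE support (the -P option), which not every system has, so the run is guarded by a check. The trailing "." and the sed filter are borrowed from the grep version later in this answer: the "." picks up the last letter of each run, and sed drops the single-character matches.

```shell
#!/bin/sh
# Four-letter version of the expression, plus a trailing "." so that the
# last letter of a run is part of the match (assumption: this mirrors the
# full grep version further down in the answer).
re='(?:a(?=b))?(?:b(?=c))?(?:c(?=d))?(?:d(?=e))?.'

# Assumption: grep supports -P (GNU grep with PCRE); skip the demo otherwise.
if printf 'x\n' | grep -Pq 'x' 2>/dev/null; then
    # "abcd" is a gap-free run; "ace" has gaps and falls apart into
    # single characters, which sed -n '/../p' then discards.
    out=$(printf '%s\n' 'abcd ace' | grep -Po "$re" | sed -n '/../p')
    printf '%s\n' "$out"
fi
```

On the sample input only abcd should survive, because every letter of "ace" fails its lookahead and is matched by the bare "." alone.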
As an example of such "special features": in Perl you can check for any consecutive characters like this:
echo "abpade fg xyz" | perl -npe 's/.*?((?:([a-z])\s*(?=(??{chr(ord($2)+1)})))+.)/$1\n/g'
Result:
ab
de fg
xyz
On most Unix systems perl can be used in place of grep by writing the required command as a one-liner.
UPD: For the command line, using only grep and sed, a short version is:
echo "a bcefgkmoxyz" |\
grep -Po `echo -n 'bcdefghijklmnopqrstuvwxyz' |\
sed 's/./\0\0/g;s/^/a/;s/\(.\)\(.\)/\\\\s*(?:\1(?=\\\\s*\2))?/g;s/.$/./'` |\
sed -n '/../p'
Result:
a bc
efg
xyz
The command is split across several lines for readability; you can put it on one line by removing the trailing \. I was too lazy to write out the long regular expression by hand, so grep receives the output of the echo | sed pipeline, which builds the expression on the fly from the letters of the alphabet. Unfortunately, the expression is not ideal: grep also outputs single characters, so the final sed -n '/../p' is used to suppress them.
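The sed substitutions are easier to follow one at a time. Below is a sketch that applies them to a short three-letter tail ('bcd' instead of the full alphabet) and keeps each intermediate result. Two details differ from the command above and are my own choices: & is used instead of \0 (which is a GNU sed extension), and $(...) is used instead of backticks, so the backslashes inside the single-quoted sed script need less escaping. The variable names are only for illustration.

```shell
#!/bin/sh
letters='bcd'   # shortened stand-in for 'bcdefghijklmnopqrstuvwxyz'

# 1. Double every letter: bcd -> bbccdd
step1=$(printf '%s\n' "$letters" | sed 's/./&&/g')

# 2. Prepend "a", so the string now reads as overlapping pairs: ab bc cd d
step2=$(printf '%s\n' "$step1" | sed 's/^/a/')

# 3. Turn each pair XY into \s*(?:X(?=\s*Y))?  (the odd last letter is left over)
step3=$(printf '%s\n' "$step2" | sed 's/\(.\)\(.\)/\\s*(?:\1(?=\\s*\2))?/g')

# 4. Replace the leftover last letter with "." to match the end of a run
step4=$(printf '%s\n' "$step3" | sed 's/.$/./')

printf '%s\n' "$step1" "$step2" "$step3" "$step4"
```

The last line printed should be \s*(?:a(?=\s*b))?\s*(?:b(?=\s*c))?\s*(?:c(?=\s*d))?. which is the same shape as the full pattern, just cut off after the letter d.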
The grep parameter that these commands generate from the alphabet looks like this:
\s*(?:a(?=\s*b))?\s*(?:b(?=\s*c))?\s*(?:c(?=\s*d))?\s*(?:d(?=\s*e))?\s*(?:e(?=\s*f))?\s*(?:f(?=\s*g))?\s*(?:g(?=\s*h))?\s*(?:h(?=\s*i))?\s*(?:i(?=\s*j))?\s*(?:j(?=\s*k))?\s*(?:k(?=\s*l))?\s*(?:l(?=\s*m))?\s*(?:m(?=\s*n))?\s*(?:n(?=\s*o))?\s*(?:o(?=\s*p))?\s*(?:p(?=\s*q))?\s*(?:q(?=\s*r))?\s*(?:r(?=\s*s))?\s*(?:s(?=\s*t))?\s*(?:t(?=\s*u))?\s*(?:u(?=\s*v))?\s*(?:v(?=\s*w))?\s*(?:w(?=\s*x))?\s*(?:x(?=\s*y))?\s*(?:y(?=\s*z))?.
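If the nested sed call is hard to read, the same parameter can be built with a plain shell loop. This is only a sketch of an alternative, not what the command above does internally; the variable names (alphabet, rest, pattern) are made up for the example. It produces the pattern with single backslashes (\s*), which is the form grep -P needs, and then runs it on the sample input when grep -P is available (assumption: GNU grep with PCRE support).

```shell
#!/bin/sh
alphabet='abcdefghijklmnopqrstuvwxyz'
pattern=''
rest=$alphabet

# Walk over the alphabet and emit one optional group per neighbouring pair.
while [ "${#rest}" -gt 1 ]; do
    cur=$(printf '%s' "$rest" | cut -c1)    # current letter
    next=$(printf '%s' "$rest" | cut -c2)   # the letter that must follow it
    # Inside double quotes "\\" yields one backslash, so this appends \s*(?:X(?=\s*Y))?
    pattern="${pattern}\\s*(?:${cur}(?=\\s*${next}))?"
    rest=${rest#?}                          # drop the first letter
done
pattern="${pattern}."                       # final "." matches the last letter of a run

printf '%s\n' "$pattern"

# Assumption: grep supports -P; skip the run otherwise.
if printf 'x\n' | grep -Pq 'x' 2>/dev/null; then
    result=$(printf '%s\n' 'a bcefgkmoxyz' | grep -Po "$pattern" | sed -n '/../p')
    printf '%s\n' "$result"
fi
```

With the sample input from the answer this should print the same three fragments as above: a bc, efg and xyz.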