linux – Regular expressions with 'grep'

Question:

I need to extract data from a text and am trying to do this using grep. But the way regular expressions are used with this command is quite different from what is usually done in Ruby or JavaScript, and I can't work out what I need to do. In the following text:

Judicial Record of the Regional Labor Court of the 1st Region

ELECTRONIC JOURNAL OF LABOR JUSTICE JUDICIAL POWER

Nº1697/2015

FEDERATIVE REPUBLIC OF BRAZIL

Release date: Wednesday, April 1, 2015.

Regional Labor Court of the 1st Region

I only need to get the number that can be seen on the third line. This number will later be used to make a request to a webservice. I tried with grep as follows:

pdftotext Diario_1697_1_1_4_2015.pdf -f 1 -l 1 - | grep -o /Nº(\d+\/\d+)/

I take the first page of a PDF file, convert it to text and pipe it to grep to extract the information. But that doesn't work at all. Does anyone know how to do this correctly with grep or some other bash command?

Answer:

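First, grep is an ordinary shell command and its arguments are plain strings like any others. Instead of enclosing the regex in / you should wrap it in single quotes (or double quotes, if you are careful with shell variable expansion). Inside single quotes the shell passes backslashes through to grep unchanged, so \d needs no extra escaping; only an unquoted \d would lose its backslash before grep ever saw it.

Second, grep's default regex syntax (POSIX basic) is a little different and quite limited. For example, it treats an unescaped + as a literal character rather than a quantifier, and it has no \d shorthand for digits. If your grep supports it (GNU grep does), you can switch to Perl syntax with the -P flag

grep -P -o 'Nº\d+/\d+'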

or use the POSIX extended syntax with grep -E (egrep is an older, deprecated spelling of the same command).

grep -E -o 'Nº[[:digit:]]+/[[:digit:]]+'
grep -E -o 'Nº[0-9]+/[0-9]+'
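Note that -o prints the whole match, so the commands above still return the "Nº" prefix along with the number. Since the question asks for the number alone, here is a minimal sketch of one way to strip the prefix afterwards. It assumes the converted text contains the number as Nº1697/2015, as the question's regex implies; the sample variable is only a stand-in for the pdftotext output, and the names sample, match and number are made up for illustration.

```shell
# Stand-in for the output of pdftotext (sample text taken from the question).
sample='ELECTRONIC JOURNAL OF LABOR JUSTICE JUDICIAL POWER

Nº1697/2015

FEDERATIVE REPUBLIC OF BRAZIL'

# -o prints only the matched part (prefix included); head keeps the first hit.
match=$(printf '%s\n' "$sample" | grep -E -o 'Nº[0-9]+/[0-9]+' | head -n 1)

# Remove the leading "Nº" with shell parameter expansion.
number=${match#Nº}

printf '%s\n' "$number"
```

In the real pipeline, the printf would be replaced by the pdftotext command from the question. With a grep that supports -P, the same result can probably be had in one step with 'Nº\K\d+/\d+', where \K discards everything matched before it.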