Judicial Record of the Regional Labor Court of the 1st Region
ELECTRONIC JOURNAL OF LABOR JUSTICE JUDICIAL POWER
FEDERATIVE REPUBLIC OF BRAZIL
Release date: Wednesday, April 1, 2015.
Regional Labor Court of the 1st Region
I only need the number shown on the third line, which I will later use to make a request to a web service. I tried grep as follows:
pdftotext Diario_1697_1_1_4_2015.pdf -f 1 -l 1 - | grep -o /Nº(\d+\/\d+)/
I take the first page of the PDF, convert it to text, and pipe it to grep to extract the number, but that doesn't work at all. Does anyone know how to do this correctly with grep or another bash command?
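First, grep is an ordinary command, and its arguments are plain strings that the shell processes before grep sees them. Instead of enclosing the regex in / delimiters, wrap it in single quotes (or in double quotes, if you are careful about shell variable expansion). Unquoted, the shell strips the backslashes (\d becomes d) and treats the parentheses as its own syntax, so bash reports a syntax error before grep even runs. Inside single quotes the backslashes reach grep unchanged and need no extra escaping, and the / needs no backslash because it is no longer a delimiter.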
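A quick way to check what grep will actually receive is to print the strings instead of running grep. This is a minimal sketch, assuming a POSIX sh or bash; the variable names single, double, unquoted and prefix are hypothetical:

```shell
#!/bin/sh
# Single quotes: every character, backslashes included, reaches grep unchanged.
single='Nº\d+/\d+'

# Double quotes: $prefix is expanded by the shell, the rest stays literal.
prefix='Nº'
double="${prefix}[0-9]+/[0-9]+"

# No quotes: the shell removes the backslash, so grep would only see "d".
unquoted=$(printf '%s' Nº\d)

# Print each string on its own line to inspect it.
printf '%s\n' "$single" "$double" "$unquoted"
```

The last line of output shows that, without quotes, grep would receive Nºd rather than Nº\d.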
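Second, grep's default regex syntax (POSIX basic regular expressions) is more limited: it has no \d shorthand for digits, and an unescaped + matches a literal plus sign, so the only unescaped repetition operator is *. With GNU grep you can switch to Perl-compatible syntax with the -P option: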
grep -P -o 'Nº\d+/\d+'
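or use POSIX extended syntax with -E, where + means "one or more" and digits are written as a bracket expression:
grep -E -o 'Nº[[:digit:]]+/[[:digit:]]+'
grep -E -o 'Nº[0-9]+/[0-9]+'
Both commands print the whole match, including the Nº prefix.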
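Since the number is meant to be reused in a web-service request, it may help to keep only the first match and drop the Nº prefix. The sketch below assumes a POSIX sh with grep, head and sed available; the function name extract_number and the sample text are hypothetical, and in practice you would pipe the pdftotext output from the question into it:

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin and prints the first
# "Nº<digits>/<digits>" token without the "Nº" prefix (nothing if no match).
extract_number() {
  grep -o -E 'Nº[0-9]+/[0-9]+' | head -n 1 | sed 's/^Nº//'
  # With GNU grep built with PCRE, a single command gives a similar result:
  #   grep -o -m 1 -P 'Nº\K\d+/\d+'
  # (\K drops "Nº" from the part of the match that -o prints.)
}

# Real use would be something like:
#   number=$(pdftotext -f 1 -l 1 Diario_1697_1_1_4_2015.pdf - | extract_number)
# Here a hypothetical sample stands in for the PDF text.
number=$(printf '%s\n' 'Regional Labor Court of the 1st Region' 'Nº123/2015' | extract_number)

if [ -z "$number" ]; then
  echo "no number found" >&2
  exit 1
fi
printf '%s\n' "$number"   # prints 123/2015
```

Splitting the work across grep, head and sed keeps each step portable; the -P variant in the comment is shorter but depends on a grep built with PCRE support.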