bash – diff changing lines

Question:

I'm comparing two files, which are updated daily, with the diff -y command in order to get two results:

The first is the set of lines that changed overnight:

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões

To do this, I use the command diff -y arquivoAntigo.csv arquivoNovo.csv | grep -e "|"

The second is the set of new lines:

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões
                                           >    grupoAzul;Kratos;04-maçãs;00-limões

For this result I use the command diff -y arquivoAntigo.csv arquivoNovo.csv | grep -e ">"

With that explained, here is the problem:

When a new line appears above a modified line, diff 'pushes' the modified line down: it reports the modified line as the new one, and the line that is actually new as the modified one.

grupoAzul;Gabriel;04-maçãs;02-limões       |    grupoAzul;Kratos;04-maçãs;00-limões
                                           >    grupoAzul;Gabriel;05-maçãs;02-limões
grupoAzul;Amanda;03-maçãs;05-limões             grupoAzul;Amanda;03-maçãs;05-limões

These events are rare, but when they do happen, more than one line is reported incorrectly.

What causes this bug and how can I fix it?

Answer:

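The problem is that the same records do not appear on the same line in both files. diff has no notion of a "record": it only matches lines that are identical in both files and treats everything between those matches as a changed block. In your example the only identical line is Amanda's, so diff sees one old line (Gabriel;04) against two new lines (Kratos and Gabriel;05). In side-by-side mode it pairs the lines of that block in order: the first new line, Kratos, is placed next to the old Gabriel line and marked with "|", and the leftover line, the updated Gabriel, is marked with ">".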
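The mispairing can be reproduced with a short script. A minimal sketch, assuming bash and GNU diff; the temporary files (old_file, new_file) are hypothetical stand-ins for arquivoAntigo.csv and arquivoNovo.csv:

```shell
#!/usr/bin/env bash
# Assumes bash and GNU diff. Temporary files stand in for the real CSVs.
old_file=$(mktemp)
new_file=$(mktemp)
trap 'rm -f "$old_file" "$new_file"' EXIT

printf '%s\n' 'grupoAzul;Gabriel;04-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$old_file"

# The new record (Kratos) is inserted ABOVE the modified Gabriel line.
printf '%s\n' 'grupoAzul;Kratos;04-maçãs;00-limões' \
              'grupoAzul;Gabriel;05-maçãs;02-limões' \
              'grupoAzul;Amanda;03-maçãs;05-limões' > "$new_file"

# Only the Amanda line is identical, so diff sees one changed block:
# one old line versus two new lines. They are paired in order, so Kratos
# ends up next to the old Gabriel line ("|") and the updated Gabriel line
# is left over as an addition (">").
# Exit status 1 from diff only means "the files differ".
diff -y "$old_file" "$new_file" || true
```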

To avoid this, use sort so that matching records appear in the same order in both files:

$ diff -y <(sort arquivoAntigo.csv) <(sort arquivoNovo.csv)
                                          <
grupoAzul;Amanda;03-maçãs;05-limões         grupoAzul;Amanda;03-maçãs;05-limões
grupoAzul;Gabriel;04-maçãs;02-limões      | grupoAzul;Gabriel;05-maçãs;02-limões
                                          > grupoAzul;Kratos;04-maçãs;00-limões

However, as you can see, the first file contains a blank line, which sort places first (the lone "<" at the top). So I also suggest removing blank lines with sed:

$ diff -y <(sort arquivoAntigo.csv | sed '/^\s*$/d') <(sort arquivoNovo.csv | sed '/^\s*$/d')
grupoAzul;Amanda;03-maçãs;05-limões         grupoAzul;Amanda;03-maçãs;05-limões
grupoAzul;Gabriel;04-maçãs;02-limões      | grupoAzul;Gabriel;05-maçãs;02-limões
                                          > grupoAzul;Kratos;04-maçãs;00-limões
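The two reports from the question can then be built on the sorted, blank-line-free inputs. A minimal sketch, assuming bash and GNU diff; the function names normalize, changed_lines and new_lines are hypothetical, and it assumes the records themselves never contain "|" or ">":

```shell
#!/usr/bin/env bash
# Hypothetical helper: sort a file and drop empty or whitespace-only lines,
# i.e. the same sort + sed steps as above ([[:space:]] is the POSIX form of \s).
normalize() {
  sort "$1" | sed '/^[[:space:]]*$/d'
}

# First report: lines diff marks with "|" (record changed overnight).
# Assumes no record contains a literal "|".
changed_lines() {
  diff -y <(normalize "$1") <(normalize "$2") | grep -e '|'
}

# Second report: lines diff marks with ">" (present only in the new file).
# Assumes no record contains a literal ">".
new_lines() {
  diff -y <(normalize "$1") <(normalize "$2") | grep -e '>'
}
```

Usage would be changed_lines arquivoAntigo.csv arquivoNovo.csv for the first report and new_lines arquivoAntigo.csv arquivoNovo.csv for the second.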

The regular expression used in sed (/^\s*$/) matches lines that consist only of zero or more whitespace characters, such as spaces and tabs (that is, empty or whitespace-only lines), and the d command deletes them.
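A quick way to check which lines the expression removes. This sketch uses a made-up sample with one empty line and one line containing only spaces and a tab; note that \s is a GNU sed extension, so the POSIX bracket expression [[:space:]] is shown as a portable alternative:

```shell
#!/usr/bin/env bash
# Sample: data line, empty line, spaces+tab only, data line.
sample=$'grupoAzul;Amanda;03-maçãs;05-limões\n\n  \t\ngrupoAzul;Kratos;04-maçãs;00-limões'

# GNU sed: \s is a whitespace character, so /^\s*$/ matches empty or
# whitespace-only lines, and d deletes them. Only the two data lines remain.
printf '%s\n' "$sample" | sed '/^\s*$/d'

# Portable variant for sed implementations without \s.
printf '%s\n' "$sample" | sed '/^[[:space:]]*$/d'
```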

By the way, the <( ... ) notation in bash is process substitution: the command inside the parentheses runs in a subshell, and bash replaces the whole <( ... ) with a file name (typically /dev/fd/N) from which that command's output can be read. So when the diff above runs, each sort ... | sed ... pipeline executes, and diff reads its already-processed output as if it were a regular file.
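A small sketch to see process substitution in action (bash-specific; the exact /dev/fd path printed by the first command varies by system):

```shell
#!/usr/bin/env bash
# Process substitution expands to a file name, typically /dev/fd/<n>.
echo <(true)

# Any program that expects a file name can read a command's output this way.
cat <(printf '%s\n' 'grupoAzul;Amanda;03-maçãs;05-limões')

# diff receives two such names; the left side is sorted first, so both
# inputs end up with the same content and diff exits with status 0.
diff <(printf '%s\n' b a | sort) <(printf '%s\n' a b) && echo "no differences after sorting"
```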

You can see it working online on tutorialspoint. Since it doesn't seem possible to create files there, I used variables to "simulate" them: http://tpcg.io/aO9pny
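Sorting works for this example because the new Kratos record sorts after the modified Gabriel record. If a new record sorted immediately before a modified one (say, a hypothetical grupoAzul;Bruno line), both would fall into the same changed block and diff -y could pair the wrong lines again. A sketch of a different technique that does not depend on line positions, using awk instead of diff and assuming the first two fields (group;name) identify a record; compare_by_key is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compares records by key instead of by line position.
# Assumption: the first two ';'-separated fields (group;name) uniquely
# identify a record. Records removed from the new file are not reported.
compare_by_key() {
  awk -F';' '
    # Skip empty or whitespace-only lines in both files.
    /^[[:space:]]*$/ { next }
    # First file (old): remember each record under its key.
    FILENAME == ARGV[1] { old[$1 FS $2] = $0; next }
    # Second file (new): report keys that are new or whose line changed.
    {
      key = $1 FS $2
      if (!(key in old))       print "NEW: " $0
      else if (old[key] != $0) print "CHANGED: " old[key] " -> " $0
    }
  ' "$1" "$2"
}
```

For example, compare_by_key arquivoAntigo.csv arquivoNovo.csv would print one NEW: or CHANGED: line per record that differs, regardless of where the record sits in either file.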
