How to Compare Two Files and Print Differences in Linux Using awk

Показать описание

Learn how to effectively use `awk` to compare two files, identifying differences in specified columns while managing large datasets.
---

Visit these links for original content and any more details, such as alternate solutions, latest updates/developments on topic, comments, revision history etc. For example, the original title of the Question was: Comparing five columns of two different files and printing the differences of only two columns using awk

If anything seems off to you, please feel free to write me at vlogize [AT] gmail [DOT] com.
---
Comparing Two Files with awk: A Step-by-Step Guide

Understanding the Files

Let’s first examine the two files you'll be comparing. The structure of both files is as follows:

[[See Video to Reveal this Text or Code Snippet]]

[[See Video to Reveal this Text or Code Snippet]]

Defining the Problem

Your goal is to compare these two files based on the first three columns (id, name, and place), and print any differences that exist in the fourth (cost) and fifth (member) columns. For instance, in id 1, the member status changed from yes to no, and the cost for id 3 changed from 777 to 770.

Solution Using awk

To accomplish this, you can use the awk command. awk is a powerful text processing tool that can handle file-based data efficiently.

Step-by-Step Code Explanation

The following awk command will effectively achieve your goal:

[[See Video to Reveal this Text or Code Snippet]]

Breaking down the command:

-F'|': This option sets the delimiter to the pipe (|) character.

{k=$1 FS $2 FS $3}: Creates a unique key k by concatenating the first three columns.

NR==1 {split($0,h)}: On the first line (header), it splits into an array h for later use.

NR==FNR {f4[k]=$4; f5[k]=$5; next}: For the first file, store the values of the fourth and fifth columns indexed by the key k.

f4[k]!=$4: Compares the fourth column of the second file with the stored value from the first file. If they differ, it triggers the print.

{print k, h[4] " changed from " f4[k] " to " $4}: Outputs the formatted difference.

Implementing Changes for the Fifth Column

To also compare the member status, simply replicate the logic for the fifth column:

[[See Video to Reveal this Text or Code Snippet]]

Example Output

By running the command, you can get results like:

[[See Video to Reveal this Text or Code Snippet]]

Conclusion

Using awk for comparing files and printing specific differences can save you significant time, especially with large datasets. By understanding the command breakdown and implementing the proposed method, you can now tackle file comparisons effectively. Whether for personal use or in a professional setting, mastering awk is a valuable skill for any Linux user.

Now, let’s put this knowledge into practice and streamline your data comparison tasks!