I wanted to compare two very large text files, one with 5M rows and another with 10M. I was looking for the lines that were present in both files.
First of all, both files must be sorted; then you can use the comm command.
# sort a.txt > sort_a.txt
# sort b.txt > sort_b.txt
# comm -1 -2 sort_a.txt sort_b.txt > intersect.txt
comm takes the names of two sorted files and prints three columns: one with the lines found only in the first file, another with the lines found only in the second, and a third with the lines found in both. You can suppress any column with the -1, -2 and -3 options, so -1 -2 leaves only the third column, the lines common to both files.
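
To see the behaviour on something small before running it on millions of rows, here is a minimal sketch. The sample file contents and the scratch directory are made up for illustration. Setting LC_ALL=C is my own assumption, not part of the original commands: it makes sort and comm agree on plain byte order, which avoids "not in sorted order" complaints under some locales.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny, unsorted sample files (hypothetical contents).
printf 'pear\napple\ncherry\n' > a.txt
printf 'cherry\nbanana\napple\nfig\n' > b.txt

# comm expects sorted input, so sort both files first.
LC_ALL=C sort a.txt > sort_a.txt
LC_ALL=C sort b.txt > sort_b.txt

# -12 is shorthand for -1 -2: suppress columns 1 and 2,
# leaving only the lines present in both files.
LC_ALL=C comm -12 sort_a.txt sort_b.txt > intersect.txt

cat intersect.txt
```

With these sample files, intersect.txt should contain apple and cherry, one per line. Swapping -12 for -23 or -13 would instead list the lines unique to the first or to the second file.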