Brent Frère wrote:
Do a find. For each file, compute an md5sum. Sort the results. Detect the
sets of files having matching md5sums. Do a binary compare of each pair
of such files. If they match, you found it!
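
For reference, the procedure quoted above could be sketched in Python roughly as follows. This is only an illustration: the names md5_of and duplicates_by_md5 are made up here, and os.walk, hashlib and filecmp stand in for find, md5sum and a binary compare.

```python
import filecmp
import hashlib
import itertools
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 16):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_by_md5(root):
    """Yield (path_a, path_b) pairs of byte-identical regular files under root."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[md5_of(path)].append(path)
    for paths in by_hash.values():
        for path_a, path_b in itertools.combinations(sorted(paths), 2):
            # Confirm byte for byte, in case two different files share a digest.
            if filecmp.cmp(path_a, path_b, shallow=False):
                yield path_a, path_b
```

Calling list(duplicates_by_md5(".")) would list the matching pairs below the current directory. Note that every file is read in full once for the hash, and again for the confirming compare.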
I am going to write one, as I haven't found what I was looking for.
However, I haven't found a reason why I should use md5sum. It means that
I have to read every file in full at least once to compute the hash, and
a second time for the binary compare if the hashes match.
Why not compare them directly (blockwise) if their length matches? And
stop as soon as they differ?
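
A minimal sketch of what I have in mind, in Python. The names same_content and duplicates_by_size and the 64 KiB block size are just placeholders I picked for illustration, not a finished tool.

```python
import os
from collections import defaultdict


def same_content(path_a, path_b, block_size=1 << 16):
    """Compare two files block by block, stopping at the first difference."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(block_size)
            block_b = file_b.read(block_size)
            if block_a != block_b:
                return False  # differ: stop reading immediately
            if not block_a:
                return True  # both files hit end of file together


def duplicates_by_size(root):
    """Return lists of paths under root whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        clusters = []  # each cluster holds files with identical content
        for path in sorted(paths):
            for cluster in clusters:
                if same_content(cluster[0], path):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        groups.extend(cluster for cluster in clusters if len(cluster) > 1)
    return groups
```

Files with a unique length are never opened at all, and files that differ early cost only one block each. The trade-off, as far as I can see, is that when many files share the same length, a file may be read several times while it is compared against each cluster.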
-pu