On a more constructive level, I have written a first version of the
program. It's only slightly optimised but appears to be quite fast
already. It also prints statistics at the end showing the potential gain
as well as the number of bytes actually read versus the total number of
bytes in the potentially identical files (i.e. those having the same
length). Hopefully this makes my intention clearer.
You can find it at:
http://www.homepages.lu/pu/finddups.txt (the
webserver at vo would not serve finddups.py).
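For anyone who just wants the idea without downloading the script, here is
a minimal sketch of the approach. This is only an illustration, not the
actual finddups.py linked above: it groups files by size (only files of
equal length can be identical), then reads each group chunk by chunk,
splitting the group whenever the chunks differ, and reports at the end how
many bytes were read compared with the total size of all same-length
candidates. The chunk size and the output format are arbitrary choices for
the sketch.

#!/usr/bin/env python
# Sketch of the idea only, not the posted finddups.py.
import os
import sys
from collections import defaultdict

CHUNK = 64 * 1024          # bytes read per step; an arbitrary choice here
bytes_read = 0             # running counter for the closing statistics

def split_by_content(paths, size):
    """Split same-size files into sub-groups of identical content,
    reading chunk by chunk and dropping a file as soon as it is unique.
    Note: this opens all candidates of a group at once, so very large
    groups could hit the open-file limit."""
    global bytes_read
    groups = [[(p, open(p, 'rb')) for p in paths]]
    offset = 0
    while offset < size and groups:
        next_groups = []
        for group in groups:
            buckets = defaultdict(list)
            for path, fh in group:
                data = fh.read(CHUNK)
                bytes_read += len(data)
                buckets[data].append((path, fh))
            for bucket in buckets.values():
                if len(bucket) > 1:
                    next_groups.append(bucket)   # still candidates
                else:
                    bucket[0][1].close()         # unique file: stop reading
        groups = next_groups
        offset += CHUNK
    result = []
    for group in groups:
        result.append([path for path, _ in group])
        for _, fh in group:
            fh.close()
    return result

def main(root):
    # Group files by size; only same-size files can be identical.
    by_size = defaultdict(list)
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    candidate_bytes = 0
    potential_gain = 0
    for size, paths in sorted(by_size.items()):
        if len(paths) < 2 or size == 0:
            continue
        candidate_bytes += size * len(paths)
        for dup_set in split_by_content(paths, size):
            print("%d identical files of %d bytes:" % (len(dup_set), size))
            for path in dup_set:
                print("  %s" % path)
            potential_gain += size * (len(dup_set) - 1)

    print("potential gain: %d bytes" % potential_gain)
    print("bytes read: %d of %d candidate bytes" % (bytes_read, candidate_bytes))

if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else '.')

The real script at the URL above is what I would like people to test; the
sketch is only there to make the chunk-by-chunk comparison idea concrete.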
Everybody, please feel free to try it and give me your feedback. Of
course, I would not recommend running it on a production machine yet.
Thanks.