Patrick Useldinger wrote:
Brent Frère wrote:
Sorry about that. Just my style. Not
sarcastic at all. I even think we are going in an interesting and constructive
direction (big kiss :-) ). I even need this for a customer project (the
one that is described on page 20 of the February LuxBox issue).
Good, then you can give me your feedback; I'll try to incorporate as
much of it as possible.
When a filesystem is periodically replicated
(not actually synchronised, despite what the RSYNC name might suggest)
against a previous version of itself, the efficiency of RSYNC is fantastic:
speed-ups of up to 2000 times compared to a naive full copy. Even
if parts of a file have been changed, even if some data has been added
at the end, even if some data has been added at the beginning
of the file, the RSYNC tool reuses the data already available
at the destination side to save unnecessary bandwidth in the replication
process (a kind of 'refresh' of the files instead of plain
copying). But RSYNC has a gap: if a file is renamed, its
entire content will be retransmitted to the destination at the next
RSYNC replication, instead of the file simply being renamed there, because it
appears that RSYNC is not able to detect that the file is actually the same
one under a different filename.
Even worse is the case of folders: if you rename a folder, all its
files and subfolders will have to be fully copied during the next
replication, because RSYNC will not notice the change. It treats the
entire tree structure as new, and will delete the old one on the
target. What I wish for is to be able to detect this condition, by
detecting files and folders that disappeared since the last replication
and files and folders that appeared since the last replication, and then,
instead of deleting the files and folders no longer present on the source
side, to try to move or rename them to the appropriate location before
running the genuine RSYNC protocol, saving a large amount of bandwidth in
this case.
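The detection step described above can be sketched roughly as follows. This is only an illustration under assumptions, not something RSYNC provides: the function names list_tree and diff_trees are hypothetical, and the sketch assumes that the file listing of each side can be obtained as a set of paths relative to the replicated root.

```python
import os


def list_tree(root):
    """Return the set of file paths found under root, relative to root."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full_path, root))
    return paths


def diff_trees(source_paths, target_paths):
    """Compare two listings of relative paths (hypothetical helper).

    disappeared: paths still on the target but no longer on the source.
    appeared:    paths on the source that the target does not have yet.
    """
    disappeared = target_paths - source_paths
    appeared = source_paths - target_paths
    return disappeared, appeared
```

Only the listings need to cross the network here, not the file contents. Paths present on both sides are left alone, since the normal RSYNC run handles them.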
I hope it is clear. The point here is that I don't want to have to
transmit the entire file content over the network to perform the actual
comparison: the idea is to save bandwidth. So I can consider files as
identical if they share the same md5sum. I have no reason to do better:
RSYNC runs afterwards and does that job very well. If a file that appeared on
the source has the same checksum as a file on the target that disappeared from
the source, I'll just rename it to the new candidate file name. If a
folder, newly appeared on the source, contains lots of such files
(having the same checksums as files belonging to a folder on the target
side that disappeared from the source), I'll rename the folder...
Do you see the point?
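A minimal sketch of the checksum matching, to make the idea concrete. Everything here is an assumption for illustration: the names md5_of_file, plan_file_renames and plan_folder_renames are hypothetical, each side is assumed to compute its checksums locally so that only checksums travel over the network, and the folder rule (at least min_files files moving together with unchanged names) is just one possible reading of "lots of such files".

```python
import hashlib
import os
from collections import Counter, defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_file_renames(disappeared, appeared):
    """Pair disappeared target files with appeared source files.

    Both arguments map a relative path to its checksum.
    Returns a list of (old_path, new_path) renames to do on the target.
    """
    by_checksum = defaultdict(list)
    for path in sorted(disappeared):
        by_checksum[disappeared[path]].append(path)
    renames = []
    for new_path in sorted(appeared):
        candidates = by_checksum.get(appeared[new_path])
        if candidates:
            # Each old file is used at most once.
            renames.append((candidates.pop(0), new_path))
    return renames


def plan_folder_renames(file_renames, min_files=2):
    """Guess folder renames: many files kept their name but changed folder.

    Only the immediate parent folder is considered (a simplification).
    """
    votes = Counter()
    for old_path, new_path in file_renames:
        old_dir, old_name = os.path.split(old_path)
        new_dir, new_name = os.path.split(new_path)
        if old_name == new_name and old_dir and new_dir and old_dir != new_dir:
            votes[(old_dir, new_dir)] += 1
    return [pair for pair, count in votes.items() if count >= min_files]
```

A wrong match costs little in this scheme: the renames are applied on the target before the normal RSYNC run, which then repairs any file whose content does not actually match.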
--
Brent Frère
Private e-mail: Brent@BFrere.net
Postal address: 5, rue de Mamer
L-8280 Kehlen
Grand-Duchy of Luxembourg
European Union
Mobile: +352-021/29.05.98
Fax: +352-26.30.05.96
Home: +352-307.341
URL: http://BFrere.net
This e-mail signature can be checked if you have the CaCERT certificate installed.
Check http://www.CaCERT.org for details.