On Mon, Jan 16, 2012 at 12:58:13PM -0800, Kamal Mostafa wrote:
> * Package name    : duff
> * URL             : http://duff.sourceforge.net/

A quick speed comparison:

      real  user  system  max RSS  elapsed  cmd
       (s)   (s)     (s)    (KiB)      (s)
       3.2   2.4     5.8    62784      5.8  hardlink --dry-run files > /dev/null
       1.1   0.4     1.6    15424      1.6  rdfind files > /dev/null
       1.9   0.2     2.2     9904      2.2  duff-0.5/src/duff -r files > /dev/null

rdfind seems to be the quickest one, but duff compares well with
hardlink, which (see http://liw.fi/dupfiles/) was the fastest one I
knew of in Debian so far.

This was done using the benchmark-cmd utility in my extrautils
collection (not in Debian); see http://liw.fi/extrautils/ for the
source. The exact command to generate the above table:

    benchmark-cmd \
        --setup='genbackupdata --create=100m files' \
        --setup='cp -a files/0 files/copy' \
        --cleanup='rm -rf files' \
        --verbose \
        --command='hardlink --dry-run files > /dev/null' \
        --command='rdfind files > /dev/null' \
        --command='duff-0.5/src/duff -r files > /dev/null'

Personally, I would be wary of using checksums for file comparisons,
since comparing files byte-by-byte isn't slow (you only need to do it
for files that are identical in size, and you need to read all the
files anyway).

I also think we now have enough duplicate file finders in Debian that
it's time to consider whether we need so many. It's too bad they all
have incompatible command line syntaxes, or it would be possible to
drop some. (We should accept a new one if it is better than the
existing ones, of course. Evidence required.)

-- 
Freedom-based blog/wiki/web hosting: http://www.branchable.com/
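
For illustration, the size-first, byte-by-byte comparison mentioned above could look roughly like the Python sketch below. It is only a sketch under stated assumptions, not how hardlink, rdfind, or duff are implemented: the function name find_duplicates is hypothetical, symlinks are skipped, and within a size group each file is compared against one representative of every content group found so far, which can get slow if many same-sized files all differ.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical.

    Hypothetical helper for illustration only.
    """
    # Step 1: group regular files by size. A file with a unique size
    # cannot have a duplicate, so it is never opened at all.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size group, compare contents byte-by-byte.
    duplicates = []
    for paths in by_size.values():
        groups = []  # each group holds files with identical contents
        for path in sorted(paths):
            for group in groups:
                # shallow=False forces an actual content comparison
                # instead of trusting the os.stat() signature.
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                # No existing group matched: start a new one.
                groups.append([path])
        duplicates.extend(g for g in groups if len(g) > 1)
    return duplicates
```

Calling find_duplicates("files") on the benchmark directory above would return one list per set of identical files; no checksums are computed at any point.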