I've been cleaning out my NAS these past few days, and it occurred to me that a duplicate-file search feature would be a great asset. I suggested the idea to the NAS manufacturer, but they've stayed silent on the subject.
I searched for a few Windows programs that would do the job, but then I realized that at 30 MB/s it would take days to read, transfer and compare 5-6 terabytes (2x 3 TB HDDs); 6 TB at 30 MB/s works out to roughly 55 hours of reading alone.
I wonder if there's a way to find duplicate files directly on the server, with a script maybe, rather than using a Windows/Linux-based program that would force every file to be read from the server as it scans for dupes.
I've thought about listing all the files and then sorting by size, but that isn't an easy solution, and I don't know enough about the Linux command line to do all that myself.
A quick way would be to list every file on the NAS shares along with its size, sort that list by size, and whenever a size shows up more than once, copy those file names to a second list that holds only the size-based duplicate candidates, then display that list to the user.
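For that size-scan step, something along these lines might work from an SSH shell on the NAS itself. This is only a sketch: it assumes the shares sit under /share/HDA_DATA and /share/HDB_DATA as in the cmp example below, and that the firmware ships a find that supports -printf (BusyBox builds often don't), plus a working awk.

# list every file as "size<TAB>path", sort numerically by size,
# then keep only runs where the same size appears more than once
find /share/HDA_DATA /share/HDB_DATA -type f -printf '%s\t%p\n' | sort -n |
awk -F'\t' 'NR > 1 && $1 == last { if (!shown) print lastline; print; shown = 1; next }
            { last = $1; lastline = $0; shown = 0 }' > /tmp/same_size.txt

That would produce the candidate list without a single byte of file data leaving the NAS; only names and sizes are touched.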
Then the user can have the program do a byte-by-byte comparison of the same-size files locally on the server to confirm they really are duplicates. I read up on the Linux cmp command, and it can do exactly that on the server, if the server's firmware includes such a command of course. Example:
cmp /share/HDA_DATA/hdd1/123.mp4 /share/HDB_DATA/hdd2/123.mp4
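Building on the candidate list from the sketch above, a small loop could run that confirmation pass automatically. Again just a sketch, assuming a bash-style shell on the NAS and the /tmp/same_size.txt file from before; cmp -s prints nothing and only sets the exit status, and comparing consecutive entries only means groups of three or more identical files would need an extra pass.

# walk the sorted candidate list and byte-compare consecutive same-size files
prev_size=""; prev_path=""
while IFS=$'\t' read -r size path; do
    if [ "$size" = "$prev_size" ] && cmp -s "$prev_path" "$path"; then
        echo "DUPLICATE: $prev_path == $path"
    fi
    prev_size=$size; prev_path=$path
done < /tmp/same_size.txt > /tmp/true_dupes.txt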
Then the true duplicates could be displayed in an interface for management: delete, or view/open the file or its folder, so we can handle each file manually.
If there is such a way, it would be very helpful for disk space management and would really take the sting out of scanning so many terabytes.