After searching for MP3 duplicate finders for Linux, I finally found two good applications. One of them is DuMP3. In this blog post I will walk through the steps, with screenshots, and measure its performance.
Upon launching the application you will see the following window on your screen.
The interface is very simple and clean. In the left pane, select the directories that contain the files you want to compare and add them to the right pane. In my case my whole collection is in one place, which I added as shown below.
Once you press the "Next" button, the software scans all the media in the directories listed in the right pane and builds a list of the file extensions it finds there. These are shown on the following screen, where you can select the file formats that you want to compare.
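To get a feel for what this step does, here is a minimal Python sketch of the same idea: walk the chosen directories, count the file extensions found, and then keep only the formats you care about. This is my own illustration and not DuMP3's code; the function names `count_extensions` and `select_files` and the example path are hypothetical.

```python
import os
from collections import Counter


def _extension(name):
    """Return the lowercase extension of a file name, without the dot."""
    return os.path.splitext(name)[1].lower().lstrip(".")


def count_extensions(directories):
    """Walk each directory tree and count files per extension."""
    counts = Counter()
    for directory in directories:
        for _root, _dirs, files in os.walk(directory):
            for name in files:
                ext = _extension(name)
                if ext:  # skip files that have no extension
                    counts[ext] += 1
    return counts


def select_files(directories, wanted):
    """Return the sorted paths of files whose extension is in `wanted`."""
    wanted = {w.lower() for w in wanted}
    selected = []
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if _extension(name) in wanted:
                    selected.append(os.path.join(root, name))
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical music folder; replace with your own path.
    music_dirs = [os.path.expanduser("~/Music")]
    print(count_extensions(music_dirs))
    print(len(select_files(music_dirs, ["mp3", "ogg", "flac"])), "audio files selected")
```

Running it on your own collection shows roughly the same kind of list the screen above presents, and how many files would go into the comparison.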
It seems the software can find duplicate images as well. That is neat! Anyway, I only want audio de-duplication, so I selected mp3, ogg and flac.
Pressing "Next" will take you to a screen where you can tweak the comparison algorithm. I left the default values because I did not know how to tweak them yet.
When you are done changing the parameters, just press "Next" and the software should start scanning your collection. The software seems to be able to use multiple threads to fingerprint some audio files, but not all. For the first 130 files or so, I saw that it was using about 50% of all 8 CPUs (4 real and 4 virtual, due to Intel's Hyper-Threading) of my Core i7 processor. But after a while a strange thing happened: the scanning switched from multi-threaded to single-threaded for some reason. That was very unfortunate, and it happens every time I use DuMP3. Why would it not use all the processors when it is capable of doing so? Maybe only some formats lend themselves to multi-threaded processing? Which formats are these? Either way, it is so slow that it is totally unusable. I ran it on a sample of about 500 files, and after 24 hours it had scanned only 475 of them.
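To put those numbers in perspective, here is a small Python calculation of the scan rate. It only uses the figures above (475 files in 24 hours, a sample of about 500 files); the function name `scan_stats` is my own, and since the sample size is approximate, the projected total is only a rough estimate that assumes the rate stays constant.

```python
def scan_stats(files_done, hours_elapsed, total_files):
    """Return (minutes per file, files per hour, projected hours for all files)."""
    minutes_per_file = hours_elapsed * 60 / files_done
    files_per_hour = files_done / hours_elapsed
    # Assumes the remaining files are scanned at the same average rate.
    projected_hours = total_files * minutes_per_file / 60
    return minutes_per_file, files_per_hour, projected_hours


if __name__ == "__main__":
    per_file, per_hour, projected = scan_stats(475, 24, 500)
    print(f"{per_file:.2f} minutes per file")
    print(f"{per_hour:.1f} files per hour")
    print(f"{projected:.1f} hours projected for 500 files")
```

That works out to roughly three minutes per file, or about 20 files per hour, which is far too slow for a collection of any real size.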
However, it seems to have done a good job of finding duplicate files. Still, I cannot use it. I wish I knew what was wrong.