false 100% similarity

hsei:
The program seems to compare only about the first minute of each track. Even if two songs differ in length by minutes, they score 100% if they start the same (e.g. a live CD vs. its first track).
This may be unavoidable for performance reasons, but it is very dangerous if you rely on automark: one of the two files is deleted even though they differ greatly.
Even worse: if a track is damaged, with a piece missing in the middle, it is still ranked as 100% similar, and in the worst case the complete track is deleted and the damaged one remains.
A configurable limit on the allowable difference in track length (file size would be another topic) would be very nice. At the least there should be a warning if track times differ considerably (e.g. by coloring the duration entry red). The performance cost of that check should be negligible.
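
To make the request above concrete, here is a minimal sketch of such a guard. It is not the program's actual code: the Track class, the function names, and the default thresholds (5 seconds, 95% score) are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Track:
    path: str
    duration_s: float  # total track length in seconds


def duration_check(a, b, max_diff_s=5.0):
    """Return (ok, diff_s); ok is False when the lengths differ by more than max_diff_s."""
    diff_s = abs(a.duration_s - b.duration_s)
    return diff_s <= max_diff_s, diff_s


def safe_to_automark(a, b, score, min_score=0.95, max_diff_s=5.0):
    """Allow automark only for a high score AND closely matching lengths."""
    ok, _ = duration_check(a, b, max_diff_s)
    return score >= min_score and ok
```

A pair that fails duration_check could still be listed, just with the duration shown in red and automark disabled. The check is a single subtraction per pair, so it should cost next to nothing.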

Admin:
Similarity is designed for scanning music compositions, and yes, it scans only 1 minute of each song. We are thinking about how to solve the problem with long durations.

djluckyluciano:
Hi,
I am confused by a 70% similarity between two titles: one is a megamix of 70 minutes,
the other a short version of a song, 3 minutes long...

hsei:
It's not only a problem of long durations: two files of e.g. 2 minutes with a high similarity score that differ in length by 10 seconds are a strong indication of corruption.
I actually use that to identify corrupted files, but at the moment it has to be done "manually" by looking for significant duration mismatches in high-score groups.
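
The manual search described above could be automated roughly like this. It is only a sketch: the input format (tuples of name, duration, name, duration, score) and the thresholds are assumptions, not something the program exports. Treating the shorter file as the suspect is also just a guess, since the longer one could equally carry extra padding.

```python
def corruption_suspects(pairs, min_score=0.9, min_gap_s=3.0):
    """List (shorter_file, gap_s) for high-score pairs whose lengths disagree.

    pairs: iterable of (name_a, dur_a, name_b, dur_b, score), durations in seconds.
    """
    suspects = []
    for name_a, dur_a, name_b, dur_b, score in pairs:
        gap_s = abs(dur_a - dur_b)
        if score >= min_score and gap_s >= min_gap_s:
            # Assumption: the shorter file is the one with a missing piece.
            shorter = name_a if dur_a < dur_b else name_b
            suspects.append((shorter, gap_s))
    # Largest mismatch first, so the worst cases are reviewed first.
    return sorted(suspects, key=lambda s: s[1], reverse=True)
```

The result is meant as a review list for a human, not as input for automatic deletion.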

FtMgAl:
The first time I used the program I selected a small folder of about 100 tracks that I knew contained at most a couple of duplicates. The program found 22 supposed duplicates. The reason is that these were mostly live performances, and the first minute contained a lot of applause.

I would suggest adding a criterion that the lengths must match within X%. If 2 tracks differ in length by more than 25%, I find it hard to believe anyone would consider them similar, but with a setting adjustable from 0 to 100%, even people who would could have it their way. And, as someone else mentioned, eliminating candidate pairs by track length could significantly improve speed.

You might also want to consider using the second minute to reduce the false positives on live tracks.
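
The X% length criterion suggested above might look something like the sketch below. The names are hypothetical, and measuring the difference relative to the longer track is just one possible definition. Sorting by duration first lets the inner loop stop early, which is where the speed gain would come from.

```python
def length_within(dur_a, dur_b, max_diff_pct=25.0):
    """True if the two lengths differ by at most max_diff_pct of the longer one."""
    longer = max(dur_a, dur_b)
    if longer <= 0:
        return True  # no usable duration, so do not filter the pair out
    return abs(dur_a - dur_b) / longer * 100.0 <= max_diff_pct


def candidate_pairs(durations, max_diff_pct=25.0):
    """Yield only the pairs worth an audio comparison.

    durations: dict mapping track name to length in seconds.
    """
    items = sorted(durations.items(), key=lambda kv: kv[1])
    for i, (name_a, dur_a) in enumerate(items):
        for name_b, dur_b in items[i + 1:]:
            if not length_within(dur_a, dur_b, max_diff_pct):
                break  # sorted by length, so every later track is longer still
            yield name_a, name_b
```

With this definition a setting of 100% lets every pair through, so the current behaviour would remain available to anyone who wants it.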
