Author Topic: Many duplicates are not found, many are false  (Read 24645 times)

aybiss

  • Jr. Member
  • **
  • Posts: 2
Many duplicates are not found, many are false
« on: January 09, 2012, 05:51:10 »
I love Similarity and it has already helped me to save tons of drive space and time spent sorting things out.

I find that it just doesn't quite come up to scratch technically at times (read: not enough for me to buy it, yet). For instance, it will claim that different episodes of a TV show are the same, just because (I'm guessing) they have the same theme song and the same characters' voices.

Far more annoyingly though, it will compare two versions of an album (say, one from iTunes and one made personally from CD) and only highlight a few of the songs as dupes. In the case I'm looking at now, only 6 out of 10 songs were identified, even with the 'content' slider at 80% and all other checks disabled. Those that are identified are apparently all >99% similar.

Subjectively they sound identical to me but they are not identified as duplicates. Loading them up in Goldwave I can see there is a delay between the two versions; however, some of the correctly identified files have as much delay or more.

Is there something I'm doing wrong or is the heuristic just a bit sloppy?

Or worse, is the free version so crippled it makes the program seem dodgy when in fact we all know it (potentially) isn't???
« Last Edit: January 09, 2012, 06:04:31 by aybiss »

aybiss

  • Jr. Member
  • **
  • Posts: 2
Re: Many duplicates are not found, many are false
« Reply #1 on: January 10, 2012, 09:30:52 »
PS I am willing to provide example files by whatever means is generally used.

Admin

  • Administrator
  • Hero Member
  • *****
  • Posts: 664
    • https://www.smilarityapp.com
Re: Many duplicates are not found, many are false
« Reply #2 on: January 12, 2012, 23:38:03 »
All fingerprint algorithms have a limited scan duration, i.e. only 30-60 seconds of each file is scanned. The third algorithm, "precise", gives much better results for content comparison.
We are working on a user-defined algorithm. It is based on the "precise" algorithm but can be configured to scan specific parts of a file or the full file.
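
For anyone curious why a short, fixed scan window might miss two copies of the same song, here is a rough sketch in Python. It is not Similarity's actual algorithm; the function names, the window size and the use of cosine similarity on raw samples are all assumptions made for illustration. It only shows that comparing the same fixed window of two tracks can score low when one copy has extra leading silence, and that searching over a few offsets recovers the match. A delay is just one possible contributor, since some delayed files in the original report were still matched.

```python
import math
import random


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length sample lists (0.0 if either is silent)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def fixed_window_similarity(a, b, window):
    """Compare only the first `window` samples of each track, with no alignment."""
    return cosine_similarity(a[:window], b[:window])


def aligned_window_similarity(a, b, window, max_lag):
    """Slide b by 0..max_lag samples and keep the best match against a's first window.

    Only handles the case where b starts later than a; a real tool would
    search in both directions.
    """
    best = -1.0
    for lag in range(max_lag + 1):
        segment = b[lag:lag + window]
        if len(segment) < window:
            break
        best = max(best, cosine_similarity(a[:window], segment))
    return best


# Synthetic stand-in for audio (hypothetical data): random samples, not a real fingerprint.
rng = random.Random(0)
track = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 37 + track  # same content with 37 samples of leading silence

unaligned = fixed_window_similarity(track, delayed, window=500)
aligned = aligned_window_similarity(track, delayed, window=500, max_lag=100)
print(f"unaligned: {unaligned:.3f}, aligned: {aligned:.3f}")
```

With the offset search the two copies should score essentially as identical, while the unaligned comparison of the same window should score close to zero. Real music is far more self-similar than random noise, so the drop from a small delay would normally be less dramatic than in this toy case.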