Topic Summary

Posted by: hsei
« on: July 11, 2010, 12:00:17 »

My recommendation is to put more emphasis on *duration*. Looking for music that starts the same but differs in length may be a nice feature, but it's of no interest to the 99% of users who are typically searching for real duplicates. Leaving that feature as an option, say, and concentrating on files with approximately the same duration would dramatically reduce the search space and boost performance for larger collections. There's no additional cost, since you compute track duration anyway, and that's done only once per track -> O(N), not once per comparison -> O(N²).
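
A minimal sketch of that duration pre-filter (the names and bucket scheme are my own for illustration, not Similarity's actual code): bucket every track once by coarse duration, then hand only same-bucket or adjacent-bucket pairs to the expensive content comparison.

from collections import defaultdict

def find_candidate_pairs(tracks, tolerance_sec=2.0):
    """tracks: list of (track_id, duration_sec) tuples.
    Yields only the pairs worth a full content comparison."""
    buckets = defaultdict(list)
    for track_id, duration in tracks:
        # One O(N) pass: bucket each track by coarse duration.
        buckets[int(duration // tolerance_sec)].append((track_id, duration))
    for key, members in buckets.items():
        # Check within the bucket and against the next bucket, so pairs
        # straddling a bucket boundary are not missed.
        neighbors = members + buckets.get(key + 1, [])
        for i, (id_a, dur_a) in enumerate(members):
            for id_b, dur_b in neighbors[i + 1:]:
                if abs(dur_a - dur_b) <= tolerance_sec:
                    yield id_a, id_b
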
Posted by: Ph0X
« on: May 16, 2010, 23:00:12 »

What if you added some options to limit which songs undergo the full-on content scan?

As in, only scan for content if the tag match is at least X%, or only scan for content if the duration difference is less than X%.

Overall, it would be nice to have some options that make the scanning less precise but faster. For example, if you set the minimum to an 80% match and during the scan it has already found 20% that doesn't match, it can abort right there rather than computing the exact value.
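
A sketch of that early-abort idea (the threshold, function name, and per-segment model are assumptions for illustration, not Similarity's real internals): compare segment by segment and bail out as soon as the target similarity becomes unreachable.

def compare_with_early_abort(segments_a, segments_b, min_match=0.80):
    """segments_a/segments_b: equal-length sequences of per-segment
    fingerprints. Returns the match ratio, or None on early abort."""
    total = min(len(segments_a), len(segments_b))
    if total == 0:
        return None
    allowed_misses = total * (1.0 - min_match)
    matches = misses = 0
    for a, b in zip(segments_a, segments_b):
        if a == b:        # stand-in for a real per-segment similarity test
            matches += 1
        else:
            misses += 1
            if misses > allowed_misses:
                return None   # min_match is already unreachable; stop here
    return matches / total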

Of course, I might be talking out of my ass, but just throwing some ideas out there.
Can't wait to see how the new version turns out, though. Seems like it's going to have a lot of great features.
Posted by: Admin
« on: May 13, 2010, 20:24:50 »

A new version of Similarity is coming soon with a new 'precise' algorithm. It's the current experimental algorithm, but it runs as fast as the current content-based algorithm. It still suffers from the >10,000-song problem, though.
Posted by: Admin
« on: May 13, 2010, 20:21:29 »

Yeah, we know. We're trying to make it fast, but it still needs to compare everything with everything else.
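
For a sense of scale (my arithmetic, not a figure from the thread): all-pairs comparison over N tracks costs N(N-1)/2 comparisons, so the 15k-song library mentioned below already needs over a hundred million.

# Rough cost of comparing everything with everything:
n = 15_000                     # roughly the library size reported below
pairs = n * (n - 1) // 2
print(pairs)                   # 112492500 -> ~112 million comparisons
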
Posted by: Ph0X
« on: May 13, 2010, 17:48:13 »

I guess this is normal, since I'm thinking it compares each new song to all the previous ones, but that makes it quadratically slower as it goes.

I'm scanning ~150 GB of music, and it started rather fast, getting to 50% in an hour or two. Then 50% to 60% took around 2 hours, 60% to 70% took around 4 hours, and overnight it did about 5% in around 10 hours...

The problem is, it says it has found 200k duplicates among my 15k songs... The filter definitely has a problem, letting nearly any song through as a duplicate, and holding that much in memory is just killing it.

I do have experimental method enabled though.
http://img692.imageshack.us/img692/852/similarity2010051316475.png

EDIT: Gah, failed thread name.