Author Topic: Performcence issues  (Read 21557 times)

Ph0X

  • Jr. Member
  • Posts: 5
  • http://www.ehsankia.com
Performcence issues
« on: May 13, 2010, 17:48:13 »
I guess this is normal, since I assume it's comparing each new song against all the previous ones, but it's slowing down dramatically as it goes.
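If that's the case, the total work grows roughly quadratically with the number of tracks. A quick back-of-the-envelope check, just using my library size (nothing measured):

n = 15000                       # roughly the number of songs in my library
comparisons = n * (n - 1) // 2  # every track compared against every other track once
print(comparisons)              # 112,492,500 -- over a hundred million comparisons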

I'm scanning ~150 GB of music, and it started rather fast, reaching 50% in an hour or two. Then 50% to 60% took around 2 hours, 60% to 70% took around 4 hours, and overnight it did 5% in around 10 hours...

The problem is, it says it has found 200k duplicates among my 15k songs... The filter is clearly letting almost any song through as a duplicate, and holding that many results in memory is just killing it.

I do have the experimental method enabled, though.
http://img692.imageshack.us/img692/852/similarity2010051316475.png

EDIT: Gah, failed thread name.

Admin

  • Administrator
  • Hero Member
  • Posts: 664
  • https://www.smilarityapp.com
Performcence issues
« Reply #1 on: May 13, 2010, 20:21:29 »
Yeah, we know. We're trying to make it fast, but it still needs to compare everything with everything else.

Admin

  • Administrator
  • Hero Member
  • Posts: 664
  • https://www.smilarityapp.com
Performcence issues
« Reply #2 on: May 13, 2010, 20:24:50 »
A new version of Similarity is coming soon with a new 'precise' algorithm. It's the current experimental algorithm, but it works as fast as the current content-based algorithm. However, it still suffers from the same problem with collections of more than 10,000 songs.

Ph0X

  • Jr. Member
  • Posts: 5
  • http://www.ehsankia.com
Performcence issues
« Reply #3 on: May 16, 2010, 23:00:12 »
What if you added some options to limit which songs undergo the full content scan?

As in, only scan for content if tag matching is at least X%, or only scan for content if the duration difference is less than X%.

Overall, it would be nice to have some options that make the scanning less precise but faster. For example, if you set the minimum to an 80% match and during the scan it has already found 20% that doesn't match, it could abort right there instead of computing the exact value.
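Very roughly, something like this (all of the names and data shapes below are made up for illustration, not Similarity's actual internals):

# Duration pre-filter: skip the content scan entirely for very different lengths.
def duration_close(dur_a, dur_b, max_diff=0.05):
    return abs(dur_a - dur_b) / max(dur_a, dur_b) <= max_diff

# Early abort: compare fingerprint blocks one by one and bail out as soon as
# the requested match threshold is mathematically out of reach.
def early_abort_similarity(blocks_a, blocks_b, min_match=0.80):
    n = min(len(blocks_a), len(blocks_b))
    matched = 0.0
    for i in range(n):
        matched += 1.0 if blocks_a[i] == blocks_b[i] else 0.0
        # Even if every remaining block matched perfectly, can we still reach 80%?
        best_possible = (matched + (n - 1 - i)) / n
        if best_possible < min_match:
            return None          # abort early, don't bother computing the exact value
    return matched / n

print(duration_close(215.0, 218.0))                                # True
print(early_abort_similarity(list("abcdefgh"), list("abzzzzzz")))  # None -> aborted early

The idea is just that the cheap checks run first, so the expensive comparison only happens for pairs that still have a chance of matching.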

Of course, I might be talking out of my ass, but just throwing some ideas out there.
Can't wait to see how that new version turns out, though. It seems like it's going to have a lot of great features.

hsei

  • Jr. Member
  • Posts: 70
Performcence issues
« Reply #4 on: July 11, 2010, 12:00:17 »
My recommendation is to put more emphasis on *duration*. Looking for music that starts the same but differs in length may be a nice feature, but it's of no interest to 99% of users, who are typically searching for real duplicates. Making that feature an option and concentrating on files with approximately the same duration would dramatically reduce the search space and boost performance for larger collections. There's no additional cost, since you compute track duration anyway, and that's only done once per track -> O(N), not once per comparison -> O(N²).
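Something along these lines, purely as an illustration (the data shapes here are made up): bucket tracks by rounded duration and only compare within a bucket and its neighbour, so each track is matched against a handful of candidates instead of the whole collection.

from collections import defaultdict
from itertools import combinations

def candidate_pairs(tracks, bucket_seconds=3):
    """tracks: list of (track_id, duration_in_seconds) tuples."""
    buckets = defaultdict(list)
    for track_id, duration in tracks:
        buckets[int(duration // bucket_seconds)].append(track_id)

    pairs = set()
    for key, members in buckets.items():
        # tracks in the same duration bucket
        for a, b in combinations(members, 2):
            pairs.add((a, b))
        # neighbouring bucket, so durations near a bucket boundary aren't missed
        for other in buckets.get(key + 1, []):
            for a in members:
                pairs.add((a, other))
    return pairs

tracks = [("t1", 215.2), ("t2", 216.0), ("t3", 340.5), ("t4", 214.9)]
print(candidate_pairs(tracks))  # only the ~215 s tracks end up paired

The expensive content comparison would then only run on these candidate pairs.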