
Similarity seems not optimized for 300,000+ mp3 files


Thanks for the reply. I really appreciate that.

But isn't the cache supposed to minimize the time for a second scan (one that is exactly the same as the first scan)?

As mentioned before, I ran some tests doing the exact same comparison. The first time, the cache grew because new files were added, but the second time the scan took just as long as when the cache was created.

Yes, the cache does let the program skip the decoding and preprocessing step, but the time for that step is linear, i.e. it is spent only once per file.
Here is an example. Just pretend we have a more realistic, very fast computer that can prepare 10 caches in 1 second and compare 100K fingerprints in 1 second.

Comparing time = (N+1)*N/2 / 100000 / 3600 hours
Preparing time = N / 10 / 3600 hours

N                Comparing        Preparing      % preparing time
10000 files      0.14 hours       0.28 hours     66.66 %
100000 files     13.89 hours      2.78 hours     16.67 %
300000 files     125.00 hours     8.33 hours     6.25 %
1000000 files    1388.90 hours    27.78 hours    1.96 %

As you can see, the larger the number of files, the less caching matters.
This calculation is idealized: it assumes the duration skip mechanism is disabled.
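
For anyone who wants to try other numbers, here is a minimal Python sketch of the same estimate. It assumes the rates stated in the example (10 caches prepared per second, 100K fingerprint comparisons per second); the function name `scan_hours` and its parameters are made up for illustration and have nothing to do with the program's internals.

```python
def scan_hours(n, compares_per_sec=100_000, caches_per_sec=10):
    """Return (compare_hours, prepare_hours, prepare_share_percent) for n files.

    Assumed model: every file is compared with every other file
    ((n + 1) * n / 2 comparisons), and every file is prepared exactly once.
    """
    compare_hours = (n + 1) * n / 2 / compares_per_sec / 3600
    prepare_hours = n / caches_per_sec / 3600
    share = 100 * prepare_hours / (compare_hours + prepare_hours)
    return compare_hours, prepare_hours, share


if __name__ == "__main__":
    for n in (10_000, 100_000, 300_000, 1_000_000):
        c, p, s = scan_hours(n)
        print(f"{n:>8} files  {c:9.2f} h comparing  {p:6.2f} h preparing  {s:6.2f} %")
```

The comparing term grows with the square of N while the preparing term grows only with N, which is why the share of time the cache can save shrinks as the collection grows.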

Thanks for the explanation!

I look forward to the next update and also to the next algorithm implementation.


I also often use grouping, with a use case similar to the one described at the beginning:

As soon as you use grouping, not every item needs to be compared with all the others; each item in, e.g., group 1 only needs to be compared with the items in group 2.
=> So it should be linear, shouldn't it?

A further improvement might be: as soon as one match is found (often that means the song already exists twice), further comparing could be stopped for that item, as it is not necessary to know that there is more than one duplicate...

I also notice that the count goes up to the number of items in groups 1 and 2 combined. Group 2 is the one whose automarked files can be deleted, so shouldn't the count only go up to the number of items in group 2?!
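
To make the two ideas concrete, here is a rough Python sketch of comparing across groups and stopping at the first match. It is only an illustration under assumptions: `find_first_matches` and the `is_similar` callback are hypothetical names, and this is not how the program actually works internally. Note that the worst case is still len(group1) * len(group2) comparisons, so the cost is linear in one group only if the other group stays at a fixed size.

```python
def find_first_matches(group1, group2, is_similar):
    """For each item in group2, find the first similar item in group1.

    Returns (matches, comparisons), where matches maps a group2 item to the
    group1 item it duplicates. Items are never compared within their own group.
    """
    matches = {}
    comparisons = 0
    for candidate in group2:           # group2: the files that may be deleted
        for reference in group1:       # group1: the files that are kept
            comparisons += 1
            if is_similar(candidate, reference):
                matches[candidate] = reference
                break                  # one duplicate is enough, stop early
    return matches, comparisons


if __name__ == "__main__":
    # Toy data: plain strings stand in for audio fingerprints (an assumption).
    kept = ["song_a", "song_b", "song_c"]
    new = ["song_b", "song_x"]
    found, count = find_first_matches(kept, new, lambda x, y: x == y)
    print(found, count)
```

With this scheme the progress count would naturally run over the items of group 2 only, since those are the only items being looked up.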

Best, Fred

