Show Posts


Messages - hsei

Bugs / Global optimization does not work for huge collection
« on: December 29, 2017, 20:44:20 »
I tried to compare about 3,000 files against a huge collection of 250K songs in two groups, having previously created a cache.dat of 4.5 GB.
Global optimization created another data.dat of 20 GB (plus index.dat and links.dat), found no duplicates in 12 hours at almost 100% completion, and finally crashed, removing the disk (an SSD) holding the data from the file system.
The disk could only be recognized again after a cold boot.

Doing the same without global optimization finished successfully in 4 hours, finding about 200 duplicates.

General / Re: Duration inconsistencies
« on: November 23, 2016, 22:15:09 »
A good tool for editing MP3s (e.g. removing silence) without recoding is MP3DirectCut.

General / Re: WHY is Cleaning up a Music Library so D@MN Difficult?
« on: September 02, 2013, 16:32:03 »
Similarity does the job.

The complications come in when you have hundreds or thousands of possible dupes.

Bugs / Re: hundreds of "decoder.exe"
« on: June 25, 2013, 09:52:09 »
In my experience the 32-bit version of Similarity becomes unstable somewhere between 60K and 90K files.
This seems to have been solved in the 64-bit version.
If you can't move to 64 bit, you can split your search, e.g. into A-L and M-Z. This misses duplicates between the two groups, but with the tagging scheme "artist - title" most duplicates share the same starting letter. Files of the type "unknown artist - unknown title" can be checked as a separate group against each of the "letter groups".
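Similarity has no built-in option for this split, so it has to be done when building the file lists. A minimal Python sketch of the partitioning described above (function name and group labels are my own):

```python
def split_library(filenames):
    """Partition "artist - title" file names into A-L, M-Z, and an
    "other" group (unknown artists, digits) that is later checked
    against both letter groups."""
    groups = {"A_L": [], "M_Z": [], "other": []}
    for name in filenames:
        if name.lower().startswith("unknown artist"):
            groups["other"].append(name)
            continue
        first = name.lstrip()[:1].upper()
        if "A" <= first <= "L":
            groups["A_L"].append(name)
        elif "M" <= first <= "Z":
            groups["M_Z"].append(name)
        else:  # digits, punctuation, empty names
            groups["other"].append(name)
    return groups
```

Each letter group then gets its own Similarity run, plus one run of "other" against each group.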

General / Re: accuracy of comparison
« on: February 22, 2013, 16:15:04 »
Another note: you can set the time length as an additional criterion. To allow some inaccuracy in cuts at the ends, set the limit to 95 to 98% (depending on the material). Setting this parameter also reduces compare time substantially, since files with unequal durations (not the same as size) are disregarded and not compared by the further algorithms. This also circumvents the problem of files which are only similar in the first minute. Another advantage is that typically different recordings (normal, instrumental, live) of the same song differ enough in duration and are thus sorted out beforehand.
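The effect of that length limit can be sketched in a few lines (this is my own illustration of the prefilter, not Similarity's actual code):

```python
def durations_match(a_seconds, b_seconds, threshold=0.95):
    """Length prefilter: a pair survives only if the shorter duration
    is at least `threshold` (e.g. 95%) of the longer one. Pairs that
    fail are never passed to the expensive fingerprint comparison."""
    if a_seconds <= 0 or b_seconds <= 0:
        return False
    return min(a_seconds, b_seconds) / max(a_seconds, b_seconds) >= threshold
```

With threshold=0.95, a 3:20 track and a 3:16 track (ratio 0.98) are still compared, while a 3:20 track and a 3:00 track (ratio 0.90) are sorted out beforehand.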

General / Re: Explanation of Analysis Columns
« on: February 17, 2013, 18:25:09 »
The mentioned article addresses most of the questions concerning quality. But be careful with the data displayed in the analysis columns: clipping, for example, introduces considerable (false) high-frequency content. Thus overamplified material seems to be better with respect to maximum frequency than correctly recorded material. Even the spectrograms look better, but the quality isn't.

News / Re: Version 1.8.2 (build 1656) released
« on: February 12, 2013, 17:39:35 »
The 64-bit version appears to be much more stable than the previous 32-bit ones. It looks like I am now able to compare new candidates against my complete (huge) tree of sound files (tested with 1.8.2 beta, to be confirmed with the final release).

General / Re: How to run multiple instances of similarity premium?
« on: January 17, 2013, 22:55:15 »
I have a huge database, too, and had to start multiple instances, since Similarity gets unstable with cache sizes between 50K and 100K entries. Each instance works on part of the data with its own smaller cache file. To achieve that, you have to copy (or hard link) the Similarity files to different directories (e.g. A_K and L_Z). With the /portable switch, each of the directories gets its own configuration, cache and license files. After starting the instances you have to enter the license data for each of them, and you will get several premium versions.
I would recommend not starting the next instance before the cache file has been read by the previous one, since concurrent HD access really slows down both instances because of excessive head movement (but not on an SSD). Once the cache file has been read, CPU usage dominates disk usage. You will only get a small increase in throughput, since sometimes one instance can use more CPU time while the other waits for disk I/O to complete.

General / Re: working with two groups help vip
« on: April 15, 2012, 12:59:28 »
You have to move the groups up in the priority list. Then only members of group #2 should be marked automatically.

News / Re: Beta version 1.7.0
« on: March 27, 2012, 20:37:54 »
ad 2) To do this, you need a standard treeview window and the possibility to select a root directory in it.
After that you select whether all files in that tree have to be preserved or if files in that tree may be deleted.
Now you can search through your groups and mark those files that may be deleted or exclude those files from marking that should be preserved.
This can be done by a simple substring comparison of the full file paths vs. the selected root directory.
Under the assumption that folder was the top priority, the other criteria might then be applied to the remaining files.
Of course at least one file per group should be kept.
ad 4) Compression ratio should be a criterion in the priority list (if not already included in the score calculation).
Size is an indicator for that, too, but only if the resolution is identical.
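The path test suggested in ad 2) can be sketched as follows. A plain substring comparison of full paths would wrongly match e.g. "D:\Music2" against the root "D:\Music", so the sketch (my own illustration, shown with generic paths) compares whole path components:

```python
import os.path

def in_tree(path, root):
    """True if `path` lies inside the directory tree rooted at `root`.
    Paths are normalized so that the comparison works on whole path
    components, not on raw substrings."""
    path = os.path.normcase(os.path.normpath(path))
    root = os.path.normcase(os.path.normpath(root))
    return path == root or path.startswith(root + os.sep)
```

Files for which `in_tree` is true are then either excluded from marking (preserve) or preferred for marking (may be deleted), depending on the option chosen for that root.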

News / Re: Beta version 1.7.0
« on: March 23, 2012, 19:06:30 »
The new feature "image duplicates" is quite nice, even in the current beta stage.
Some remarks:
- sorting by clicking on a row header does not work
- the folder list for deletion priority is inadequate: when you have folder trees (as usual), resorting half of the list is much too tedious. A click on a directory with the options a) keep all files in this tree, b) remove duplicates in this tree would be sufficient. To be effective, parent directories of (more than one) subdirectory with duplicates should be listed, too.
- the new group list feature with the ability to expand/collapse may be useful in some cases, but the default view should be "expanded", because I want to know what is in the group and, e.g., what has been marked automatically.
- listing all duplicate groups may be useful in some cases, since similarity is not necessarily transitive, but only if there are more than two duplicates. If there are only two duplicates it is sufficient to list them once (A <sim> B => B <sim> A).
- a similarity score of almost 100% should not be given if images are "almost identical" but have very different sizes. At least for JPEG images, size is a strong indicator of quality (compression ratio). Sometimes it may be an option to delete the larger image with the same origin, but not without a warning.

Wishlist / Re: Moving marked files
« on: March 08, 2012, 22:00:25 »
Shouldn't be too difficult. Instead of saving D:\Dup\Track01.mp3 you have to "subtract" the source root path (here D:\Songs) from the full path and append the rest to the target path D:\Dup\Songs. Three string processing calls in Java or C would do the job; it's a standard procedure when moving trees.
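The "subtraction" amounts to taking the path relative to the source root and rejoining it under the target root. A minimal Python sketch (function name is my own; shown with generic paths rather than the Windows drive letters from the example):

```python
import os.path

def mirrored_target(full_path, source_root, target_root):
    """Rebuild a file's path under target_root, preserving the
    subtree layout it had below source_root."""
    rest = os.path.relpath(full_path, source_root)  # "subtract" the source root
    return os.path.join(target_root, rest)
```

So a file Songs/Rock/Track01.mp3 moved with source root Songs and target root Dup/Songs ends up at Dup/Songs/Rock/Track01.mp3, with its subfolder intact.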

Wishlist / Re: Saving the scanned file list
« on: March 07, 2012, 19:34:03 »
Good idea!
Most of the work is already implemented by the cache files. The task can be achieved in principle by having different clones of Similarity with their own cache files, started with /portable, and moving the cache files outside of the program. But a "save as / load from" feature within Similarity would be a real timesaver.

General / Re: opencl options greyed out
« on: October 08, 2011, 12:41:02 »
Current version is 11.8, but that may change within weeks.

I suppose that will be a useful feature. Brute-force comparison (everything with everything) is not feasible with large numbers of files. Reducing the search space by thresholds like length was a first improvement, and introducing the group-vs-group feature brought large collections down to acceptable times.
The possibility of further restricting fingerprint comparison to files with sufficient coincidence in their tags, introduced in the latest version 1.6.2, brought processing times down for me from hours to minutes. Of course you lose duplicates which are totally mistagged, but this can often be tolerated. A more exhaustive search can still be done later on.
I prefer software that can be tailored to one's needs by configuration.
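The tag restriction works because it turns one big pairwise comparison into many small ones. A rough sketch of the idea (my own illustration with exact tag matching after normalization; Similarity's actual matching is presumably fuzzier):

```python
import re
from itertools import combinations

def tag_key(artist, title):
    """Normalize tags so near-identical spellings fall into one bucket."""
    norm = lambda s: re.sub(r"[^a-z0-9]", "", s.lower())
    return (norm(artist), norm(title))

def candidate_pairs(files):
    """Yield only pairs whose normalized tags coincide; the expensive
    fingerprint comparison would then run on these pairs alone."""
    buckets = {}
    for f in files:
        buckets.setdefault(tag_key(f["artist"], f["title"]), []).append(f)
    for group in buckets.values():
        yield from combinations(group, 2)
```

Instead of N*(N-1)/2 fingerprint comparisons you get only the pairs within each tag bucket, which is why the speedup is so dramatic; totally mistagged files simply land in buckets of their own and are missed.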
