Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - lakecityransom

16
Apologies for not posting this in the bigger topic in the General section.

My complaint about solution #1 is that accidental deletion is still possible when matches overlap between groups. Consider the following, based on your example:

Group #1: 3.mp3 is preferred; 1.mp3, 2.mp3, and 4.mp3 are chosen for deletion.
Group #2: 2.mp3 is preferred; 3.mp3 and 6.mp3 are chosen for deletion.
Both groups are deleted at the same time: 3.mp3 is lost even though it was meant to be the saved mp3 for group #1.
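
To make the overlap concrete, here is a tiny sketch (the file names come from the example above; the data model is made up, not how the program actually stores things):

```python
# The two groups from the example above.
group1 = {"kept": "3.mp3", "delete": {"1.mp3", "2.mp3", "4.mp3"}}
group2 = {"kept": "2.mp3", "delete": {"3.mp3", "6.mp3"}}

# A naive batch delete just takes the union of everything marked for deletion.
batch = group1["delete"] | group2["delete"]
print("3.mp3" in batch)  # True -- group #1's keeper gets wiped anyway
```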

Solution #2 I would recommend as an alternate option, but the drawback of basing similarity on one file is indeed a problem. Many files will have a weaker relationship to that one file than they would if compared against another mp3, which means a lot of false positives. However, you now have controls on the match percentages, so it can be limited to high-% matches. It would not be a perfect solution: the program would have to be run multiple times to keep trimming down the results, and the same results that were not dealt with would keep reappearing. Still, it could be useful for deleting a good number of duplicates in an easy and safe way on a large file set.

I can still see potential for accidental deletions in solution #2, unless you allow each mp3 to appear on the list only once, whether as a candidate or as a candidate's match, or unless the file purging process is redesigned as explained below:

My suggestion: batch deletions should be performed the same way as deleting files from one group at a time:

Currently:
--------------------
I delete one file group at a time: the deleted files are checked against the remaining list, and all inverse relationships and matches under all groupings are purged.

I delete a bunch of files under multiple groups at once: the deleted file list is only compared to the remaining file list. If the deleted file list itself contains inverse relationships among candidates, or the issue I described for solution #1, many incorrect deletions will occur.
--------------------

So, if a user selects a bunch of files for deletion, have the code delete them one by one and re-check each against the list. This will purge groupings with the same matches or inverse relationships, preventing unintended deletions even if those files were also selected further down the list. A rough sketch follows below.
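
Roughly what I mean, as a sketch (the data model is made up: each group keeps one file and lists the duplicates selected under it):

```python
# Rough sketch of the one-at-a-time idea (hypothetical data model, not the
# program's real internals).
def delete_batch(groups, selected, delete_file):
    pending = list(selected)                 # the user's batch selection
    while pending:
        path = pending.pop(0)
        delete_file(path)
        # Reprocess the list after every single deletion: any group whose
        # kept file was just removed is no longer valid, so files that were
        # only selected because of that group are dropped from the batch.
        for group in [g for g in groups if g["kept"] == path]:
            groups.remove(group)
            pending = [p for p in pending if p not in group["delete"]]

# With the example above, deleting 2.mp3 (selected under group #1)
# invalidates group #2, so 3.mp3 is pulled out of the batch and survives.
```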

17
In my opinion this is the most critical issue; it undermines the basic functionality and purpose of the program, and I have been bitten by it myself. Luckily, all my music is on a spare disk and I was able to recover most songs completely intact, so I can still figure out whether any copies of them remain. Normally I would not delete them from the recycle bin entirely, but my recycle bin was acting up, which isn't entirely Similarity's fault...

The only safe way to use this program at the moment is to delete match groups one by one so the list will be reprocessed to purge inverse relationships. It really is important that people understand this.

18
General / Several questions about similarity
« on: May 21, 2010, 21:19:06 »
So far I can tell you it's working great: it has churned through ~20,000 of ~20,000 results in about 2 hours at 90% content. I did try content precision at first, but it was going much slower, probably for the reason I explained above. It looks like you are right about the false positives taking up so much processing time.

19
General / Several questions about similarity
« on: May 21, 2010, 19:28:42 »
Oh great, a beta right when I need it the most. I'll have you know I've been working on my music for a year, so precision comparison arriving at exactly this step is a godsend.

I'll give some feedback on it. Thanks a ton.

edit: Sorry, I misunderstood; I meant the threshold is very useful. The precision option will be useful in the future, I'm sure, but the majority of my files are mistagged, so it is not going to help me as much as it would with other libraries.

20
General / Several questions about similarity
« on: May 21, 2010, 19:10:11 »
Oh no, it was 175,000 duplicate results, mostly false positives. Most of what I was interested in was the 90%-100% range, which would have cut that number dramatically. The file count was somewhere around 50,000-55,000. I thought the program pulled results from a cache file instead of recomputing them, but you're saying the cache loads songs into memory for quicker comparison? That obviously would not be possible with this many songs. I must be wrong in my thinking, because computed results cannot be saved yet.

At any rate, Similarity slowed to a crawl somewhere around 35,000 processed songs (AMD 6400+ 3.2 GHz dual core, 3 GB free RAM on XP). Of course I don't blame Similarity for this; I just have to take it in chunks.

Was I right about how the sensitivity setting works, by the way? I just wish I could restrict content matches to 90-100%; that would make life much easier.

21
Wishlist / delete
« on: May 21, 2010, 13:54:40 »
Meanwhile, I suppose the best way to streamline the process is to use a hotkey program that automates the clicks and key presses behind a single hotkey.

Edit: If you want something as close as possible to hitting the Delete key, the keyboard commands to automate in whatever hotkey program you use would be (see the sketch after this list):

App key to bring up the right-click context menu (this key is next to the right Ctrl key on the keyboard: http://farm3.static.flickr.com/2653/4169841579_8f114d8fbd_o.jpg)
Up key to select Delete at the end of the context menu
Enter key to confirm Delete
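
For example, here is a minimal sketch of the same key sequence using pyautogui (just an illustration, not the hotkey program I used; it assumes a file is already selected in the Similarity window and the window has focus):

```python
import time
import pyautogui  # assumed library; any macro tool sending the same keys would do

pyautogui.press('apps')   # open the right-click context menu for the selected file
time.sleep(0.3)           # give the menu a moment to appear
pyautogui.press('up')     # Up wraps around to Delete at the end of the menu
pyautogui.press('enter')  # confirm Delete
```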

22
General / Several questions about similarity
« on: May 21, 2010, 11:02:47 »
I took some results that I had processed with the 'browse...' feature (i.e. 20% match, 60%, 90%, etc.), put them on a jump drive with Similarity, and did some sensitivity-setting testing on another computer in the meantime. Different sensitivity values changed the percentages, but I don't see the pattern... For example, at 90% sensitivity, are the similarity percentages scaled so that 90% song similarity counts as 100%? In other words, would a song that is 89% similar have shown up as ~99%?
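
Put another way, this is the rescaling I'm guessing at (purely my assumption about how the sensitivity setting works, not confirmed behavior):

```python
# My guess: displayed similarity = raw similarity / sensitivity, capped at 100%.
def displayed_percent(raw_percent, sensitivity=0.90):
    return min(100.0, raw_percent / sensitivity)

print(round(displayed_percent(89.0), 1))  # 98.9 -- the ~99% from my example
```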

I knew I was asking for trouble, but I tried to sort 175,000 matches by filename and Similarity either crashed or was going to take ages. On the bright side, it did not crash when sorting by the content filter as I had been doing. At any rate, I had to start from scratch again. It's been about 2 more days and I'm nearing 175,000 matches again in the same time frame, so I'm not sure how the cache helped. I was hoping the cache would churn out results quickly with 36,725 of ~50,000+ files cached, but I don't see much of a time difference. No files were moved or altered in any way between the two runs.

Thanks for the replies and it is still a great program.

edit: I accidentally sorted by file again but it succeeded after a long wait :)

23
Many programs have an option to find the file on disk, and usually they automatically highlight the file when the folder is opened.
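
For reference, the usual way Windows programs do this is Explorer's /select, switch; a minimal sketch (the path is just a placeholder):

```python
import subprocess

def reveal_in_explorer(path):
    # Windows-only: Explorer's /select, switch opens the parent folder
    # with the given file already highlighted.
    subprocess.run(f'explorer /select,"{path}"')

reveal_in_explorer(r"C:\Music\some_album\track01.mp3")  # placeholder path
```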

24
General / Several questions about similarity
« on: May 19, 2010, 09:45:23 »
First of all, thank you for a great free program. I have 50,000+ MP3s and many, many dupes. The program is doing a great job of finding them without crashing or eating up memory (198,400 results at 63.8% completion, 2 days of run time).

1. What do the "Cache:" and "New: x/x" numbers represent? For example, Cache: 35,278, New: 32,141/50,193. I am a little confused because the completion percentage has swayed back and forth.

2. I know there is a clear-cache button, but I don't know the consequences or purpose of using it. Does the cache help the search go faster next time if I close Similarity? I know I cannot currently save my result list, which is worrying since this is a very long number-crunching process; I just hope it doesn't crash!

3. In the options there is a sensitivity setting for content and tags. Currently it is set at .75 each, yet in the results I have many songs with content matches of less than 1%, or something unnecessary like 20%. Are these going to be purged at the end of result processing? If so, and you change the sensitivity settings during result computation, will it use the new value at the end?

Some of these questions I could probably answer through my own experimentation with a small file set, but Similarity cannot be opened twice, since the second instance cannot access the cache that is in use. I figure that if I attempted another search in a second instance it would erase the cache or otherwise mess up my 2-day+ search. I just got a little trigger-happy using the program before completely understanding it.

25
Un-indenting grouped results: I compare directories because some of my MP3s are already in approved folders, and those will be the ones I keep when choosing between duplicates.

I understand that the results under the bold heading of each group are indented for organization and to avoid confusion. However, the directories are as much a part of my decision about which file to keep as content %, tag %, artist tag, title tag, bitrate, etc., and it is easier to compare directories when they are all aligned. The alternating color scheme between groups and the bold headings would still be enough to keep the results organized.

Another solution would be an option to give preference to certain folders when choosing the bold result, although that is a much more difficult feature to provide. Roughly what I have in mind is sketched below.
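
A small sketch of the folder-preference idea (the folder list and paths are made up for illustration):

```python
from pathlib import PurePath

PREFERRED_FOLDERS = [r"D:\Music\Approved", r"D:\Music\Sorted"]  # made-up folders

def folder_rank(path):
    # Lower rank = earlier in the user's preferred-folder list.
    parent = str(PurePath(path).parent)
    for i, folder in enumerate(PREFERRED_FOLDERS):
        if parent.startswith(folder):
            return i
    return len(PREFERRED_FOLDERS)  # not in any preferred folder

def pick_bold_result(group):
    # The file from the most-preferred folder becomes the group's bold heading.
    return min(group, key=folder_rank)

print(pick_bold_result([r"C:\Downloads\song.mp3", r"D:\Music\Approved\song.mp3"]))
```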
