Wishlist / If a song is displayed as the "primary", don't show it as a "candidate match"
« on: May 26, 2010, 17:45:24 »
Apologies for not posting this in the bigger topic in the General section.
My complaint about solution #1 is that accidental deletion is still possible among the matches. Consider the following, based on your example:
Group #1: 3.mp3 is preferred. 1.mp3, 2.mp3 and 4.mp3 are chosen for deletion.
Group #2: 2.mp3 is preferred. 3.mp3 and 6.mp3 are chosen for deletion.
Both groups are deleted at the same time. 3.mp3 is lost, although it was meant to be the saved mp3 for group #1.
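To make the problem concrete, here is a rough sketch of how such a conflict could be spotted before a batch delete. The group layout (a "keep" file plus a "delete" list) is only my assumption for illustration, not how the program actually stores its results. Note that 2.mp3 ends up in the same position as 3.mp3 in this example.

```python
# Hypothetical model: each group has one preferred file and a list of
# matches the user has marked for deletion.
groups = [
    {"keep": "3.mp3", "delete": ["1.mp3", "2.mp3", "4.mp3"]},
    {"keep": "2.mp3", "delete": ["3.mp3", "6.mp3"]},
]

def batch_conflicts(groups):
    """Return files preferred in one group but marked for deletion in another."""
    kept = {g["keep"] for g in groups}
    marked = {name for g in groups for name in g["delete"]}
    return sorted(kept & marked)

print(batch_conflicts(groups))  # ['2.mp3', '3.mp3']
```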
I would recommend solution #2 as an alternate option, but basing similarity on 1 file is indeed a drawback. Many files will have a weaker relationship to that file than they would to another mp3, which means a lot of false positives. However, you do have controls on the match percentages now, so it can be limited to high % matches. It would not be the perfect solution: the program would have to be run multiple times to keep trimming down the results, and results that were not taken care of will appear again. Still, it can be useful for deleting a good number of duplicates in an easy and secure way on a large file set.
I can see some possible issues with accidental deletion of matches in solution #2 as well, unless you allow each mp3 to appear on the list only once, whether as a candidate or as a candidate match. That is, unless the file purging process is redesigned as explained below:
My suggestion: batch deletions should be performed the same way as deleting files from 1 file group at a time.
Currently:
--------------------
I delete 1 file group at a time: The deleted files are checked against the remaining list, and all inverse relationships and matches under all groupings are purged.
I delete a bunch of files under multiple groups at once: The deleted file list is compared to the remaining file list. If the deleted file list contains inverse relationships among candidates, or the issue I described for solution #1, many incorrect deletions will occur.
--------------------
So, if a user selects a bunch of files for deletion, have the code delete them 1 by 1 and compare each one to the list. This will erase groupings with the same matches or inverse relationships, preventing unintended deletions even if those files are selected again further down the list.
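Roughly what I have in mind, as a sketch only. The data layout and the function name are made up for illustration, and I am assuming that "purging" means a group is dropped once its preferred file has been deleted.

```python
# Hypothetical model: same "keep" / "delete" layout as a results list
# with one preferred file per group.
groups = [
    {"keep": "3.mp3", "delete": ["1.mp3", "2.mp3", "4.mp3"]},
    {"keep": "2.mp3", "delete": ["3.mp3", "6.mp3"]},
]

def delete_one_group_at_a_time(groups):
    """Apply a batch selection group by group, purging as we go.

    Returns the set of files that would actually be deleted.
    """
    removed = set()
    for group in groups:
        # Purge step (assumed): if this group's preferred file is already
        # gone, the whole group is stale, so its selections are ignored.
        if group["keep"] in removed:
            continue
        removed.update(group["delete"])
    return removed

print(sorted(delete_one_group_at_a_time(groups)))
# ['1.mp3', '2.mp3', '4.mp3']
```

Here 3.mp3 survives because group #2 is purged once 2.mp3 is deleted. 6.mp3 is left alone and would simply show up again on the next scan.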