Topic: Duplicate files presentation revamp


  • Jr. Member
  • Posts: 1
Duplicate files presentation revamp
« on: July 07, 2013, 10:33:00 »
I tested Similarity version 1.8.4 (unregistered) on a set of duplicate image files I created myself to evaluate the software. These are my opinions.
(English isn't my native tongue, so blame our educational system, not me, for any grammatical errors. Please treat any confusing expression as a puzzle to be solved by finding the most logical and clever interpretation.)

Major "design flaws" - Results (images):

* The UI showing image duplicates is confusing for one main reason: it shows each pair of similar files A and B twice, first as A<->B and then as B<->A.
Rule number 1 when making a UI: never show the same thing twice. Never! There is only one file on disk, so it should be shown only once to the user. (There is no exception to this rule if you want to make something fast and intuitive for humans; aircraft have redundant instruments for a whole different reason.)

Suggestion: remove the +/- expanders altogether and just list each group of near-duplicates. Select one image in each group to act as the original and calculate the similarity percentages relative to it. That way each file appears only once.

If files A, B and C are similar enough to be in the same group, just bunch them together.
If files A and C are so different that they should not be in the same group, and B is similar to both, just group B with the "original" that gives it the highest combined percentage score. If it is still tied, pick either one, as long as the result is the same every time "Find duplicates" is run. Yes, I know grouping becomes a mathematical problem with tons of duplicates, but it's better to have one programmer work for days than a thousand users work for an hour each. :-)
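The grouping rule above can be approximated with greedy complete-linkage clustering, a standard technique I'm swapping in here; I don't know what Similarity does internally. This is a minimal sketch assuming pairwise similarity percentages are already computed and stored in a dict keyed by file-name pairs; `group_duplicates` and the threshold value are hypothetical. The strongest pairs are merged first, and two groups are joined only if every file in one is similar enough to every file in the other, so in the A-B-C case B stays with whichever partner it matches best. Ties are broken by name, so repeated runs give the same groups.

```python
from itertools import combinations

def group_duplicates(files, similarity, threshold):
    """Greedy complete-linkage grouping (hypothetical helper).

    similarity: {(a, b): percent}, looked up in either order.
    Every file ends up in at most one group; singletons are dropped.
    """
    def sim(a, b):
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    group_of = {f: frozenset([f]) for f in files}
    # Strongest pairs first; ties broken by name so every run agrees.
    pairs = sorted(combinations(sorted(files), 2), key=lambda p: (-sim(*p), p))
    for a, b in pairs:
        if sim(a, b) < threshold:
            break  # remaining pairs are all weaker
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue
        # Complete linkage: merge only if every cross pair is similar enough.
        if all(sim(x, y) >= threshold for x in ga for y in gb):
            merged = ga | gb
            for f in merged:
                group_of[f] = merged
    groups = {g for g in group_of.values() if len(g) > 1}
    return sorted(sorted(g) for g in groups)
```

With A-B at 95%, B-C at 90% and A-C at 50% (threshold 80%), this returns `[["A", "B"]]`: B joins A because that pair is stronger, and C is left out because it is too far from A.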

How to select the original: this is the tricky part, and it depends on the user, so each user has to build their own profile for prioritizing images. For each attribute you pick one of three radio buttons labelled "original - ignore - copy". The user thus builds a preference from several attributes that are either counted or ignored, and the file with the most points is selected as the original.
Add a new menu "Selection criteria" between Tools and Help, containing an "Images" item with a submenu offering "Default" and "New..." for editing the default profile or adding more selection profiles. Switching between criteria profiles requires recalculating the current results and choosing new originals.

Suggested attributes, each giving +10/0/-10 points toward "original/ignored/copy", with default settings so the software can make a selection before the user customizes anything:
   * A larger file size - (defaults to "original")
   * A larger image height - (defaults to "original")
   * A larger image width - (defaults to "original")
   * A longer file name - (defaults to "original")
   * A longer path to the file - (defaults to "ignored")
   * A newer creation date - (defaults to "ignored")
   * A newer modification date - (defaults to "original")
   * A bigger color palette - (defaults to "original"). The actual number of colors used, taken from a histogram, would be nifty if a comparison algorithm already calculates it; otherwise we don't want to slow down the selection phase.
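A minimal sketch of how such a profile could score one group, in Python. It assumes one interpretation of "+10/0/-10": for each attribute, the file(s) with the largest value get +10 if the attribute is marked "original", -10 if marked "copy", and nothing if "ignored" (or if every file ties). `ImageFile`, its fields, `score_group` and `pick_original` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath

# Three-way mark per attribute, as in "original - ignore - copy".
ORIGINAL, IGNORE, COPY = 1, 0, -1
POINTS = 10  # the simple-mode +10/0/-10 step

@dataclass
class ImageFile:  # hypothetical record; field names are illustrative
    path: str
    size: int
    width: int
    height: int
    created: float
    modified: float
    colors: int

# attribute label -> (value to compare, default mark), following the list above
DEFAULT_PROFILE = {
    "larger file size": (lambda f: f.size, ORIGINAL),
    "larger height": (lambda f: f.height, ORIGINAL),
    "larger width": (lambda f: f.width, ORIGINAL),
    "longer file name": (lambda f: len(PureWindowsPath(f.path).name), ORIGINAL),
    "longer path": (lambda f: len(f.path), IGNORE),
    "newer creation date": (lambda f: f.created, IGNORE),
    "newer modification date": (lambda f: f.modified, ORIGINAL),
    "bigger color palette": (lambda f: f.colors, ORIGINAL),
}

def score_group(group, profile=DEFAULT_PROFILE):
    """Return {path: {attribute: points}} for one group of duplicates."""
    breakdown = {f.path: {} for f in group}
    for label, (value, mark) in profile.items():
        values = [value(f) for f in group]
        best = max(values)
        if mark == IGNORE or min(values) == best:
            continue  # ignored, or every file ties so nobody gains anything
        for f, v in zip(group, values):
            if v == best:
                breakdown[f.path][label] = mark * POINTS
    return breakdown

def pick_original(group, profile=DEFAULT_PROFILE):
    """Highest total wins; ties fall back to the path so repeated runs agree."""
    breakdown = score_group(group, profile)
    return min(group, key=lambda f: (-sum(breakdown[f.path].values()), f.path))
```

Keeping the per-attribute breakdown, rather than only the total, is what would let a "Selection score" view show the user which attributes gave which points. The advanced-mode weights mentioned later could replace the fixed `POINTS` with a per-attribute number.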

* List of preferred formats: JPG/BMP/PNG/etc. The user moves them up/down between tiers: highly preferred (+20), preferred (+10), neutral (+0), unfavored (-10). Most formats can default to neutral, with GIF/BMP perhaps unfavored due to their limitations and file size?
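The format tiers map directly onto a small lookup table. This Python sketch assumes the format is identified by the file extension (not by inspecting the file contents), and `format_points` and `DEFAULT_FORMAT_TIERS` are hypothetical names; the default unfavors only GIF and BMP, as suggested above.

```python
from pathlib import PureWindowsPath

TIER_POINTS = {"highly preferred": 20, "preferred": 10, "neutral": 0, "unfavored": -10}

# Anything not listed defaults to "neutral".
DEFAULT_FORMAT_TIERS = {".gif": "unfavored", ".bmp": "unfavored"}

def format_points(path, tiers=DEFAULT_FORMAT_TIERS):
    """Points for a file's format, judged by its extension (an assumption)."""
    ext = PureWindowsPath(path).suffix.lower()
    return TIER_POINTS[tiers.get(ext, "neutral")]
```

Moving a format up or down in the UI would then just change its entry in the user's `tiers` dict.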

* File headers (EXIF, ICC, etc.) and the values within them could obviously be part of a user's preferences, but personally I delete all such headers (using jhead), so I don't know what would be useful.

* In the main selection criteria, when adding folders, you should be able to prioritize folders depending on how well you have sorted your images. Example: a user adds 4 folders, and each number gives a priority to everything in that folder and below it, until a subfolder with its own priority replaces it. A subfolder can have a lower or higher score than its parent.

+200 --- D:\Data\Pictures\Wedding\Guests\Smith
+0 --- D:\Data\Pictures\Unsorted from the camera card\
+10 --- D:\Data\Pictures\Wedding\
-10 --- H:\
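The "applies to this folder and below, until a new priority replaces it" rule is a nearest-ancestor lookup. A minimal Python sketch, assuming Windows-style paths and Python 3.9+ (for `is_relative_to`); `folder_priority` is a hypothetical name and the table just mirrors the example above. Comparing path components instead of raw string prefixes avoids treating `Wedding2\` as a subfolder of `Wedding\`.

```python
from pathlib import PureWindowsPath

FOLDER_PRIORITIES = {
    r"D:\Data\Pictures\Wedding\Guests\Smith": 200,
    r"D:\Data\Pictures\Unsorted from the camera card": 0,
    r"D:\Data\Pictures\Wedding": 10,
    "H:\\": -10,
}

def folder_priority(file_path, priorities=FOLDER_PRIORITIES):
    """Priority of the deepest listed folder containing file_path (0 if none)."""
    path = PureWindowsPath(file_path)
    best_depth, best_score = -1, 0
    for folder, score in priorities.items():
        folder_path = PureWindowsPath(folder)
        depth = len(folder_path.parts)
        # The deepest (most specific) matching folder overrides its parents.
        if path.is_relative_to(folder_path) and depth > best_depth:
            best_depth, best_score = depth, score
    return best_score
```

So a file under `Wedding\Guests\Smith` gets +200, while one under `Wedding\Guests\Brown` falls back to the +10 of `Wedding`.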

The +200 in this case means the user has "hardcoded" that folder as a master folder: all images in it are considered originals, and every other location holds copies. If the user chooses more than one, like Guests\Brown besides Smith, and a duplicate is found between those two master folders, the program can either ignore it (the user may want to keep two identical files in separate folders to send to two different friends) or change the color of the line as a warning that two originals have been found.
No "Autoselect" should ever mark anything within a master folder; to delete such a file, the user has to remove the folder's master status or delete the file outside the program (from Explorer).

* List of preferred paths - like the folder priorities above, but present from the start unless the user states otherwise when first adding folders. For example, a user can set Documents and Settings and its subfolders to "copy" (-1), because that's where most software dumps files before I can copy them to another disk or location.

All of the above is what I consider "simple preferences". In an advanced mode there could be custom values such as +20, +5 or -12 for each attribute if the user wants finer prioritizing, but options like that should stay hidden unless the user specifically hunts down the advanced buttons or edits the preference files, so that the software is easy and intuitive at first. Future versions might have even more complicated formulas for what pixel size, file size or filename length is considered "optimal" and how deviations from it should be weighted. Depends on what people want?

When results are shown, being able to right-click a file and choose "Selection score" to see which attributes gave it which modifiers will help the user learn how the classification works.

Some explanation of the selection criteria:
Most people add words to a filename to describe the file, or sort it deeper into a longer path. As a former sysadmin I find this a pain, since long paths and filenames don't really work well in Windows, but that's how users work, and hence it should be possible to base priority on it.

Cropped images (which are meant to replace the original) may have a smaller width/height, but at least we claim to select the original and not the "processed file to keep". Same with

Copying a file changes its creation date, but the modification date stays the same, etc. You might want to inform users about quirks like that with a "help bubble" or a small "help window" that explains each setting as it is highlighted.

Note that the +200 value given above can be generalized: anything above +150 is considered "master" and undeletable. That covers the case where a user decides that all of their RAW files, or whatever, should never be deleted by a few quick clicks on the autoselect options.
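A minimal Python sketch of an autoselect pass that respects the master rule, assuming each file in a group already has a total score (format, folder and attribute points summed). The +150 threshold comes from the post; `autoselect` and its return shape are hypothetical. The highest-scoring file is kept as the original, no file above the threshold is ever marked, and a group with two or more masters is flagged, which the UI could show as a differently colored line.

```python
MASTER_THRESHOLD = 150  # from the post: anything above +150 counts as "master"

def is_master(score):
    return score > MASTER_THRESHOLD

def autoselect(group_scores):
    """group_scores: {path: total score} for one group of duplicates.

    Returns (paths to mark for deletion, warning flag). The best file is
    kept as the original; master files are never marked. The flag is set
    when the group contains more than one master file.
    """
    # Highest score wins; ties broken by path so the choice is stable.
    original = min(group_scores, key=lambda p: (-group_scores[p], p))
    to_mark = [p for p, s in sorted(group_scores.items())
               if p != original and not is_master(s)]
    masters = [p for p, s in group_scores.items() if is_master(s)]
    return to_mark, len(masters) > 1
```

For scores `{"a": 210, "b": 20, "c": 200}` only `b` is marked, and the group is flagged because both `a` and `c` are masters.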

Minor design changes:
* In the first window, on the "Folders" tab, there is this huge Explorer-like UI, when all I'm really interested in is which folders I have currently selected, and adding/removing folders from there.
Suggestion: a simple 1990-era (pre-Windows 95) Explorer-style tree on the left side for those who prefer it and are used to finding what they want there, and a "Selected folders" list on the right.
To add folders, you drag and drop them there. To remove a folder, you just select it and press Delete/Backspace, or right-click and choose "Remove from selection", which makes it obvious that you are not deleting the folder itself, only removing it from the search criteria.
Since we now have plenty of space on screen, maybe the program could calculate in the background the total number of files in each selected folder, including subfolders, and show the number of images/sound files about to be compared in separate columns after each folder? A number that is already included in a number above could be shown in italics or parentheses, so D:\Pic shows 1000 images, and D:\Pic\some shows 100 in italics because they are also counted in the folder above.
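A minimal Python sketch of the per-folder count column, assuming images are recognized by extension (the extension set is an assumption, and sound files are left out). `count_images` and `folder_counts` are hypothetical names, and the background threading is not shown. Each row carries a `nested` flag meaning "this folder lies inside another selected folder, so its count is already included there", which is what the italics or parentheses would indicate.

```python
import os
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}  # assumed set

def count_images(folder):
    """Count image files in folder and all of its subfolders."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(folder):
        total += sum(os.path.splitext(n)[1].lower() in IMAGE_EXTENSIONS
                     for n in filenames)
    return total

def folder_counts(selected):
    """Return (folder, image count, nested) rows for the selected folders."""
    resolved = [Path(f).resolve() for f in selected]
    rows = []
    for folder, path in zip(selected, resolved):
        # nested: another selected folder is an ancestor of this one
        nested = any(other in path.parents for other in resolved)
        rows.append((folder, count_images(path), nested))
    return rows
```

Checking ancestry with `Path.parents` instead of string prefixes keeps `D:\Pic2` from being treated as part of `D:\Pic`.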


  • Administrator
  • Hero Member
  • Posts: 624
Re: Duplicate files presentation revamp
« Reply #1 on: August 15, 2013, 20:16:15 »
Thanks for your suggestions.

1. About grouping: Similarity already groups files by similarity, so you don't need to expand each file in a group to see its pairs; groups are distinguished by background color and by the number in the leftmost column. The next version will have a "plain group" mode without similarity-based grouping and expanding.

2. The auto-mark feature already has many of your criteria.

3. About the first column: we will think about your suggestions and add a drag'n'drop feature.