Similarity Forum

General Category => General => Topic started by: MeltheMighty on January 07, 2016, 21:04:16

Title: Script for deleting duplicates
Post by: MeltheMighty on January 07, 2016, 21:04:16

Just analysed my huge and extremely messy music collection and came up with 18,000 duplicates! Losing the will to live already! I have noticed that if you have several copies of the same album in your collection (as I do), the media servers I use all seem to lump them together. So, for example, I have 5 copies of Elephant by The White Stripes (don't ask how, I don't know). So when I search MediaMonkey for "Elephant" it lists the album but with 5 of each song included. This means that I cannot play the album in its entirety without hearing 5 of each track (unless I select one of each and make a playlist).

But some of the tracks on Elephant also appear on other albums, so I probably have 8 copies of some tracks. Letting Similarity delete all but one track (randomly or by quality) could leave gaps in some of the albums if it chooses the wrong tracks. So I have been working through the results of the analysis, looking at which tracks are duplicated AND from the same album, and keeping ONLY ONE, the best-quality copy. A simple but tedious process.

It strikes me that this could easily be performed by a (probably) simple script. Problem is, the last time I wrote a script was 35 years ago in BASIC, if anyone remembers it. I have never used JavaScript and don't have the first idea how I would write the script to work with the Similarity API.

Essentially I think the script would simply be:

IF the "Artist", "Title" and "Album" fields are the same on a given group of duplicate results THEN keep the one with the best rating AND delete the rest.
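
For anyone reading along, the rule above could be sketched in plain JavaScript roughly like this. It only illustrates the logic and is not the Similarity API: the file objects, their field names (path, artist, title, album, rating) and the pickDeletions function are all made up for the example.

```javascript
// Hypothetical sketch: none of these names come from the Similarity API.
// Each file is a plain object with tag fields and a numeric rating.
function pickDeletions(files) {
    var bestByKey = Object.create(null); // key -> best-rated file seen so far
    var toDelete = [];
    for (var i = 0; i < files.length; ++i) {
        var f = files[i];
        // Files count as the same track only if artist, title AND album all match
        var key = [f.artist, f.title, f.album].join("\u0000").toLowerCase();
        var best = bestByKey[key];
        if (best === undefined) {
            bestByKey[key] = f;          // first copy seen: keep it for now
        } else if (f.rating > best.rating) {
            toDelete.push(best);         // new copy is better: drop the old best
            bestByKey[key] = f;
        } else {
            toDelete.push(f);            // existing copy is at least as good
        }
    }
    return toDelete;
}

// Made-up sample data
var files = [
    { path: "a/01.mp3", artist: "The White Stripes", title: "Seven Nation Army", album: "Elephant", rating: 0.7 },
    { path: "b/01.mp3", artist: "The White Stripes", title: "Seven Nation Army", album: "Elephant", rating: 0.9 },
    { path: "c/05.mp3", artist: "The White Stripes", title: "Seven Nation Army", album: "Some Compilation", rating: 0.5 }
];
var doomed = pickDeletions(files);
```

Here only the lower-rated Elephant copy ends up in the delete list. The copy on the other album is left alone, even though it is the same song, so no album gets a gap.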

To save me doing an Open University course in computing as I approach my dotage, DOES ANYONE KNOW HOW TO DO THIS?

My wife already thinks I am a specky geek for embarking on my music rationalization project (which has taken up loads of time) and will probably divorce me if I work through 18,000 duplicates in this way, as it will probably take weeks.

Take my advice: don't buy multi-room speakers! They work great, but ya music collection needs to be in order!

I would be very grateful for any help or advice.
Title: Re: Script for deleting duplicates
Post by: Admin on February 29, 2016, 14:58:26
Sorry for the delay. You don't need a script; just use automark with the tags algorithm > 95%. The tags algorithm compares only album, artist and title.

If you need some specific tag restriction, here is a sample script that marks only files with the same album and artist, and in each pair marks the file with the worse analysis.rating.
Code:
/*
Author: Similarity Team
Version: 1.0
Mark files after scan only if tag restrictions are satisfied. Priority is by rating (you can easily change it).
Warning! Using analysis.rating forces the file to be analysed; if you haven't analysed the files before, this takes a long time (because this script analyses files on only 1 CPU core).
We suggest running the analysis before launching this script ("Analyse all" in the context menu).
*/

// album, artist, title - comment/uncomment a line to disable/enable that restriction
// threshold - threshold for the string comparison algorithm, scores in [0...1], 0 - completely different, 1 - identical
// minlength - minimum length of the tag text for comparison; files that don't meet the criterion are skipped
var myProperties = {
    album: { threshold: 0.9, minlength: 2 },
    artist: { threshold: 0.9, minlength: 2 },
    // title: { threshold: 0.9, minlength: 2 },
};

// unmark all files;

function checkRestrictions(item1, item2) {
    // enumerate all selected tag fields
    for (var property in myProperties) {
        // check each field
        var minlength = myProperties[property].minlength;
        var threshold = myProperties[property].threshold;
        var text1 =[property];
        var text2 =[property];
        // skip short tags
        if (minlength > 0 && (text1.length < minlength || text2.length < minlength)) return false;
        // check text string(tag values) similarity
        if (threshold > 0.0 && text.calculate(text1, text2) < threshold) return false;
    }
    return true;
}

// simple mark by lower rating: in each pair, mark the file with the lower analysis.rating
var dups =;
for (var idx = 0; idx < dups.length; ++idx) {
    // skip counter-pair (1-2 and 2-1), process pair only once
    if (dups[idx].item1.path > dups[idx].item2.path) continue;
    // check for our special tag restrictions
    if (!checkRestrictions(dups[idx].item1, dups[idx].item2)) continue;
    // ok now we select by our priority
    if (dups[idx].item1.analysis.rating > dups[idx].item2.analysis.rating) dups[idx].item2.marked = true;
    else dups[idx].item1.marked = true;
}
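
To get a feel for what threshold and minlength do, here is a rough standalone sketch that can be run outside Similarity. It makes two assumptions. First, similarityScore is only a stand-in for text.calculate (a normalized Levenshtein distance), since the algorithm Similarity actually uses is not described in this thread. Second, tagsMatch works on plain made-up tag objects instead of the real result items.

```javascript
// Stand-in for Similarity's text.calculate (assumption: the real algorithm may differ).
// Returns a score in [0, 1]: 1 for identical strings, 0 for completely different ones.
function similarityScore(a, b) {
    if (a.length === 0 && b.length === 0) return 1;
    // Levenshtein edit distance, computed one row at a time
    var prev = [];
    for (var j = 0; j <= b.length; ++j) prev[j] = j;
    for (var i = 1; i <= a.length; ++i) {
        var curr = [i];
        for (var k = 1; k <= b.length; ++k) {
            var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
            curr[k] = Math.min(prev[k] + 1, curr[k - 1] + 1, prev[k - 1] + cost);
        }
        prev = curr;
    }
    // Normalize by the longer string so the result stays within [0, 1]
    return 1 - prev[b.length] / Math.max(a.length, b.length);
}

// Same shape as checkRestrictions, but on plain (hypothetical) tag objects.
function tagsMatch(tags1, tags2, properties) {
    for (var property in properties) {
        var rule = properties[property];
        var t1 = tags1[property] || "";
        var t2 = tags2[property] || "";
        // skip short or missing tags
        if (t1.length < rule.minlength || t2.length < rule.minlength) return false;
        // reject pairs whose tag text is not similar enough
        if (similarityScore(t1, t2) < rule.threshold) return false;
    }
    return true;
}

var rules = {
    album: { threshold: 0.9, minlength: 2 },
    artist: { threshold: 0.9, minlength: 2 }
};
var exact = tagsMatch(
    { album: "Elephant", artist: "The White Stripes" },
    { album: "Elephant", artist: "The White Stripes" }, rules);
var missingThe = tagsMatch(
    { album: "Elephant", artist: "The White Stripes" },
    { album: "Elephant", artist: "White Stripes" }, rules);
```

With this stand-in, the exact pair passes, but "White Stripes" vs "The White Stripes" scores about 0.76 and is rejected at 0.9. If your tags are inconsistent like that, a lower threshold may be worth trying, though the real text.calculate may score such pairs differently.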
Title: Script for deleting duplicates
Post by: Therusef on March 28, 2016, 09:28:38
nothing at this end. If you all want to come to a consensus on an Excel or XML format I will add the script to upload.