Having been so meticulous about taking backups, I’ve perhaps not been as careful about where I stored them, so I now have loads of duplicate files in various places. I’ve tried various tools (fdupes, czkawka, etc.), but none seems to do what I want… I need a tool that I can tell which folder (and subfolders) is the source of truth, and have it look for anything else, anywhere else, that’s a duplicate, and give me the option to move or delete it. Seems simple enough, but I have found nothing that allows me to do that… Does anyone know of anything?

  • speculatrix@alien.topB
    10 months ago

    Write a simple script which iterates over the files and generates a hash list, with the hash in the first column.

    find . -type f -exec md5sum {} ; >> /tmp/foo

    Repeat for the backup files.

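    Then make a third file by concatenating the two, sort that file, and run “uniq -d” on it, comparing only the hash column (with GNU uniq, “uniq -w 32 -d” for MD5, since the file paths on each line differ). The output will tell you the duplicated files.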

    You can take the output of uniq and de-duplicate.
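The hash-list approach above can be adapted to the "source of truth" requirement. What follows is only a minimal sketch, not a tested tool: the function name `list_duplicates` and its two arguments are made up for illustration, and it assumes GNU `md5sum`, two directory trees that do not overlap, and file names without newlines or backslashes. It only lists candidates and changes nothing on disk.

```shell
# Sketch: print every file under OTHER whose content hash also
# appears somewhere under TRUTH (the source-of-truth folder).
# list_duplicates is a hypothetical helper name, not an existing tool.
list_duplicates() {
    local truth other truth_hashes
    truth=$1
    other=$2
    truth_hashes=$(mktemp)

    # Hash every file in the source of truth; keep only the hash column.
    find "$truth" -type f -exec md5sum {} + | awk '{ print $1 }' | sort -u > "$truth_hashes"

    # Hash the other tree. md5sum prints a 32-character hash, two
    # separator characters, then the path, so the path starts at column 35.
    find "$other" -type f -exec md5sum {} + |
        awk 'NR == FNR { seen[$1]; next } ($1 in seen) { print substr($0, 35) }' "$truth_hashes" -

    rm -f "$truth_hashes"
}
```

Called as `list_duplicates multimedia/photos /path/to/old-backups > /tmp/dupes.txt`, it would leave a plain list of paths to review by hand before anything is moved or deleted. Files that exist only outside the source of truth are never listed.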

    • parkercp@alien.topOPB
      10 months ago

      Thanks @speculatrix - I wish I had your confidence in scripting, hence I’m hoping to find something that does all that clever stuff for me… The key thing for me is to be able to say something like: multimedia/photos/ is the source of truth, and anything found elsewhere is a duplicate…

      • Digital-Chupacabra@alien.topB
        10 months ago

        I wish I had your confidence in scripting

        You know how you get it? By fucking around and finding out! I’d say give it a go!

        Do a dry run of the de-dup to make sure you don’t delete anything you care about.
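One way a dry run could look, as a rough sketch: read a list of duplicate paths from standard input and move them into a quarantine folder instead of deleting them, printing the plan by default. The function name `quarantine_duplicates` and the `DRY_RUN` variable are invented for this example, and it assumes one path per line with no newlines in file names and no `..` components.

```shell
# Sketch: move each path read from stdin into a quarantine folder,
# keeping its original directory layout underneath that folder.
# DRY_RUN defaults to 1, so nothing is touched unless DRY_RUN=0 is set.
quarantine_duplicates() {
    local dest path target
    dest=$1
    while IFS= read -r path; do
        # Strip a leading slash so the path nests under the quarantine folder.
        target="$dest/${path#/}"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf 'would move: %s -> %s\n' "$path" "$target"
        else
            mkdir -p "$(dirname "$target")" && mv -- "$path" "$target"
        fi
    done
}
```

Running `quarantine_duplicates /mnt/quarantine < /tmp/dupes.txt` only prints what would happen; `DRY_RUN=0 quarantine_duplicates /mnt/quarantine < /tmp/dupes.txt` does the move. Moving rather than deleting means a mistake can still be undone by moving the files back.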

        • parkercp@alien.topOPB
          10 months ago

          Give me a few years and maybe :P - but for now I’d rather not risk important data with my own limited skills, especially if there is a product out there that’s tried and tested and hopefully recommended by someone in this sub… I didn’t expect my ask to be quite so unique…

    • jerwong@alien.topB
      10 months ago

      I think you need a \ in front of the ;

      i.e.: find . -type f -exec md5sum {} \; >> /tmp/foo
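For anyone wondering why the backslash matters: unescaped, the shell treats `;` as its own command separator, so `find` never sees the terminator that `-exec` requires. A small sketch of the two usual spellings follows; the function names are made up for illustration and `md5sum` from GNU coreutils is assumed.

```shell
# Sketch: two ways to terminate -exec when building the hash list.
hash_tree() {
    # \; ends the -exec clause; the backslash stops the shell from
    # consuming the semicolon. md5sum is started once per file.
    find "$1" -type f -exec md5sum {} \;
}

hash_tree_batched() {
    # + also ends the clause and needs no escaping; find passes many
    # files to each md5sum run, which is usually faster on large trees.
    find "$1" -type f -exec md5sum {} +
}
```

Both produce the same hash and path pairs, so either could feed the list in /tmp/foo; only the order of lines may differ, which sorting takes care of.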