Family Photo Management

Post still in work

Our family pictures were scattered over several computers, each with its own photo-management application. In an effort to get a good backup in place, I moved all my pictures to one computer, where I accidentally deleted them. (Long story.) I was able to recover them all, but I ended up with numerous duplicates and huge amounts of other image junk. To make matters much more complicated, I accidentally moved all the files into one directory, with super-long file names that represented their original paths. (Another long story.) Yes, I should have built backups. Lesson learned. In any case, while scripting super-powers can sometimes get you into trouble, the only way to get out of it is with some good scripts.

We have decided to use Lightroom on Windows as our photo-management application. Our Windows machines have a huge amount of storage that we can build out quickly with cheap hard drives. However, as you can imagine, one problem I have to solve first is eliminating a huge number of duplicate images at different sizes and getting rid of junk images.

Removing duplicates

I wrote code in Matlab to find duplicates and bin very dark images. It scans the directory for all images, reduces their size, and computes an image histogram for each one, which it then wraps into 16 sections that are summed and normalized. I then compute a 2-D correlation coefficient for each possible pair of images.
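The histogram step can be sketched in a few lines. This is a plain-Python illustration, not the Matlab code described above: the function names, the reading of "wraps into 16 sections" as summing contiguous runs of 16 histogram bins, and the darkness threshold are all assumptions, and the image loading and downscaling steps are left out (the input is simply a flat list of 8-bit grayscale values).

```python
def histogram_signature(pixels, levels=256, sections=16):
    """Return a coarse, normalized intensity histogram for one grayscale image.

    pixels: iterable of integer intensities in the range [0, levels).
    """
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1

    # Assumption: each "section" is the sum of a contiguous run of bins.
    width = levels // sections
    binned = [sum(counts[i * width:(i + 1) * width]) for i in range(sections)]

    # Normalize so images of different sizes are comparable.
    total = sum(binned)
    if total == 0:
        return [0.0] * sections
    return [b / total for b in binned]


def is_very_dark(signature, dark_sections=2, threshold=0.95):
    """Hypothetical rule: flag an image whose pixels fall almost entirely
    in the darkest sections of its signature."""
    return sum(signature[:dark_sections]) >= threshold
```

Because the signature is normalized, a full-size photo and a thumbnail of the same scene should produce nearly the same 16 numbers, which is what makes the later comparison insensitive to image size.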

r = \frac{
\sum_m \sum_n \left(A_{mn} - \hat A \right) \left(B_{mn} - \hat B \right)
}{
\sqrt{
\left( \sum_m \sum_n \left(A_{mn} - \hat A \right)^2 \right)
\left( \sum_m \sum_n \left(B_{mn} - \hat B \right)^2 \right)
}
}
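The coefficient can be computed directly from its definition. The sketch below is plain Python rather than Matlab, and it is named `corr2` only because Matlab's function of that name computes the same quantity. The inputs are assumed to be equally sized matrices given as lists of rows, with \hat A and \hat B taken as the means of all entries.

```python
import math


def corr2(a, b):
    """2-D correlation coefficient of two equally sized matrices (lists of rows)."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("inputs must be non-empty and the same size")

    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)

    # Numerator: sum of products of the deviations from each mean.
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    # Denominator: square root of the product of the summed squared deviations.
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in flat_a)
        * sum((y - mean_b) ** 2 for y in flat_b)
    )
    # A constant input has zero variance, so the coefficient is undefined.
    return num / den if den else float("nan")
```

A value near 1 means the two inputs rise and fall together, which is the signal used here to flag a likely duplicate.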

The results are comparisons such as this one.


And a histogram of the correlation coefficients shows a high degree of correlation in general.
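Since most pairs correlate highly, the cutoff for calling two images duplicates has to sit well up in the tail of that distribution. The following Python sketch shows one way to bin the coefficients and to list only the pairs above a cutoff. The function names, the `score` callback, and the 0.99 threshold are hypothetical, not values from my Matlab run.

```python
from itertools import combinations


def coefficient_histogram(values, bins=10):
    """Count coefficients in equal-width bins covering [-1, 1]."""
    counts = [0] * bins
    for r in values:
        if r != r:  # skip NaN (undefined coefficient)
            continue
        idx = min(int((r + 1) / 2 * bins), bins - 1)
        counts[idx] += 1
    return counts


def candidate_pairs(names, score, threshold=0.99):
    """Yield (name_a, name_b, r) for every unordered pair scoring at or above threshold.

    score: any function taking two names and returning their coefficient.
    """
    for a, b in combinations(names, 2):
        r = score(a, b)
        if r >= threshold:
            yield a, b, r
```

With n images there are n(n-1)/2 pairs to score, so the downscaling and the 16-section signature matter for run time as much as for robustness.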


My goal is to use this to keep the biggest photo and put the others in a directory of the same name. More to come, after I get some help.
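As a rough sketch of that last step (in Python, and not yet what I actually run), the function below takes one group of files already judged to be duplicates, keeps the largest, and moves the rest into a folder named after the keeper. It assumes "biggest" means file size in bytes, that every file has an extension, and that all files in a group have distinct names; `keep_largest` is a made-up name.

```python
import shutil
from pathlib import Path


def keep_largest(duplicates):
    """Keep the largest file in a duplicate group and move the others
    into a sibling directory named after the kept file (without extension).

    Returns the path of the file that was kept.
    """
    paths = sorted(
        (Path(p) for p in duplicates),
        key=lambda p: p.stat().st_size,
        reverse=True,
    )
    keeper, rest = paths[0], paths[1:]

    # e.g. IMG_0001.jpg -> directory IMG_0001 next to it
    holding = keeper.with_name(keeper.stem)
    holding.mkdir(exist_ok=True)

    for p in rest:
        # Moved, not deleted, so a false match can still be rescued by hand.
        shutil.move(str(p), str(holding / p.name))
    return keeper
```

Moving instead of deleting is deliberate: photos that merely have similar histograms can score as duplicates, so the rejected files stay around for a manual look.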

One thought on “Family Photo Management”

  1. But that photo in the example is not technically a duplicate. You’re looking in two different directions. Are you sure you want to delete one of them? Anyway, just commenting ’cause I have a similar problem (pictures across several computers/phones/tablets). What I do is simply copy all photos from each computer to a NAS, and then back up the NAS to a mini hard drive (so in the end, I have two full picture repositories, one on the NAS and one on the portable drive), then I free up the space on the computers/iPhones/tablets by deleting the picture files. All of this could be automated via script, except I haven’t done it (lazy, busy, whateverhaveyou).

    But for sure, I don’t muck around with the naming. If you just keep the usual naming convention of your camera (a number which represents simply the date and time), then when you copy to other places you can direct your copying application to not copy duplicates (it does this automatically in Windows, for instance, when you do a copy/move from one directory to the next) and no two photos taken at different instants ever have the same filename. (shrug)

    What I would like to see is an automated photo content classifier. Now THAT would be a cool Matlab project. Basically, run a correlator over your face and the face of your kids or whatever, and Matlab tells you if there’s a “high content” of you or someone else, then it promptly labels the photo as a “Tim Photo” or “Wife Photo” or “Kid photo”. Then you can easily search your photos by subject and date. (shrug).

    Hi from Switzerland!

    -Elisa ’97.
