Find corrupted images

So I deleted all my pictures and then restored them, which left me with a bunch of corrupted images: thousands of them. To find them, I wrote the following script in MATLAB using the Image Processing Toolbox:


Using MATLAB, I first tried to work out what makes an image corrupted. When opening one with imread, I noticed warnings like these:

g = imread(s);
Warning: JPEG library error (8 bit), "Corrupt JPEG data: premature end of data segment".
Warning: JPEG library error (8 bit), "Invalid JPEG file structure: two SOI markers".
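Both warnings come from the JPEG decoder choking on the file structure. A JPEG stream begins with an SOI (start of image) marker, the bytes FF D8, and normally ends with an EOI (end of image) marker, FF D9, so a truncated file will usually be missing the final EOI. The Ruby sketch below is a cheap pre-check built on that assumption. `jpeg_markers_ok?` and `jpeg_file_ok?` are hypothetical helpers, and because some valid files carry extra bytes after EOI, a failure here is a hint rather than proof:

```ruby
# Hypothetical pre-check: assumes a well-formed JPEG starts with the SOI
# marker (FF D8) and ends with the EOI marker (FF D9).
SOI = "\xFF\xD8".b
EOI = "\xFF\xD9".b

# Returns true if the raw bytes start with SOI and end with EOI.
def jpeg_markers_ok?(bytes)
  data = bytes.b                        # compare as raw binary, not UTF-8
  return false if data.bytesize < 4     # too short to hold both markers
  data.start_with?(SOI) && data.end_with?(EOI)
end

# Reads a whole file in binary mode and applies the marker check.
def jpeg_file_ok?(path)
  jpeg_markers_ok?(File.binread(path))
end
```

Running `jpeg_file_ok?` over the restored files could flag the obviously truncated ones before MATLAB ever opens them.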

Also, a histogram of such a file looked like this:

[Figure: histogram of a corrupted image]

So the only challenge is to find the spike: an unusually high percentage of pixels with the value 128, the middle of the 0-255 range (the flat gray the decoder produces where image data is missing). Simple enough.
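The test itself is just a ratio: count the pixels equal to 128 and divide by the total number of pixels. Here is a minimal Ruby sketch of that ratio, assuming the image has already been decoded to grayscale and is given as an array of rows of 0-255 integers (plain Ruby has no JPEG decoder, so that step is left out). `gray_fraction` and `too_much_gray?` are hypothetical names; the 0.07 cut-off is the one the MATLAB script below uses:

```ruby
MID_GRAY = 128
GRAY_THRESHOLD = 0.07  # same cut-off as the MATLAB script

# Fraction of pixels equal to MID_GRAY in a grayscale image given as an
# array of rows of 0-255 integers.
def gray_fraction(pixels)
  total = pixels.sum(&:size)
  return 0.0 if total.zero?             # empty image: nothing is gray
  gray = pixels.sum { |row| row.count(MID_GRAY) }
  gray.fdiv(total)
end

# Strict comparison, mirroring `gray_percent > 0.07` in MATLAB.
def too_much_gray?(pixels, threshold = GRAY_THRESHOLD)
  gray_fraction(pixels) > threshold
end
```

Because the comparison is strict, an image with exactly 7% mid-gray pixels is not flagged.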

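% Folder holding the restored (possibly corrupted) images
cd('/media/95543211-fd8f-4fc9-9b24-3a787113e4c2/+JPEG');

% Output folders: suspiciously gray images and images that look fine
if ~exist('too_much_gray', 'dir'), mkdir('too_much_gray'); end
if ~exist('noerr', 'dir'), mkdir('noerr'); end

% Regular files only (drops '.', '..' and the output folders)
jpegs = dir('.');
jpegs = jpegs(~[jpegs.isdir]);

% Process at most 100 files per run
num_files = min(100, numel(jpegs));

% Fraction of mid-gray (128) pixels for each processed file
G = zeros(1, num_files);

for i = 1:num_files
    name = jpegs(i).name;
    disp(['working: ' name]);
    try
        I = imread(name);
        if ndims(I) == 3
            I = rgb2gray(I);   % colour image -> grayscale
        end
        gray_percent = nnz(I == 128) / numel(I);
        G(i) = gray_percent;
        if gray_percent > 0.07
            disp(['moving . . . ' name]);
            movefile(name, fullfile('too_much_gray', name));
        else
            disp(['good: ' name]);
            movefile(name, fullfile('noerr', name));
        end
    catch
        % imread or rgb2gray failed: leave the file where it is
        disp(['bad: ' name]);
    end
end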

I also used a bash script to list the images that might be corrupt (the ones ImageMagick's identify cannot read), and a Ruby script to move them (yeah, really inefficient, I know). That way, files that might crash MATLAB are at least removed:

#!/bin/bash

# Print the name of every file that ImageMagick's identify cannot read
for f in *
do
  [ -f "$f" ] || continue   # skip directories
  if ! identify "$f" &> /dev/null; then
     echo "$f"
  fi
done

and the Ruby script that moves them (yes, this is silly):

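#!/usr/bin/env ruby
require 'fileutils'

# File names printed by the identify loop above
filez = <<EOF
__003999
__026328
__029322
__032335
__035823
__035842
__036090
__038688
__039670
__048554
__048561
__048634
19991215_22_43_43_033877
19991215_22_43_43_034820
19991215_22_43_43_049844
19991215_22_43_56_038011
19991215_22_44_16_010202
20070729_14_42_57_048540
EOF

names = filez.split  # one name per whitespace-separated token

puts names.size

# Move each listed file into matlab_bad/, skipping any that are already gone
FileUtils.mkdir_p('matlab_bad')
names.each do |f|
  next unless File.exist?(f)
  puts "mv #{f} matlab_bad/#{f}"
  FileUtils.mv(f, File.join('matlab_bad', f))
end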
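The silly part is pasting the list into a heredoc by hand. A less silly variant, sketched here under the assumption that the bash loop's output is piped straight in, reads one file name per line and moves each existing file. `move_listed_files` is a hypothetical helper, not part of the original scripts:

```ruby
require 'fileutils'

# Hypothetical helper: read one file name per line from `io` (for example
# the output of the identify loop) and move each existing regular file into
# dest_dir. Returns the names that were actually moved.
def move_listed_files(io, dest_dir = 'matlab_bad')
  FileUtils.mkdir_p(dest_dir)
  moved = []
  io.each_line do |line|
    name = line.chomp
    next if name.empty? || !File.file?(name)  # skip blanks and missing files
    FileUtils.mv(name, File.join(dest_dir, File.basename(name)))
    moved << name
  end
  moved
end
```

Reading whole lines rather than splitting on whitespace keeps file names containing spaces intact. With the helper saved as, say, `move_bad.rb` (a hypothetical name), it could be wired up as `./list_bad.sh | ruby -e 'require "./move_bad"; move_listed_files($stdin)'`.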
