Welcome to the home of
durendal TWiki.Deskew. This is a web-based collaboration area for the development of software to assist in image manipulation of scanned books.
Literature
Segmentation
One of the first things we do with a raw set of scans is split them
into separate pages, and possibly extract illustrations. Here are some
papers describing some algorithms.
Skew
Our initial literature search led us to the Hough transform for detecting the
skew angle. While this worked, the Hough transform is computationally
intensive, and the literature is full of mechanisms to reduce the number of
points that must be fed to it. W. Postl's method for detecting skew angle is
much lighter.
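To make the difference in cost concrete, here is a minimal sketch of a projection-profile search of the kind usually attributed to Postl: for each candidate angle, shear the foreground pixels, count pixels per row, and score the angle by the sum of squared differences between adjacent row counts. This is not Leptonica's code. The function names, the angle range, the step size and the sign convention (positive angle means text lines drift downward as x increases) are all assumptions made for illustration.

```python
import math


def profile_score(pixels, angle_deg):
    """Score one candidate angle: sum of squared differences between
    adjacent row counts after shearing the foreground pixels."""
    slope = math.tan(math.radians(angle_deg))
    counts = {}
    for x, y in pixels:
        row = round(y - x * slope)  # row the pixel would land on if deskewed
        counts[row] = counts.get(row, 0) + 1
    if not counts:
        return 0
    score = 0
    prev = 0
    # Walk one row past the last occupied row so the final edge is counted.
    for row in range(min(counts), max(counts) + 2):
        cur = counts.get(row, 0)
        score += (cur - prev) ** 2
        prev = cur
    return score


def estimate_skew(pixels, max_angle=5.0, step=0.25):
    """Brute-force search over candidate angles (degrees)."""
    n = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n + 1)]
    return max(candidates, key=lambda a: profile_score(pixels, a))


def synthetic_lines(angle_deg, width=200, height=120, spacing=20):
    """Hypothetical test input: two-pixel-thick 'text lines' at a known skew."""
    slope = math.tan(math.radians(angle_deg))
    pixels = []
    for y0 in range(spacing, height, spacing):
        for x in range(width):
            y = round(y0 + x * slope)
            pixels.append((x, y))
            pixels.append((x, y + 1))
    return pixels


if __name__ == "__main__":
    print(estimate_skew(synthetic_lines(3.0)))
```

Each candidate angle costs one pass over the foreground pixels and a one-dimensional histogram, with no two-dimensional accumulator to fill and search. A real tool would search coarse-to-fine rather than stepping through every angle.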
We're starting by producing a program to deskew pages that are rotated during
the scanning process.
After a careful search of the literature and the web we chose the Leptonica
library as the best starting point. This is the paper describing the technique
used by the Leptonica library for skew detection. It implements a method from
W. Postl's expired patent.
The first step is to develop a standalone tool to do deskew. Once we have a
reliable standalone tool, we want to add it to the netpbm library of tools,
ImageMagick and The Gimp.
This is an initial sketch, based on the deskewtest.c provided with the
Leptonica library. It is usable if not quite robust. The underlying library
is very robust, so it should be safe to use.
deskew.c: current deskew source (bilevel/grayscale/rgb)
We want a nice smooth rotate. The cubic interpolation rotate provided by gimp
is very nice. Here is a script-fu-based program which uses Leptonica to get
the skew angle and then gimp to do the rotate. It is very slow, but provides
a nice result. We'd like to give similar functionality without the overhead
of gimp, but meanwhile, here's a usable tool...
You can see some experiments which compare the various methods. The pure
Leptonica solutions are decidedly the fastest, and produce images which are
virtually indistinguishable from the very nice gimp rotate. The ImageMagick
(convert) deskewer has the nice property that it increases the image size to
accommodate the whole rotated image. This is useful for very highly skewed
pages which might otherwise suffer badly from the implicit cropping.
Planned Enhancements:
- Control confidence threshold from command line.
- Automatic black level detection for gray scale and color images. (IP Jul 2005 --piggy)
- Manual black level setting.
- Multiple input/output formats. (gif, png, tif, pnm, jpg etc.)
We get most of these for free with Leptonica. --piggy
- The best rotate algorithm we can find.
The AM code in Leptonica is pretty good, but GIMP's quadratic
interpolation gives better results for some of the pages I've tried.
- Manually specify a margin (specify active area?) to ignore for the detection.
- Select BAG/Hough transform for skew detection?
- When doing the rotate, the pix we are rotating into must be expanded on the borders so that we don't clip the corners.
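For the last item, the size of the expanded destination follows from the bounding box of the rotated rectangle. A minimal sketch, where the function name and the small rounding guard are our own choices and nothing here is taken from Leptonica:

```python
import math


def expanded_size(width, height, angle_deg):
    """Smallest axis-aligned canvas that holds a width x height image
    rotated by angle_deg about its centre."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    eps = 1e-9  # guard so float noise does not add a spurious pixel
    new_w = math.ceil(width * c + height * s - eps)
    new_h = math.ceil(width * s + height * c - eps)
    return new_w, new_h


if __name__ == "__main__":
    # Hypothetical page: 200ppi letter-size scan, rotated by the -4.8
    # degrees hand-measured for one of the problem pages.
    print(expanded_size(1700, 2200, -4.8))
```

For the small angles typical of scanner skew the growth is modest, but for a tall page it is dominated by the height times the sine of the angle, which is why the corners are the first thing to be clipped.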
Problematic pages
The first aberration we encountered is a number of pages that are trapezoidal
rather than rectangular. There is no single skew angle for such a page. We
theorize that these can be detected by running skew detection separately on
the top third and the bottom third of the page and checking whether the two
angles differ. We have not yet written code to try this.
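The top-third versus bottom-third idea could be sketched as follows. This is untested against real scans. `estimate_skew` stands for whatever skew detector is available (for example a wrapper around Leptonica) and is passed in as a parameter. The names and the 0.5 degree tolerance are assumptions.

```python
def split_thirds(pixels, height):
    """Return the foreground pixels in the top and bottom thirds of the page."""
    top = [(x, y) for x, y in pixels if y < height / 3]
    bottom = [(x, y) for x, y in pixels if y >= 2 * height / 3]
    return top, bottom


def looks_trapezoidal(pixels, height, estimate_skew, tolerance_deg=0.5):
    """Flag a page whose top and bottom thirds disagree about the skew angle.

    pixels        -- iterable of (x, y) foreground coordinates
    estimate_skew -- callable taking a pixel list, returning degrees
    tolerance_deg -- assumed threshold; would need tuning on real pages
    """
    top, bottom = split_thirds(pixels, height)
    if not top or not bottom:
        return False  # not enough content to compare
    return abs(estimate_skew(top) - estimate_skew(bottom)) > tolerance_deg


if __name__ == "__main__":
    # Stand-in estimator for demonstration only: pretends the top of the
    # page is skewed by 2 degrees and the bottom is straight.
    fake = lambda pts: 2.0 if pts[0][1] < 100 else 0.0
    page = [(x, 10) for x in range(50)] + [(x, 250) for x in range(50)]
    print(looks_trapezoidal(page, 300, fake))
```

A practical version would also want to check the confidence reported for each third, since a third containing only a running head or a few lines may not give a usable angle.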
A screwy page with 0 confidence. A hand-calculated rotation for this page is
-4.8°. If you apply that, you can see that the bottom half of the page is
fine, but the top portion has a peculiar keystoning-like distortion. Greg
speculates that this is a scanner feed error.
A sample trapezoidal page scanned unbound with an ADF. This page has been
verified as trapezoidal on the original printed page. It seems this is fairly
common in books from this era. As of version 1.26, Leptonica can automatically
detect and eliminate trapezoidal distortion on this page. Here is the
corrected page.
A sample image where deskew shears off the lower left corner of text.
A page with spine distortion. The book was not completely flat, so the left
column of text is noticeably compressed due to parallax. This page also amply
demonstrates variation in the grey background. My current experiments are
aimed at calculating the parallax distortion using the background grey
information.
Before finding Leptonica, we started work on implementing Yu & Jain. This is
a remarkably well-written paper.
- bag.py: An implementation of Block Adjacency Graphs
- hough.py: An implementation of the Hough transform
Paper posted locally by permission of A.K. Jain and courtesy of eDoc.
Code for the DjVu book format
DjVu is a file format for sets of online images such as facsimile editions of
books. There is an Open Source implementation available at SourceForge. This
is more a set of utilities for assembling pieces. I found it necessary to
write my own wrapper script for converting a directory full of pngs into a
DjVu book.
This is far from a robust application, and probably doesn't produce the
most compact book possible, but it is no worse than a tarball of the pngs.
- djvudrv: DjVu driver (Bourne shell) for building a book from a directory of pngs
Weird misc. code pieces
pnmcrop.margin.patch: A patch to add margin capability to pnmcrop. For this to be really useful I need to do a despeckle before determining what the image border is.
pnmcrop.margin.filter.2.patch: margin and filter patch for pnmcrop with bug fixes to the margin code. This is in NetPBM 10.30, but it's broken.
gifcrop: The perl script I use to drive the pnm tools to auto-crop text pages. Now works for grey scale images as well. The threshold value is hard coded though. Anyone know how to auto threshold?
gifpage is a dumb perl script to wrap image files with html. You have to edit the script to change the copyright notice.
Document Dewarping is a web service run by Ulges, Lampert, and Breuel. I have
requested source code with a thought toward incorporating it into Leptonica.
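On the auto-threshold question raised for gifcrop above: one widely used option is Otsu's method, which picks the grey level that maximizes the between-class variance of the histogram. The sketch below is a plain illustration of that method, not code from gifcrop or netpbm. It assumes 8-bit grey values, and the function names are ours.

```python
def grey_histogram(values, levels=256):
    """Count how many pixels have each grey level (0 .. levels-1)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    return hist


def otsu_threshold(histogram):
    """Return t such that pixels <= t form the dark class and pixels > t
    the light class, maximizing between-class variance."""
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of grey levels in the dark class so far
    for t, count in enumerate(histogram):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


if __name__ == "__main__":
    # Hypothetical page: dark ink around level 30, paper around level 200.
    page = [30] * 100 + [200] * 900
    print(otsu_threshold(grey_histogram(page)))
```

Otsu's method assumes a roughly bimodal histogram, so it tends to work on clean text pages and to struggle on pages with strong background variation such as the spine-distortion sample.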
Piggy's page workflow
These are just some wrapper scripts for the real tools that I use to
generate images of books. At every step I go through the output
images to decide if I need to tweak the parameters on one of the
converters. If there are only a couple of pages to tweak, I'll just pull
images from the previous stage into gimp and hand-manipulate the image
to match what I need.
originals -> pages -> deskewed -> reduced
                              \
                               -> mono565
- Use kooka, xsane or sane to capture raw images. I use 200ppi 8bit grayscale for most pages, and then 1200ppi 8bit grayscale for b&w illustrations. Color illustrations each get their own evaluation.
- mkpages: Split scans into pages.
- repng: Eliminate an alpha problem from ImageMagick using some netpbm filters.
- mkdeskewed: Deskew a directory full of pngs, uses im_deskew and detect_deskew.
- repng: Remove an unnecessary alpha channel using some netpbm filters.
- I should use gifcrop, but I don't have it working on my laptop...
- threshdown: Reduce bit depth and resolution of a directory of pngs. 200ppiX8bit -> 150ppiX4bit for DjVu books
- I'll apply gifpage to the reduced images to create a book for reading online.
- repng: Actually switch to 4 bit representation using some netpbm filters.
- mkmono565: Convert a 200ppi 8bit gray image to a 565ppi mono image for OCR. If I need to tackle any pages by hand with the gimp, I use difference-of-Gaussians. For extreme cases I use Linear Adaptive Thresholding with a vertical range of about 3 lines and a horizontal range of about 1.5 characters.
- recrush: Compress a directory of pngs to their smallest representation with pngcrush.
- Descreen any halftoned images with pei halftone. This package has a restrictive non-commercial use only license.
Third party code
These are packages we've found useful as stand-alone tools, or just found.
A median filter for netpbm.
http://www.personal.psu.edu/staff/b/u/burns/source_code.html This is in NetPBM 10.30.
Geoff Horton's clean is a program for converting images to black and white, dealing with some common problems encountered when scanning pictures.
pngcrush is an excellent tool for making the smallest png's you can get.
PEI Halftone Descreening
- tesseractwrap: Wrapper for tesseract. It converts files to tiff and runs multiple files through tesseract.