Deskew
Deskew.WebHome r1.1 - 16 Jan 2007 - 10:19 - TWikiGuest

Welcome to the home of durendal TWiki.Deskew. This is a web-based collaboration area for the development of software to assist in image manipulation of scanned books.

Literature

Segmentation

One of the first things we do with a raw set of scans is split them into separate pages, and possibly extract illustrations. Here are some papers describing some algorithms.

Skew

Our initial literature search led us to the Hough transform for detecting the skew angle. While this worked, the Hough transform is very computationally intensive, and the literature is full of mechanisms to reduce the number of points that must be fed to it. W. Postl's method for detecting the skew angle is much lighter.
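To give a feel for why the Hough transform is expensive, here is a minimal pure-Python sketch of Hough-style skew voting. It is not our hough.py; the function name hough_skew, its parameters, and the one-pixel rho bins are assumptions made for illustration. Every point votes once per candidate angle, so the cost grows as points times angles, which is why reducing the point count matters so much.

```python
import math
from collections import Counter

def hough_skew(points, max_angle=5.0, step=0.1):
    """Return the angle in degrees, within +/- max_angle, at which the
    largest number of points fall on one straight line.

    points -- non-empty list of (x, y) pixel coordinates
    """
    if not points:
        raise ValueError("need at least one point")
    best_angle, best_votes = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        c, s = math.cos(theta), math.sin(theta)
        # rho is the signed distance from the origin to the line through
        # (x, y) that makes the angle theta with the x axis.
        votes = Counter(round(y * c - x * s) for x, y in points)
        peak = max(votes.values())
        if peak > best_votes:
            best_angle, best_votes = angle, peak
    return best_angle
```

With the defaults this makes 101 passes over the point list, so a page with a million black pixels costs about a hundred million votes.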

Code based on the Leptonica library

We're starting by producing a program to deskew pages that are rotated during the scanning process.

After a careful search of the literature and the web, we chose the Leptonica library as the best starting point. This is the paper describing the technique used by the Leptonica library for skew detection. It implements a method from W. Postl's expired patent.
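As we understand it, the general idea is to sweep over candidate angles, project the black pixels onto rows at each angle, and keep the angle at which adjacent row counts differ most sharply. The sketch below illustrates that idea in pure Python. It is not Leptonica's code, and the names profile_score and sweep_skew, the shear approximation, and the default sweep range are our own assumptions.

```python
import math

def profile_score(points, height, angle_deg):
    """Sum of squared differences between adjacent row counts after
    shearing the black pixels vertically by angle_deg."""
    t = math.tan(math.radians(angle_deg))
    rows = [0] * height
    for x, y in points:
        r = int(round(y - x * t))
        if 0 <= r < height:
            rows[r] += 1
    return sum((a - b) ** 2 for a, b in zip(rows, rows[1:]))

def sweep_skew(points, height, max_angle=5.0, step=0.1):
    """Return the candidate angle (degrees) with the sharpest profile.

    points -- (x, y) coordinates of black pixels
    height -- number of rows in the image
    """
    steps = int(round(2 * max_angle / step))
    angles = [-max_angle + i * step for i in range(steps + 1)]
    return max(angles, key=lambda a: profile_score(points, height, a))
```

When the text lines are level, the row counts jump between full text rows and empty gaps, so the score peaks. Any residual skew smears each text line across several rows and flattens the profile.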

The first step is to develop a stand-alone deskew tool. Once we have a reliable stand-alone tool, we want to add it to the netpbm library of tools, ImageMagick and the GIMP.

This is an initial sketch, based on the deskewtest.c provided with the Leptonica library. It is usable if not quite robust. The underlying library is very robust, so it should be safe to use.

deskew.c: current deskew source (bilevel/grayscale/rgb)

We want a nice smooth rotate. The cubic interpolation rotate provided by gimp is very nice. Here is a script-fu-based program which uses Leptonica to get the skew angle and then gimp to do the rotate. It is very slow, but provides a nice result. We'd like to give similar functionality without the overhead of gimp, but meanwhile, here's a usable tool...

  • detect_skew.c: Simple Leptonica-based skew detector
  • deskew.c: a pure Leptonica skew remover
  • deskew_color.c: a pure Leptonica skew remover for RGBA images
  • im_deskew: a driver script to call detect_skew and "convert -rotate"
  • gimp_deskew: a driver script to call detect_skew and rotate
  • rotate: A gimp-based rotate script

You can see some experiments which compare the various methods. The pure Leptonica solutions are decidedly the fastest, and produce images which are virtually indistinguishable from the very nice gimp rotate. The ImageMagick (convert) deskewer has the nice property that it increases the image size to accommodate the whole rotated image. This is nice for very highly skewed pages which might otherwise suffer badly from the implicit cropping.

Planned Enhancements:

  • Control confidence threshold from command line.
  • Automatic black level detection for gray scale and color images. (IP Jul 2005 --piggy)
  • Manual black level setting.
  • Multiple input/output formats. (gif, png, tif, pnm, jpg etc.) We get most of these for free with Leptonica. --piggy
  • The best rotate algorithm we can find. The area-mapping (AM) code in Leptonica is pretty good, but GIMP's cubic interpolation gives better results for some of the pages I've tried.
  • Manually specify a margin (specify active area?) to ignore for the detection.
  • Select BAG/Hough transform for skew detection?
  • When doing the rotate, the pix we are rotating into must be expanded at the borders so that we don't clip the corners.
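For the last item, the size of the expanded destination follows directly from the rotation angle: a w x h image rotated by an angle a fits in a canvas of w|cos a| + h|sin a| by w|sin a| + h|cos a|. A minimal sketch, assuming rotation about the image centre (the function name rotated_canvas_size is hypothetical and is not part of Leptonica):

```python
import math

def rotated_canvas_size(width, height, angle_deg):
    """Smallest axis-aligned canvas, in whole pixels, that holds a
    width x height image rotated by angle_deg about its centre."""
    theta = math.radians(angle_deg)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    # The tiny epsilon keeps floating-point noise (for example at 90
    # degrees) from adding a spurious extra pixel when rounding up.
    new_w = math.ceil(width * c + height * s - 1e-9)
    new_h = math.ceil(width * s + height * c - 1e-9)
    return new_w, new_h
```

For the small angles typical of scanner skew the extra border is modest, but it grows quickly for badly skewed pages, which is where the implicit cropping hurts most.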

Problematic pages

The first aberration we encountered is a number of pages that are trapezoidal rather than rectangular. There is no single skew angle for such a page. We theorize that these can be detected by running skew detection separately on the top third and the bottom third of the page and checking whether the two angles differ. We have not yet tried this in code.
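The top-third versus bottom-third idea could be sketched roughly as follows. This is untested on real scans. The name looks_trapezoidal, the detect_skew callable, and the 0.3 degree tolerance are all assumptions for illustration, and any skew detector that returns an angle in degrees could be plugged in.

```python
def looks_trapezoidal(points, height, detect_skew, tolerance=0.3):
    """Compare the skew of the top and bottom thirds of a page.

    points      -- (x, y) black-pixel coordinates, y measured from the top
    height      -- page height in pixels
    detect_skew -- hypothetical callable: list of points -> angle in degrees
    tolerance   -- assumed cutoff in degrees; would need tuning on real pages

    Returns (flag, top_angle, bottom_angle). Both thirds are assumed to
    contain enough text for the detector to work.
    """
    third = height / 3.0
    top = [(x, y) for x, y in points if y < third]
    bottom = [(x, y) for x, y in points if y >= 2 * third]
    top_angle = detect_skew(top)
    bottom_angle = detect_skew(bottom)
    flag = abs(top_angle - bottom_angle) > tolerance
    return flag, top_angle, bottom_angle
```

Returning both angles along with the flag makes it easier to see how far apart the two estimates are when choosing a tolerance.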

A screwy page with 0 confidence. A hand-calculated rotation for this page is -4.8°. If you apply that, you can see that the bottom half of the page is fine, but the top portion has a peculiar keystoning-like distortion. Greg speculates that this is a scanner feed error.

A sample trapezoidal page scanned unbound with an ADF. This page has been verified as trapezoidal on the original printed page. It seems this is fairly common in books from this era. As of version 1.26, Leptonica can automatically detect and eliminate trapezoidal distortion on this page. Here is the corrected page.

A sample image where deskew shears off the lower left corner of text.

A page with spine distortion. The book was not completely flat, so the left column of text is noticeably compressed due to parallax. This page also amply demonstrates variation in the grey background. My current experiments are aimed at calculating the parallax distortion using the background grey information.

Code based on Yu & Jain

Before finding Leptonica, we started work on implementing Yu & Jain. This is a remarkably well-written paper.

  • bag.py: An implementation of Block Adjacency Graphs

  • hough.py: An implementation of the Hough transform

  • gpl.txt: A copy of the GPL license

Paper posted locally by permission of A.K. Jain and courtesy of eDoc.

Code for the DjVu book format

DjVu is a file format for sets of online images such as facsimile editions of books. There is an Open Source implementation available at SourceForge. This is more a set of utilities for assembling pieces. I found it necessary to write my own wrapper script for converting a directory full of png's into a DjVu book.

This is far from a robust application, and probably doesn't produce the most compact book possible, but it is no worse than a tarball of the pngs.

  • djvudrv: DjVu driver (Bourne shell) for building a book from a directory of png's

Weird misc. code pieces

pnmcrop.margin.patch: A patch to add margin capability to pnmcrop. For this to be really useful I need to do a despeckle before determining what the image border is.

pnmcrop.margin.filter.2.patch: margin and filter patch for pnmcrop with bug fixes to the margin code. This is in NetPBM 10.30, but it's broken. :-(

gifcrop: The perl script I use to drive the pnm tools to auto-crop text pages. Now works for greyscale images as well. The threshold value is hard-coded, though. Does anyone know how to auto-threshold?
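One well-known candidate for auto-thresholding is Otsu's method, which picks the grey level that best separates the histogram into two classes. gifcrop does not do this. The sketch below is a plain-Python illustration, with the function name otsu_threshold and the histogram-as-list input chosen by us.

```python
def otsu_threshold(histogram):
    """Return the grey level t that maximises between-class variance.

    histogram -- list where histogram[g] is the number of pixels with
                 grey level g; pixels <= t form one class, pixels > t
                 the other.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    w_bg = 0      # pixel count at or below the candidate threshold
    sum_bg = 0    # sum of grey levels at or below the candidate threshold
    for t, count in enumerate(histogram):
        w_bg += count
        sum_bg += t * count
        if w_bg == 0:
            continue          # nothing in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # nothing left in the upper class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t
```

The method needs only the grey-level histogram, so it is cheap, but it assumes a roughly bimodal page (ink versus paper) and may do poorly on pages with strong background variation.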

gifpage is a dumb perl script to wrap image files with html. You have to edit the script to change the copyright notice.

Document Dewarping is a web service run by Ulges, Lampert, and Breuel. I have requested source code with a thought toward incorporating it into Leptonica.

Piggy's page workflow

These are just some wrapper scripts for the real tools that I use to generate images of books. At every step I go through the output images to decide if I need to tweak the parameters on one of the converters. If there are only a couple of pages to tweak, I'll just pull images from the previous stage into gimp and hand-manipulate the image to match what I need.

   originals -> pages -> deskewed -> reduced
                                  \
                                    -> mono565

  • Use kooka, xsane or sane to capture raw images. I use 200ppi 8bit grayscale for most pages, and then 1200ppi 8bit grayscale for b&w illustrations. Color illustrations each get their own evaluation.
  • mkpages: Split scans into pages.
  • repng: Eliminate an alpha problem from ImageMagick using some netpbm filters.
  • mkdeskewed: Deskew a directory full of pngs; uses im_deskew and detect_skew.
  • repng: Remove an unnecessary alpha channel using some netpbm filters.
  • I should use gifcrop, but I don't have it working on my laptop...
  • threshdown: Reduce bit depth and resolution of a directory of pngs. 200ppiX8bit -> 150ppiX4bit for DjVu books
  • I'll apply gifpage to the reduced images to create a book for reading online.
  • repng: Actually switch to 4 bit representation using some netpbm filters.
  • mkmono565: Convert a 200ppi 8bit gray image to a 565ppi mono image for OCR. If I need to tackle any pages by hand with the gimp, I use difference-of-Gaussians. For extreme cases I use Linear Adaptive Thresholding with a vertical range of about 3 lines and a horizontal range of about 1.5 characters.
  • recrush: Compress a directory of pngs to their smallest representation with pngcrush.
  • Descreen any halftoned images with pei halftone. This package has a restrictive non-commercial use only license.

Third party code

These are packages we've found useful as stand-alone tools, or just found.

A median filter for netpbm. http://www.personal.psu.edu/staff/b/u/burns/source_code.html This is in NetPBM 10.30.

Geoff Horton's clean is a program for converting images to black and white dealing with some common problems encountered scanning pictures.

pngcrush is an excellent tool for making the smallest png's you can get.

PEI Halftone Descreening

  • tesseractwrap: Wrapper for tesseract. It converts files to tiff and runs multiple files through tesseract.

Attachment | Size | Date | Who | Comment
deskew.c | 6.2 K | 11 Feb 2006 - 17:19 | PiggyYarroll |
bsir025.png | 38.2 K | 26 Jun 2005 - 18:36 | GregWeeks | a trapezoidal page
bag.py.txt | 8.6 K | 28 Jun 2005 - 02:35 | PiggyYarroll | Implementation of Block Area Graphs
hough.py.txt | 3.2 K | 28 Jun 2005 - 02:36 | PiggyYarroll | An implementation of the Hough transform
gpl.txt | 17.6 K | 28 Jun 2005 - 11:49 | GregWeeks | A copy of the gpl license
djvudrv | 1.6 K | 01 Jul 2005 - 11:29 | PiggyYarroll | Add copyright notice
SkewDetection.pdf | 716.1 K | 01 Jul 2005 - 11:19 | PiggyYarroll | Yu & Jain
anss055.png | 61.6 K | 06 Jul 2005 - 00:56 | GregWeeks | deskew shears off lower left corner
cm0003.png | 751.5 K | 13 Jul 2005 - 12:39 | PiggyYarroll | Shows spine distortion and background variation
botm007.png | 55.4 K | 15 Jul 2005 - 00:58 | GregWeeks | A screwy page with 0 confidence.
botm007_rotated.png | 362.1 K | 15 Jul 2005 - 20:22 | PiggyYarroll | botm007.png rotated to eliminate skew showing remaining problems
bsir025_cleaned.png | 23.2 K | 27 Jul 2005 - 18:58 | PiggyYarroll | Trapezoidal distortion removed by Leptonica.
pnmcrop.margin.patch | 1.6 K | 30 Jul 2005 - 00:59 | GregWeeks | A patch to add margin capability to pnmcrop.
pnmcrop.margin.filter.patch | 2.9 K | 07 Aug 2005 - 23:16 | GregWeeks | margin and filter patch for pnmcrop
pnmcrop.margin.filter.2.patch | 2.9 K | 10 Aug 2005 - 01:45 | GregWeeks | margin and filter patch for pnmcrop
gifcrop | 2.9 K | 31 Dec 2005 - 00:33 | GregWeeks | The perl script I use to drive the pnm tools to auto-crop text pages.
detect_skew.c | 5.7 K | 30 Aug 2005 - 03:48 | PiggyYarroll | Simple Leptonica-based skew detector
rotate | 0.8 K | 30 Aug 2005 - 03:49 | PiggyYarroll | A gimp-based rotate script
gimp_deskew | 0.1 K | 30 Aug 2005 - 03:50 | PiggyYarroll | a driver script to call detect_skew and rotate
gifpage | 2.6 K | 10 Jan 2006 - 22:31 | GregWeeks | A dumb perl script to wrap image files with html.
mkpages | 1.8 K | 07 Feb 2006 - 03:16 | PiggyYarroll | Split scans into pages
repng | 0.2 K | 07 Feb 2006 - 03:16 | PiggyYarroll | Lose a (defective) alpha layer
mkdeskewed | 0.3 K | 07 Feb 2006 - 03:18 | PiggyYarroll | deskew a directory full of pngs
im_deskew | 0.2 K | 07 Feb 2006 - 03:19 | PiggyYarroll | Deskew a single image. Use ImageMagick for the rotation.
threshdown | 0.4 K | 07 Feb 2006 - 03:21 | PiggyYarroll | Reduce bit depth and resolution of a directory of pngs.
.mkpages | 0.7 K | 07 Feb 2006 - 03:22 | PiggyYarroll | Default config file for mkpages
.threshdown | 0.4 K | 07 Feb 2006 - 03:22 | PiggyYarroll | Default config file for threshdown
.mkmono | 0.1 K | 07 Feb 2006 - 03:24 | PiggyYarroll | Default config file for mkmono*
mkmono565 | 0.7 K | 07 Feb 2006 - 03:25 | PiggyYarroll | Convert a 200ppi 8bit gray image to a 565ppi mono image for OCR.
deskew_color.c | 6.4 K | 11 Feb 2006 - 17:20 | PiggyYarroll |
unpaper.zip | 45.8 K | 18 Mar 2006 - 16:23 | PiggyYarroll | Donovan's unpaper for that OTHER OS.
tesseractwrap | 0.4 K | 11 Nov 2006 - 23:11 | PiggyYarroll | Wrapper for tesseract. It converts files to tiff and runs multiple files through tesseract.

Copyright © 1999-2008 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.