The solution? Scan it.
The problem? How to scan it.
Unlike printed, typed or even many handwritten documents, a book can't easily be pulled apart and fed through an automatic scanner, especially when it is old, out of print and quite valuable. Most book scanners (including Google's) use cameras instead. This is my setup:
A very high-tech setup.
It's all very simple: a camera, a tripod to hold the camera still, a remote shutter button to snap the pictures, lots of lamps for even illumination, and a data connection to the computer so I don't fill up the memory card too fast.
With some semi-automated processing magic, these images are all that is needed for a clean scan. Using ImageJ I converted them to black and white, subtracted the background, and cropped and rotated the pages. These are some samples:
These processed images can simply be fed into Adobe Acrobat or similar optical character recognition (OCR) software, which translates the image of the text into machine-readable, fully searchable text. Exactly what I need!
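For anyone who would rather script the clean-up than click through ImageJ, here is a rough sketch of the same idea in plain Python: grayscale conversion, background subtraction, black-and-white thresholding, and cropping. It is not the ImageJ pipeline itself. The background is estimated with a simple local mean rather than ImageJ's rolling-ball method, rotation is left out, and every function name and parameter (`radius`, `offset`, `margin`) is made up for illustration. Images are assumed to be plain lists of pixel rows.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_rows]


def local_mean(gray, radius):
    """Estimate the background as the mean over a (2*radius+1) square window."""
    h, w = len(gray), len(gray[0])
    # Summed-area table, so each window sum is four lookups.
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    out = []
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            row.append(total / ((y1 - y0) * (x1 - x0)))
        out.append(row)
    return out


def binarize(gray, radius=15, offset=20):
    """Mark a pixel as ink (1) when it is more than `offset` darker than
    the estimated background around it, otherwise paper (0)."""
    bg = local_mean(gray, radius)
    return [[1 if bg[y][x] - v > offset else 0 for x, v in enumerate(row)]
            for y, row in enumerate(gray)]


def crop_to_ink(binary, margin=1):
    """Crop to the bounding box of all ink pixels plus a margin.
    Returns None for a blank page."""
    ys = [y for y, row in enumerate(binary) if any(row)]
    if not ys:
        return None
    width = len(binary[0])
    xs = [x for x in range(width) if any(row[x] for row in binary)]
    top, bottom = max(0, ys[0] - margin), min(len(binary), ys[-1] + 1 + margin)
    left, right = max(0, xs[0] - margin), min(width, xs[-1] + 1 + margin)
    return [row[left:right] for row in binary[top:bottom]]
```

Comparing each pixel against its local surroundings, instead of one global cutoff, is what copes with uneven lamp lighting: a page that is brighter on one side still separates cleanly into ink and paper. A real photo would need a much larger `radius` than a toy example, and a library such as NumPy would make this far faster.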
Software used:
ImageJ: Automated image processing
Adobe Acrobat: OCR