Book scanning or book digitization (also: magazine scanning or magazine digitization) is the process of converting physical books and magazines into digital media such as images, electronic text, or electronic books (e-books) by using an image scanner. Large-scale book scanning projects have made many books available online.
Digital books can be easily distributed, reproduced, and read on-screen. Common file formats are DjVu, Portable Document Format (PDF), and Tagged Image File Format (TIFF). Optical character recognition (OCR) is used to convert the raw page images into a digital text format such as ASCII, which reduces the file size and allows the text to be reformatted, searched, or processed by other applications.
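The file-size reduction from OCR can be illustrated with rough arithmetic. The following is a minimal sketch, not a measurement: the page size (6 by 9 inches), the 24-bit color depth, the 300 dpi resolution, and the figure of 2,000 characters per page are all assumed values chosen for illustration, and the image size is for uncompressed pixel data only.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Uncompressed size of one scanned page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def plain_text_bytes(chars_per_page):
    """Size of one page of ASCII text: one byte per character."""
    return chars_per_page


# Assumed example values (hypothetical, not taken from any standard).
page_image = raw_image_bytes(6, 9, 300)   # 6 x 9 inch page at 300 dpi
page_text = plain_text_bytes(2000)        # about 2,000 characters per page

print(f"image: {page_image:,} bytes")
print(f"text:  {page_text:,} bytes")
print(f"ratio: about {page_image // page_text:,} to 1")
```

Compressed formats such as DjVu or PDF store page images far more compactly than raw pixel data, so the ratio seen in practice is smaller than this uncompressed estimate.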
Image scanners may be manual or automated. In an ordinary commercial image scanner, the book is placed on a flat glass plate (or platen), and a light and optical array moves across the book underneath the glass. In manual book scanners, the glass plate extends to the edge of the scanner, making it easier to line up the book's spine.
A problem with scanning bound books is that when all but the thinnest books are laid flat, the part of the page close to the spine (the gutter) is significantly curved, distorting the text in that part of the scan. One solution is to separate the book into individual pages by cutting or unbinding. A non-destructive method is to hold the book in a V-shaped holder and photograph it, rather than lay it flat and scan it. The curvature in the gutter is much less pronounced this way. Pages may be turned by hand or by automated paper transport devices. Transparent plastic or glass sheets are usually pressed against the page to flatten it.
After scanning, software adjusts the page images by aligning, cropping, and retouching them, then converts them to text and the final e-book form. Human proofreaders usually check the output for errors.
Scanning resolution for book digitization varies depending on the purpose and nature of the material. While 300 dpi (118 dots/centimeter) is generally adequate for text conversion, archival institutions recommend higher resolutions for preservation and rare materials. The National Archives of Australia suggests 400 ppi for bound books and 600 ppi for rare or significant documents, while the Federal Agencies Digital Guidelines Initiative (FADGI) recommends a minimum of 400 ppi for archival materials.
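The unit conversion behind these figures is simple: one inch is 2.54 centimeters, so a resolution in dots per inch divides by 2.54 to give dots per centimeter. The sketch below applies that conversion to the three resolutions mentioned above and shows the resulting pixel dimensions for an A4 page (21.0 by 29.7 cm), which is an assumed page size used only as an example.

```python
CM_PER_INCH = 2.54


def dpi_to_dots_per_cm(dpi):
    """Convert dots per inch to dots per centimeter."""
    return dpi / CM_PER_INCH


def pixel_dimensions(width_cm, height_cm, dpi):
    """Pixel width and height of a page scanned at the given resolution."""
    per_cm = dpi_to_dots_per_cm(dpi)
    return round(width_cm * per_cm), round(height_cm * per_cm)


# A4 is an assumed example page size, not a requirement of any guideline.
for dpi in (300, 400, 600):
    dots_cm = round(dpi_to_dots_per_cm(dpi))
    width_px, height_px = pixel_dimensions(21.0, 29.7, dpi)
    print(f"{dpi} dpi = {dots_cm} dots/cm, A4 page = {width_px} x {height_px} px")
```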
These higher resolutions ensure the capture of fine details and support long-term preservation efforts, while a tiered approach balances quality with practical constraints such as storage capacity and resource limitations. This strategy allows institutions to optimize digitization efforts, applying higher resolutions selectively to rare or significant materials while using standard resolutions for more common documents.
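A tiered policy of this kind can be sketched as a simple lookup. The category names and the single combined table below are hypothetical; they merge the resolutions quoted above into one illustrative policy and do not reproduce any institution's actual rules. Because pixel count grows with the square of the resolution, the sketch also reports how much more uncompressed image data each tier produces relative to a 300 ppi baseline.

```python
# Hypothetical tiers combining the resolutions quoted in the text.
TIER_PPI = {
    "common": 300,      # ordinary text intended for OCR
    "bound_book": 400,  # bound volumes kept for preservation
    "rare": 600,        # rare or significant documents
}


def choose_ppi(category):
    """Return the scanning resolution for a material category."""
    if category not in TIER_PPI:
        raise ValueError(f"unknown category: {category!r}")
    return TIER_PPI[category]


def relative_storage(ppi, baseline_ppi=300):
    """Uncompressed pixel count relative to the baseline resolution."""
    return (ppi / baseline_ppi) ** 2


for category in TIER_PPI:
    ppi = choose_ppi(category)
    print(f"{category}: {ppi} ppi, {relative_storage(ppi):.2f}x baseline pixels")
```

Doubling the resolution from 300 to 600 ppi quadruples the pixel count, which is why applying the highest tier only to selected materials saves considerable storage.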
High-end scanners capable of thousands of pages per hour can cost thousands of dollars, but do-it-yourself (DIY) manual book scanners capable of 1,200 pages per hour have been built for US$300.
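To put the throughput figure in perspective, the time needed to scan a book is its page count divided by the scanning rate. The sketch below assumes a hypothetical 300-page book and treats 1,200 pages per hour as a sustained rate, ignoring setup, repositioning, and rescans.

```python
def scan_hours(pages, pages_per_hour):
    """Idealized scanning time in hours at a constant rate."""
    return pages / pages_per_hour


# Assumed example: a 300-page book on a 1,200 pages/hour DIY scanner.
hours = scan_hours(300, 1200)
print(f"{hours * 60:.0f} minutes")
```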