OCR and handwriting recognition, From Clipboard to Actionable Documents

June 8, 2018

AI Industry Insights

The state of the art in extracting and unlocking unstructured data in documents

Much of the world's information is held on paper, in PDFs, or in simple scans of physical documents. Document Analysis and Recognition (DAR) is the effort to use computers to crack open these static documents and make them more usable and useful.

Once unlocked and machine readable, there are a lot of things that can be done with documents using what’s called text mining or text analytics, including:

Auto-summarization or semantic summarization
Machine translation
Natural language understanding
Question answering
Relationship extraction
Text to speech
Syntax analysis
Entity recognition
Content classification
Text visualization

Multiple companies offer text analytics as machine-learning-as-a-service microservices.

For documents that are not machine readable, such as scanned PDFs, optical character recognition (OCR) is the key means of text recognition: it converts the characters in a digital image into digital text. Although commercial OCR dates back to the 1950s and results can be very impressive, achieving consistently high accuracy remains a challenging problem.

The best commercial OCR capabilities are available as machine-learning-as-a-service microservices.

Recognizing handwritten text is an even more formidable task than OCR, and the state of the art is not very good. Handwriting Text Recognition (HTR) systems must handle overlapping characters, a mixture of cursive and non-cursive writing, and huge variations in writing style. In some cases the task is nearly impossible. Many of us have even had the strange experience of struggling to read our own handwriting.

Until recently, HTR accuracy improved at a slow pace. Most gains were minimal and resulted from small tweaks to existing statistical modeling techniques, such as Hidden Markov Models (HMMs). The core algorithms remained fundamentally unchanged, and recognition rates were low for even the best HTR systems.

Recent advances in machine learning, however, have revolutionized the field. In particular, the combination of Convolutional Neural Networks (CNNs or ConvNets) and Long Short-Term Memory (LSTM) networks has produced the most significant accuracy improvements in decades. These hybrid deep networks are more robust, handle a larger range of handwriting inputs, and constitute a fundamentally new approach to HTR.

LSTM networks are a type of Recurrent Neural Network (RNN) that can learn tasks requiring memories of events that happened thousands or even millions of discrete time steps earlier. This makes them well suited to HTR, where each letter and word depends strongly on the ones around it.
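To make the "memory" idea concrete, here is a minimal sketch of a single-unit LSTM cell in plain Python. It is not how any production HTR system is implemented: the scalar weights in `DEMO_PARAMS` are hand-picked assumptions (a real network learns them and uses vectors and matrices), and the names `lstm_step` and `run_sequence` are purely illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """Advance a single-unit LSTM cell by one time step.

    params maps each gate name to (input weight, recurrent weight, bias).
    Returns the new hidden state h and the new cell state c.
    """
    pre = {}
    for name in ("forget", "input", "output", "candidate"):
        w_x, w_h, b = params[name]
        pre[name] = w_x * x + w_h * h_prev + b

    f = sigmoid(pre["forget"])       # how much of the old memory to keep
    i = sigmoid(pre["input"])        # how much new information to write
    o = sigmoid(pre["output"])       # how much of the memory to expose
    g = math.tanh(pre["candidate"])  # the new information itself

    c = f * c_prev + i * g           # updated cell state (the "memory")
    h = o * math.tanh(c)             # updated hidden state (the output)
    return h, c


def run_sequence(xs, params):
    """Feed a sequence of scalar inputs through the cell, starting from zero state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c


# Hand-picked (assumed) weights: the input gate opens only for large inputs,
# and the forget gate stays close to 1, so whatever is written is retained.
DEMO_PARAMS = {
    "forget": (0.0, 0.0, 10.0),
    "input": (20.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (5.0, 0.0, 0.0),
}

# One "event" followed by fifty empty time steps.
h, c = run_sequence([1.0] + [0.0] * 50, DEMO_PARAMS)
print(h, c)
```

With these weights the cell state written at the first step is still largely intact fifty steps later, which is the property that lets an LSTM relate a character to context seen much earlier in the line.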

Tesseract 4.0, an open-source multilingual OCR/HTR engine maintained by Google, was re-architected in the summer of 2017 to use a hybrid CNN/LSTM deep neural network. The model was trained for several weeks on a corpus of 400,000 text lines spanning approximately 4,500 fonts. The reported accuracy gains are tremendous and the engine now supports over 100 languages.
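As a rough illustration, the sketch below calls the Tesseract command-line tool from Python and asks for the LSTM engine. It assumes the `tesseract` binary (version 4 or later) and the English language data are installed; the helper names and the `page.png` path are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng"):
    """Build the command line for running Tesseract's LSTM engine on one image."""
    return [
        "tesseract",
        image_path,
        "stdout",      # output base: print recognized text instead of writing a file
        "-l", lang,    # language of the trained model to load
        "--oem", "1",  # OCR engine mode 1: neural-network (LSTM) engine only
    ]


def recognize(image_path, lang="eng"):
    """Return the recognized text, or None if Tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# "page.png" is a placeholder path used only to show the resulting command.
print(" ".join(build_tesseract_command("page.png")))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log exactly what is being run before pointing it at real scans.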

Despite the impressive gains achieved with deep learning techniques, HTR continues to trail OCR in performance and accuracy. There are several best practices, however, that can help improve recognition results. These include:

Image resizing: Most systems work best on images with a resolution of 300 DPI or higher. Upscaling lower-resolution images can often dramatically improve recognition accuracy.
Binarization: Binarization is the process of converting color images to black and white. HTR systems don't require color information, so most will automatically convert images before processing them. This step can produce suboptimal images, however, when the contrast between text and page background varies widely across the page, so it's important to make sure your images have a good separation of text from background.
Noise removal: Random variation in image brightness or color (noise) can also reduce recognition accuracy. Most HTR systems attempt to denoise input images, but certain types of noise cannot be eliminated. To minimize noise levels, always use good illumination when scanning documents.
Deskewing: Documents that are not well aligned when scanned produce skewed output, with text flowing across the page at an angle instead of horizontally. This can severely affect line segmentation and reduce recognition accuracy.
Lexical matching: Recognition accuracy can also be improved if the output is constrained by a lexicon, a list of words that are allowed to occur in a document. This is typically a dictionary of valid words in the language being processed. This simple technique can eliminate many common errors.
Field-specific models: Field-specific models use transfer learning (fine-tuning, head retraining, or both) to extend an existing model by training it on additional data sets specific to the problem domain. By reducing the range of inputs each model must recognize, field-specific models often have better performance and higher accuracy.
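Two of the practices above, binarization and lexical matching, can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: Otsu's method is one common way to pick a global threshold and is not necessarily what any given HTR system uses, the image is assumed to be a flat list of 8-bit grey levels, and the function names and the `cutoff` value are hypothetical choices.

```python
import difflib


def otsu_threshold(pixels):
    """Pick the grey level (0-255) that best separates dark and light pixels.

    Otsu's method: try every threshold and keep the one that maximizes the
    between-class variance of the two resulting groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue              # no pixels at or below t yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break                 # every pixel is at or below t
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Map each grey pixel to 0 (ink) or 255 (background)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


def snap_to_lexicon(word, lexicon, cutoff=0.8):
    """Replace a recognized word with its closest lexicon entry, if one is close enough."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Toy data: three dark "ink" pixels and three light "paper" pixels.
print(binarize([20, 25, 30, 200, 210, 220]))
# A typical recognition slip: the digit 1 read in place of the letter i.
print(snap_to_lexicon("invo1ce", ["invoice", "total", "amount"]))
```

A single global threshold like this is exactly what struggles when background contrast varies across the page; in that situation a locally adaptive threshold is usually a better fit. The `cutoff` controls how aggressively words are corrected: set too low, it will "fix" valid words that simply aren't in the lexicon.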

When computers can't accurately read printed or handwritten text, have low confidence in their findings, or run into exceptions, the fallback is a human-in-the-loop workflow to identify what was written. In other words, a person is asked to read the text and type the answer. With this approach, the overall workflow can be very accurate even if the OCR and HTR can't handle certain situations. Top vendors of these human-in-the-loop workflow services include Alegion and Figure Eight.
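The routing step of such a workflow can be sketched as follows. This is a minimal illustration, not any vendor's API: the `RecognizedField` structure, the `route_fields` helper, and the 0.9 threshold are all assumptions, and the confidence scores stand in for whatever a particular recognizer reports.

```python
from dataclasses import dataclass


@dataclass
class RecognizedField:
    name: str          # which part of the document this is, e.g. "invoice_total"
    text: str          # what the recognizer thinks was written
    confidence: float  # recognizer's score between 0.0 and 1.0 (assumed scale)


def route_fields(fields, threshold=0.9):
    """Split fields into those accepted automatically and those sent to a person.

    A field goes to human review when its confidence is below the threshold
    or when the recognizer returned no text at all.
    """
    accepted, needs_review = [], []
    for item in fields:
        if item.confidence >= threshold and item.text.strip():
            accepted.append(item)
        else:
            needs_review.append(item)
    return accepted, needs_review


# Hypothetical recognizer output for one scanned form.
fields = [
    RecognizedField("customer_name", "Jane Doe", 0.97),
    RecognizedField("invoice_total", "1,2S0.00", 0.61),
    RecognizedField("signature_date", "", 0.95),
]
accepted, needs_review = route_fields(fields)
print([f.name for f in accepted])
print([f.name for f in needs_review])
```

The threshold is the main tuning knob: raising it sends more fields to people and increases cost, while lowering it lets more machine errors through unreviewed.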

Finally, for those interested in digging deeper into these areas, there are several important technical conferences and events on Document Analysis and Recognition:

International Workshop on Document Analysis Systems (DAS 2018), which was held in April in Vienna, Austria
Summer School on Document Analysis (SSDA 2018), July 2–6, 2018 in La Rochelle, France
International Conference on Frontiers in Handwriting Recognition (ICFHR 2018), August 5–8, 2018 in Niagara Falls, NY
International Conference on Document Analysis and Recognition (ICDAR 2019), September 22–25, 2019 in Brisbane, Australia
Text Analytics Forum, November 7–8, 2018 in Washington, DC

New deep learning techniques have revolutionized the field of document and text analysis and are contributing to dramatic improvements in the state-of-the-art. Unlocking insights from unstructured data captured in static documents has broad applications with new use cases popping up all the time. Unfathomable amounts of data and insights are currently hidden in billions of physical and PDF documents. Imagine the intelligence and informed actions your business could unlock with these new technologies.
