
Engineering Explained: LayoutLMv3 and the Future of Document AI

Edward Cates, Senior Machine Learning Engineer

Image-to-text and Document AI models have improved dramatically in the past few years. The techniques that dominated the industry for a long time (template matching and text-based rules) have been overtaken, first by language models and now by newer document models that consider not only the text but also the visual properties of the document.

Visual indicators carry key context for understanding documents: what a document looks like, and where certain pieces of text sit on the page (bottom versus top, small versus large). The same type of document (e.g. an invoice) can come in a huge variety of formats, which demands a more contextually enriched understanding of the document. Prior state-of-the-art approaches (pre-2022) performed OCR (text extraction from documents) and plugged the resulting text into a language model, discarding any visual information.

LayoutLMv3 and Donut (OCR-Free Document Understanding Transformer) are two newer models (released in the second half of 2022) that attain higher levels of document understanding by considering not just the document's text but also its visual features. LayoutLMv3 uses OCR data (the text on the page, including its position and size) plus a low-resolution picture of the document, from which it is able to learn different layouts.
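To make those inputs concrete, here is a minimal sketch of encoding a page for LayoutLMv3 using the Hugging Face Transformers implementation. The checkpoint name, label count, and file name are placeholder assumptions (you would fine-tune the classification head on your own fields), and the processor's built-in OCR step requires Tesseract to be installed.

```python
# Minimal sketch: encoding a page for LayoutLMv3 with Hugging Face Transformers.
# Assumptions: the "microsoft/layoutlmv3-base" checkpoint, a 7-label
# token-classification head (fine-tune for your own fields), and a local "invoice.png".
from PIL import Image
from transformers import LayoutLMv3Processor, LayoutLMv3ForTokenClassification

# apply_ocr=True makes the processor run Tesseract to extract words and their boxes.
processor = LayoutLMv3Processor.from_pretrained("microsoft/layoutlmv3-base", apply_ocr=True)
model = LayoutLMv3ForTokenClassification.from_pretrained("microsoft/layoutlmv3-base", num_labels=7)

image = Image.open("invoice.png").convert("RGB")

# The processor packs three things into one input: token ids for the OCR'd text,
# a bounding box per token (position and size), and a low-resolution image tensor.
encoding = processor(image, return_tensors="pt")

outputs = model(**encoding)
predictions = outputs.logits.argmax(-1)  # one label id per token, e.g. "total_amount"
```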

Donut is OCR-free, meaning the only input is a picture of the document itself (a document here being a single page). Working from a higher-resolution image (~1920p), the Donut model performs text extraction from the document internally, and the extracted text is enriched with all the visual features of the document.
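For contrast, here is a minimal sketch of querying Donut through Hugging Face Transformers. The DocVQA-finetuned checkpoint name, the question, and the file name are placeholder assumptions; the point is the flow: page image in, prompt tokens in, structured answer out, with no external OCR engine anywhere.

```python
# Minimal sketch: document question answering with Donut via Hugging Face Transformers.
# Assumptions: the "naver-clova-ix/donut-base-finetuned-docvqa" checkpoint and a
# local "invoice.png"; no OCR engine is involved.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("invoice.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values  # high-res page image

# The task is expressed as a prompt of special tokens, not as OCR'd text.
question = "What is the invoice total?"
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
)

# Strip special tokens and the task prompt, then parse the answer into JSON.
sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))  # e.g. {"question": "...", "answer": "..."}
```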

Either of these groundbreaking models can be trained to perform almost any document task you desire. Training both to perform the same task and using an ensembling approach (ask them both the same question and see if they agree, as sketched below) can make them even more powerful.
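As a hedged sketch of what that agreement check might look like: the two `ask_*` callables below are hypothetical wrappers around models like the ones shown above, not part of either library.

```python
# Sketch of the ensembling idea: accept an extraction only when both models agree,
# otherwise flag it for review. ask_layoutlmv3 and ask_donut are hypothetical
# wrappers around the snippets above.
from typing import Callable, Optional

def ensemble_answer(
    image_path: str,
    question: str,
    ask_layoutlmv3: Callable[[str, str], str],
    ask_donut: Callable[[str, str], str],
) -> Optional[str]:
    a = ask_layoutlmv3(image_path, question)
    b = ask_donut(image_path, question)
    if a and b and a.strip().lower() == b.strip().lower():
        return a  # the models agree: treat the answer as high confidence
    return None   # disagreement: route to human review or a tie-breaker
```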