UiPath Document Understanding Framework

Imagine a machine or a bot looking at a scanned document whose values are scattered across the page. The human brain is naturally trained to pick out the data from different scanned documents. A machine, by contrast, needs “eyes”, the “Intelligent OCR” or “OCR Engine”, and a “brain”, the “Customizable Machine Learning” algorithms.

The UiPath Document Understanding Framework is designed to help users combine different approaches to extract information from multiple documents, not necessarily with the same structure.

To get started, install the packages listed below.

Packages to be Installed:

1) Intelligent OCR activities

2) OmniPage OCR

3) Machine Learning Extractor

Implementation Steps:

Taxonomy

In this pre-processing step, you define the document types you will process and the fields you want to extract from each. For example, you might work with invoices, from which you want the vendor and the total amount, and with medical forms, from which you want the insured ID number and the patient name.

Using Taxonomy Manager, you can create your own taxonomy.
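To make the idea concrete, a taxonomy can be pictured as a list of document types, each with its own fields. The Python sketch below is only an illustration of that data structure; it is not the Taxonomy Manager format or any UiPath API, and the names DocumentType, taxonomy and fields_for are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DocumentType:
    """One document type and the fields we want to extract from it."""
    name: str
    fields: List[str] = field(default_factory=list)


# Hypothetical taxonomy mirroring the two examples from the text.
taxonomy = [
    DocumentType("Invoice", ["Vendor", "Total Amount"]),
    DocumentType("Medical Form", ["Insured ID Number", "Patient Name"]),
]


def fields_for(doc_type_name: str) -> List[str]:
    """Return the fields defined for a document type, or an empty list if unknown."""
    for doc_type in taxonomy:
        if doc_type.name == doc_type_name:
            return doc_type.fields
    return []


print(fields_for("Invoice"))
```

Every later step (classification, extraction, validation) refers back to these type and field names, which is why the taxonomy is defined first.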

Digitization

Documents are processed one by one, and each goes through digitization. For non-digital (scanned) documents, you also need to apply the OCR engine of your choice. This step produces two outputs, the Document Object Model and a string variable containing all the document text, and both are passed down to the next steps.
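The two outputs can be thought of as a structured view and a flat view of the same content. The sketch below is a heavily simplified stand-in written in Python, assuming a document model that only holds pages, words and bounding boxes; the real Document Object Model carries more detail, and the classes Word and Page and the function document_text are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    # Assumed layout for this sketch: (left, top, width, height) in pixels.
    box: Tuple[int, int, int, int]


@dataclass
class Page:
    words: List[Word]


def document_text(pages: List[Page]) -> str:
    """Flatten the structured model into one string, one line per page."""
    return "\n".join(" ".join(w.text for w in page.words) for page in pages)


# A one-page document as an OCR engine might report it (made-up values).
pages = [
    Page([
        Word("Invoice", (10, 10, 80, 12)),
        Word("No.", (95, 10, 30, 12)),
        Word("1001", (130, 10, 40, 12)),
    ])
]

text = document_text(pages)
print(text)
```

The structured model keeps positional information for extractors that need it, while the plain string is enough for text-based classifiers.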

Classification

After digitization, the document is classified. If you are working with multiple document types in the same project, you need to know which type you are dealing with in order to extract data properly. You can use multiple classifiers in the same scope, configure them, and, later in the framework, train them. The classification results determine which extraction strategy to apply.
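As a rough picture of what a classifier does with the digitized text, here is a minimal keyword-counting sketch in Python. It is an assumption-laden illustration, not the logic of any UiPath classifier: the KEYWORDS table and the classify function are invented for this example, and a real classifier would also report a confidence.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists per document type.
KEYWORDS: Dict[str, List[str]] = {
    "Invoice": ["invoice", "total amount", "vendor"],
    "Medical Form": ["patient", "insured id"],
}


def classify(text: str) -> Optional[str]:
    """Return the document type whose keywords appear most often, or None if none match."""
    lowered = text.lower()
    scores = {
        doc_type: sum(keyword in lowered for keyword in keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("INVOICE from vendor Acme, total amount 120.00"))
print(classify("Patient name: Jane; insured ID 42"))
print(classify("Meeting notes"))
```

Returning None for an unknown document is what lets the rest of the workflow decide to skip extraction or ask a human.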

Extraction

Extraction means getting just the data you are interested in. For example, pulling specific values out of a 5-page document with string manipulation alone is quite troublesome. In this framework, you can use different extractors for different document structures within the same scope. The extraction results are then passed on for validation.
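The idea of choosing an extraction strategy per document type can be sketched with regular expressions. The Python example below assumes documents that label their values as "Vendor: ..." and so on; the PATTERNS table and the extract function are hypothetical and only illustrate the principle, not how the framework's extractors are implemented.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns per document type and field; real layouts will differ.
PATTERNS: Dict[str, Dict[str, str]] = {
    "Invoice": {
        "Vendor": r"Vendor:\s*(.+)",
        "Total Amount": r"Total Amount:\s*([\d.,]+)",
    },
    "Medical Form": {
        "Insured ID Number": r"Insured ID:\s*(\w+)",
        "Patient Name": r"Patient Name:\s*(.+)",
    },
}


def extract(doc_type: str, text: str) -> Dict[str, Optional[str]]:
    """Apply the patterns for doc_type; fields that are not found map to None."""
    results: Dict[str, Optional[str]] = {}
    for field_name, pattern in PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, text)
        results[field_name] = match.group(1).strip() if match else None
    return results


sample = "Vendor: Acme Ltd\nTotal Amount: 1,250.00\n"
print(extract("Invoice", sample))
```

Fields that come back as None are natural candidates for human validation in the next step.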

Validation

The extracted data can be validated by a human user through the Validation Station. A best practice is to build logic around the decision of whether to add a human validation step, with rules that depend on the specific use case. Validation results can then be exported and used in further automation activities.
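One common shape for such a rule is a confidence threshold: send the document to a human only when some field looks uncertain. The Python sketch below assumes each extracted field comes with a confidence between 0 and 1 and uses an arbitrary threshold of 0.8; the function name and the threshold are assumptions for illustration, and the right rule depends on your use case.

```python
from typing import Dict


def needs_human_validation(confidences: Dict[str, float], threshold: float = 0.8) -> bool:
    """Return True if any field's confidence falls below the threshold."""
    return any(confidence < threshold for confidence in confidences.values())


# Made-up confidence values for two documents.
confident_doc = {"Vendor": 0.97, "Total Amount": 0.91}
uncertain_doc = {"Vendor": 0.97, "Total Amount": 0.55}

print(needs_human_validation(confident_doc))
print(needs_human_validation(uncertain_doc))
```

A stricter variant could always require review for high-value fields, such as totals above a certain amount, regardless of confidence.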

Export

Once you have your validated information, you can use it as it is or save it in a DataTable format, which can easily be converted into an Excel file.
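For readers who think in code, the tabular export can be pictured as rows of field and value pairs written to a spreadsheet-friendly format. The Python sketch below uses the standard csv module and CSV output (which Excel can open) as a stand-in for a DataTable; the to_csv function and the column names are assumptions for this example.

```python
import csv
import io
from typing import Dict, List


def to_csv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Validated results for one invoice (made-up values).
rows = [
    {"Field": "Vendor", "Value": "Acme Ltd"},
    {"Field": "Total Amount", "Value": "1,250.00"},
]

csv_text = to_csv(rows, ["Field", "Value"])
print(csv_text)
```

Values that contain a comma, such as the amount here, are quoted automatically by the csv module so the columns stay aligned.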

Training Classifiers and Extractors

Classification and extraction are only as good as the classifiers and extractors used. If a document was not classified properly, it means the document was unknown to the active classifiers. The same goes for incorrect data extraction. The framework provides the opportunity to train the classifiers and the extractors to improve recognition of documents and fields.
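The feedback loop can be illustrated with a toy classifier that starts out knowing nothing and learns words from documents whose type a human has confirmed. This Python sketch is only meant to show the principle of training from validated data; the KeywordLearner class is hypothetical and far simpler than the trainable classifiers and extractors in the framework.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


class KeywordLearner:
    """Toy classifier that learns word counts from human-confirmed documents."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, text: str, confirmed_type: str) -> None:
        """Record the words of a document whose type was confirmed by a human."""
        self.word_counts[confirmed_type].update(text.lower().split())

    def classify(self, text: str) -> Optional[str]:
        """Return the type whose learned words best match the text, or None."""
        words = text.lower().split()
        scores = {
            doc_type: sum(counts[word] for word in words)
            for doc_type, counts in self.word_counts.items()
        }
        if not scores:
            return None
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None


learner = KeywordLearner()
before = learner.classify("purchase order 55")   # unknown document type
learner.train("purchase order for widgets", "Purchase Order")
after = learner.classify("purchase order 55")    # recognized after training
print(before, after)
```

The same document goes from unrecognized to recognized after a single confirmed example, which is the effect training is meant to have at scale.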

G S Suraj
