UiPath Document Understanding Framework

Imagine a machine or a bot looking at a scanned document with values scattered across the page. The human brain is naturally trained to pick out the relevant data from different scanned documents. A machine, however, needs “eyes”, the “Intelligent OCR” or “OCR Engine”, and a “brain”, the “Customizable Machine Learning” algorithms.

The UiPath Document Understanding Framework is designed to help users combine different approaches to extract information from multiple documents, not necessarily with the same structure.

To get started, install the packages below.

Packages to be Installed:

1) Intelligent OCR Activities

2) OmniPage OCR

3) Machine Learning Extractor

Implementation Steps:


In this pre-processing step, you define the document types you work with and the fields you want to extract from each. For example, you might extract the vendor and the total amount from invoices, and the insured ID number and patient name from medical forms.


Using Taxonomy Manager, you can create your own taxonomy.

Taxonomy Manager
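
As a rough illustration only, a taxonomy can be thought of as a mapping from document types to the fields to extract. The sketch below is plain Python, not the UiPath taxonomy format or API. The names TAXONOMY and fields_for are hypothetical, and the field lists come from the invoice and medical form example above.

```python
from typing import Dict, List

# Hypothetical, simplified model of a taxonomy: document type -> fields of interest.
# This is not the UiPath taxonomy file format, only a plain-Python illustration.
TAXONOMY: Dict[str, List[str]] = {
    "Invoice": ["Vendor", "Total Amount"],
    "Medical Form": ["Insured ID Number", "Patient Name"],
}


def fields_for(document_type: str) -> List[str]:
    """Return the fields defined for a document type, or an empty list if unknown."""
    return TAXONOMY.get(document_type, [])


print(fields_for("Invoice"))
```

Keeping the taxonomy in one place means the later classification and extraction steps can look up which fields apply to a given document type.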


Documents are processed one by one, and each goes through the digitization process. For non-digital (scanned) documents, you also need to apply the OCR engine of your choice. This step outputs the Document Object Model and a string variable containing all the document text, both of which are passed down to the next steps.

Digitize document


After digitization, the document is classified. If you are working with multiple document types in the same project, you need to know which type of document you are dealing with in order to extract data properly. You can use multiple classifiers in the same scope, configure them and, later in the framework, train them. The classification results help in applying the right extraction strategy.

Classify document scope
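
One simple way to picture classification is keyword matching. The sketch below, in plain Python and not UiPath's implementation, scores each document type by how many of its keywords appear in the digitized text. The keyword lists and the classify function are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists per document type; a real project would tune these.
KEYWORDS: Dict[str, List[str]] = {
    "Invoice": ["invoice", "total amount", "vendor"],
    "Medical Form": ["patient", "insured", "diagnosis"],
}


def classify(text: str) -> Optional[str]:
    """Return the document type with the most keyword hits, or None if nothing matches."""
    lowered = text.lower()
    scores = {
        doc_type: sum(1 for keyword in words if keyword in lowered)
        for doc_type, words in KEYWORDS.items()
    }
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] > 0 else None


print(classify("INVOICE no. 17, Vendor: Acme, Total Amount: 120.00"))
```

Returning None for text that matches no keywords models the case where a document is unknown to the active classifiers.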


Extraction means getting just the data you are interested in. For example, extracting specific data from a 5-page document is quite troublesome if you do it with string manipulation. In this framework, you can use different extractors for different document structures within the same scope. The extraction results are passed on for validation.

Data extraction scope
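
To make the idea of an extractor concrete, here is a minimal pattern-based sketch in plain Python. It is not how any particular UiPath extractor works internally. The patterns, the sample text, and the extract_fields function are hypothetical, and they assume the fields appear as "Label: value" lines.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for two invoice fields; real documents need more robust rules.
INVOICE_PATTERNS: Dict[str, str] = {
    "Vendor": r"Vendor:\s*(.+)",
    "Total Amount": r"Total Amount:\s*([\d.,]+)",
}


def extract_fields(text: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when the field is not found."""
    results: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        results[field] = match.group(1).strip() if match else None
    return results


sample = "Vendor: Acme Ltd\nTotal Amount: 1,250.00\n"
print(extract_fields(sample, INVOICE_PATTERNS))
```

Fields that cannot be found come back as None, which gives the validation step something explicit to check.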


The extracted data can be validated by a human user through the Validation Station. A best practice is to build logic that decides whether a human validation step is needed, with rules that depend on the specific use case. Validation results can then be exported and used in further automation activities.

Present validation station
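
The decision logic mentioned above could take many forms. One possible rule, assumed here purely for illustration, is to send a document to a human whenever any extracted field has a confidence score below a threshold. The threshold value, the per-field confidence scores, and the needs_human_validation function are all hypothetical.

```python
from typing import Dict

# Assumed threshold; the right value depends on the use case.
CONFIDENCE_THRESHOLD = 0.8


def needs_human_validation(confidences: Dict[str, float],
                           threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True when any field's confidence falls below the threshold."""
    return any(score < threshold for score in confidences.values())


print(needs_human_validation({"Vendor": 0.95, "Total Amount": 0.62}))
```

A rule like this keeps human effort focused on uncertain documents while letting confident results flow straight through.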


Once you have your validated information, you can use it as it is, or save it in a DataTable format that can easily be converted into an Excel file.

Export extraction results
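
As a loose analogy for the table export, the sketch below writes validated results as CSV text, which Excel can open. It uses only the Python standard library and does not reproduce the UiPath DataTable type. The to_csv function and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def to_csv(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Serialize validated results (one dict per document) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


validated = [
    {"Vendor": "Acme Ltd", "Total Amount": "1250.00"},
    {"Vendor": "Globex", "Total Amount": "310.50"},
]
print(to_csv(validated, ["Vendor", "Total Amount"]))
```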

Training Classifiers and Extractors

Classification and extraction are only as good as the classifiers and extractors used. If a document was not classified properly, it means it was unknown to the active classifiers. The same goes for incorrect data extraction. The framework provides the opportunity to train the classifiers and the extractors to improve recognition of documents and fields.
