Introduction

Separation And Classification In Grooper

In Grooper, the fundamental unit of document processing is the Batch. This is the container holding the digital representations of your documents. A Batch is made up of Batch Folder and Batch Page objects. However, separated and classified documents do not spring into the Batch fully formed. Separation and classification are steps in a list of processing instructions given to items in the Batch: the Batch Process.
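As a rough mental model only, the container relationship described above can be sketched in Python. The class and field names here (`Batch`, `BatchFolder`, `BatchPage`, `loose_pages`, `document_type`) are hypothetical stand-ins chosen for illustration. They are not Grooper's actual object model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BatchPage:
    """One scanned or imported page (hypothetical stand-in, not a Grooper class)."""
    text: str


@dataclass
class BatchFolder:
    """A folder of pages; once it holds pages it represents a document."""
    pages: List[BatchPage] = field(default_factory=list)
    document_type: Optional[str] = None  # unset until classification runs


@dataclass
class Batch:
    """Top-level container. Loose pages sit here until separation runs."""
    loose_pages: List[BatchPage] = field(default_factory=list)
    folders: List[BatchFolder] = field(default_factory=list)


# A freshly imported batch: pages only, no folders and no Document Types yet.
batch = Batch(loose_pages=[BatchPage("Invoice page 1"), BatchPage("Invoice page 2")])
```

The point of the sketch is that a new batch starts with pages but no folders and no Document Types. Those are produced later by steps in the Batch Process.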

Separation

Separation is the process of creating folders for pages in a Batch. This process is automated by the Separate activity in a Batch Process. The Separate activity determines where Batch Folders should be inserted in a Batch, with Batch Pages being placed in those folders. These "separation points" or "binding points" are determined by a Separation Provider, which uses information such as words on a page, page numbers, or visual markers to determine on which page a document starts and stops.
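To make the idea of a separation point concrete, here is a minimal Python sketch. It assumes pages are plain strings and that a new document begins on any page containing the text "Page 1 of". The function names `separate` and `page_one_marker` are hypothetical, and this is not how Grooper's Separation Providers are implemented or configured.

```python
from typing import Callable, List


def separate(pages: List[str], starts_document: Callable[[str], bool]) -> List[List[str]]:
    """Group loose pages into folders, opening a new folder at each separation point."""
    folders: List[List[str]] = []
    for page in pages:
        # Open a new folder at a separation point, or for the very first page
        # so that leading pages are never dropped.
        if starts_document(page) or not folders:
            folders.append([])
        folders[-1].append(page)
    return folders


def page_one_marker(page: str) -> bool:
    """Assumed rule for this sketch: a document starts where 'Page 1 of' appears."""
    return "Page 1 of" in page


pages = ["Invoice Page 1 of 2", "Page 2 of 2", "Purchase Order Page 1 of 1"]
folders = separate(pages, page_one_marker)
# folders[0] holds the two invoice pages; folders[1] holds the purchase order page.
```

The rule that decides where a document starts is passed in as a function, loosely mirroring how the separation logic is supplied by a provider rather than built into the activity itself.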

Classification

Once you have a Batch Folder with page content, that's a document! But you don't know what type of document it is until you classify it. Classification in Grooper is the process of assigning a Batch Folder a Document Type. Document Types are added to a Content Model to distinguish the content on document "A" from the content on document "B". The Classify activity assigns Batch Folders a Document Type based on a Content Model's Classification Method configuration, using training-based machine learning and more "rules-based" extraction techniques.
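As an illustration of the "rules-based" end of that spectrum, the following Python sketch assigns a document type by counting keyword hits. The `RULES` table, the keywords, and the `classify` function are all assumptions made for this example. They do not represent Grooper's Classification Methods or its training-based approach.

```python
from typing import Dict, List, Optional

# Hypothetical keyword rules: document type name -> phrases that suggest it.
RULES: Dict[str, List[str]] = {
    "Invoice": ["invoice", "amount due"],
    "Purchase Order": ["purchase order", "ship to"],
}


def classify(folder_text: str, rules: Dict[str, List[str]] = RULES) -> Optional[str]:
    """Return the document type with the most keyword hits, or None if nothing matches."""
    text = folder_text.lower()
    best_type: Optional[str] = None
    best_hits = 0
    for doc_type, keywords in rules.items():
        hits = sum(1 for kw in keywords if kw in text)
        # Strictly greater, so ties keep the earlier rule and zero hits stay unclassified.
        if hits > best_hits:
            best_type, best_hits = doc_type, hits
    return best_type


invoice_type = classify("INVOICE #123 Amount Due: $40")
unknown_type = classify("Meeting notes")
```

Note that the function takes the text of an already separated folder as input. It labels a document and plays no part in deciding where a document begins or ends.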

In this video, we briefly introduce these objects and concepts in Grooper before digging much deeper in subsequent lessons.

FYI

There are some techniques in Grooper where you can separate and classify documents with a single Activity, notably ESP Auto Separation (don't worry, we'll cover this later). This can be confusing for new Grooper learners. Keep in mind, separation and classification are still functionally two different things:

  1. Separation is forming Batch Folders from loose Batch Pages.
  2. Classification is the assignment of a Document Type to a Batch Folder.