
Data Extraction 101 • 2023 (2023.02.20)


Description
Data extraction is a critical component of document processing in Grooper. At a minimum, there is some set of data you want to collect from your documents. For each field you want to collect from a document, you'll need to create and configure a data extractor. But there is so much more you can do with extractors! Any time you need to use text on the document for some Grooper activity, you're going to need an extractor. Data extractors can be used to:
• Collect field data from documents
• Classify documents
• Separate documents
• Redact text on a document
• And much more (there are over 100 different Grooper objects on which you could conceivably configure an extractor!)

Extractors are so important in Grooper because they simulate how you, a human, read and understand a document. How do you know you're looking at an invoice as opposed to an HR benefits enrollment form? By reading it, looking for patterns and words that are common to one and not the other. How do you find the invoice number on that invoice? By reading it, looking for labels next to something that reads like an invoice number. Extractors work much the same way to much the same ends. They are a tool that automates the reading and understanding of documents that comes so intuitively to a human reader.

That said, there's not just one way a human reads and understands a document. Rather, there are multiple ways people do that: understanding patterns in text data, using context clues provided by certain words, and even just analyzing where things are physically on the page. There's no single one-size-fits-all extractor either. Instead, there are multiple tools in Grooper's data extraction toolkit, each with its own set of configurations and internal logic, designed to best target certain ways data is organized and presented on a document.

This course aims to educate you on the plethora of data extraction tools in Grooper. We will detail the extractor objects and extractor types available and how to configure them. We will also touch briefly on how these are used in the real world of document processing. This course is a critical prerequisite for all other Consultant courses, as each of these tools will pop up throughout later coursework.

BONUS! Because data extraction is so foundational to most of Grooper's document processing, completing Data Extraction 101 lets you enroll in a second course, Data Extraction Techniques, at no additional cost.

Content
  • Files To Download
  • Introduction
  • PSA: UI Difference in These Videos
  • Extraction Fundamentals
  • Data Context
  • Text Parsing
  • Data Instancing
  • Data Chaining
  • Basic Extraction Tools
  • Properties vs Objects
  • Extractor Objects
  • Extractor Types
  • Intro To Extractor Types
  • Extractor Types: Pattern Match
  • Intro To Pattern Match
  • Pattern Match Basics I
  • Pattern Match Basics II
  • Pattern Match Basics III
  • Practical Example - Basic Data Extraction
  • Extractor Types: List Match
  • Intro To List Match
  • Lexicon Matching
  • Lexicon Translation
  • Practical Example - Document Separation
  • Extractor Types: Word Match
  • Intro To Word Match
  • Word Lookup
  • N-Grams
  • Word Match and Classification
  • Extractor Types: Labeled Value
  • Intro To Labeled Value
  • Labeled Value Basics
  • Extraction Zone - The Maximum Distance Property
  • How "Noise" Impacts Labeled Value
  • Noise Examples
  • How Line Boundaries Impact Labeled Value
  • Labeled Value Logic Execution Order
  • Extractor Types: Read Zone
  • Read Zone - Fixed Region
  • Read Zone - Relative Region
  • The Auto Snap Property
  • Read Zone - Text Region
  • Read Zone - Shape Region
  • Reprocessing Zones
  • Extractor Types: Other "Zonal" Extractors
  • Detect Signature
  • Highlight Zone
  • Extractor Types: Labeled OMR
  • Intro To OMR
  • Labeled OMR - Check One
  • Labeled OMR - Check Multi
  • Labeled OMR - Boolean
  • Label Groups
  • The Header Extractor Property
  • How Noise Impacts Labeled OMR
  • Radio Buttons
  • The List Values Shortcut
  • Extractor Types: Ordered OMR
  • Intro To Ordered OMR
  • Configuring Ordered OMR
  • Extractor Types: Zonal OMR
  • Zonal OMR
  • Extractor Types: Read Barcode & Find Barcode
  • Read Barcode
  • Find Barcode
  • Barcodes And Classification
  • Barcodes And Lookups
  • Data Types
  • Intro To Data Types
  • Local Extractor Configuration
  • Child Extractors
  • Referenced Extractors
  • Collation: Arrays & Ordered Arrays
  • Intro To Array Collation
  • Flow Layout
  • Example - Address Array
  • Arrays Vs Ordered Arrays
  • Example - Address Ordered Array
  • Data Type Execution Order
  • Ordered Arrays And Table Extraction
  • Collation: Key-Value Pairs & Key-Value Lists
  • Intro To Key-Value Pair
  • Key-Value Pair Configuration
  • Intro To Key-Value Lists
  • Key-Value Lists And Table Extraction
  • Collation: Combine
  • Intro To Combine Collation
  • Combine Methods - Sum
  • Combine Methods - Flow
  • Combine Methods - Geometric
  • Practical Example - Flow Combine
  • Practical Example - Geometric Combine
  • Collation: Split
  • Split - Begin
  • Split - End
  • Split - Between
  • Split - Around
  • Practical Example - Split Around
  • Instancing And Whitespace Considerations
  • Practical Example - Split & Data Sections
  • Collation: Pattern-Based
  • Intro To Pattern-Based Collation
  • Pattern-Based Configuration
  • Resource Reusability - Table Extraction
  • Resource Reusability - Sections
  • Efficiency
  • Collation: AND
  • Intro To AND Collation
  • AND Collation & Classification
  • Practical Example - AND Classification
  • Collation: Multi-Column
  • Intro To Multi-Column Collation
  • Practical Example - Multi-Column Section Extraction
  • Field Classes
  • Intro To Field Classes
  • Zonal Context
  • Flow Context
  • Self Context
Completion rules
  • All units must be completed
  • Leads to a certificate valid for 2 years