Data Extraction for Structured Documents • 2021 (B2021.02.26)


Description
Data on structured documents generally follows a predictable format from one document to the next. While the information itself may change from document to document, its presentation and labelling are generally consistent. That does not mean extracting the data is always simple. Poor form design, differences in format, inconsistent data formatting, and other idiosyncrasies all pose challenges when extracting data from structured and semi-structured documents.

This course teaches different methods for configuring data extraction for structured and semi-structured documents. It focuses heavily on data modeling of document sets and on using extractor techniques to target, collate, and populate results.
Content
  • Introduction
  • Files to Download
  • Structured vs Unstructured Documents sample
  • Data Extraction for Structured Documents - Quiz 1
  • Data Context
  • Data Context sample
  • Data Context in Grooper
  • Data Extraction for Structured Documents - Quiz 2
  • Collation Methods
  • Key-Value Pair: The Basics
  • Key-Value Pair: Additional Considerations
  • Key-Value List
  • Arrays And Ordered Arrays
  • Combine
  • Split
  • The Rest Of Them
  • Data Extraction for Structured Documents - Quiz 3
  • Data Modeling
  • Intro To A Data Model & Data Elements
  • Extraction Logic
  • Data Extraction for Structured Documents - Quiz 4
  • Generic Extractors
  • Date Extractor: Building Your First Extractor
  • Time Extractor: Value Variability
  • OMR Time: Intro to OMR Extraction
  • Generic Text Segment: The Most Generic Extractor
  • Label Segments And Value Segments: From Generic to Semi-Generic
  • Data Extraction for Structured Documents - Quiz 5
  • Field Extraction
  • Report Number: Intro to the Labeled Value Extractor
  • Crash Date: Exercise
  • City: Using Segment Extractors and Leveraging Output Groups
  • County: Intro to Default Values and Data Element Overrides
  • State: Intro to Expression Based Extraction
  • Crash Type: Exercise (And More...)
  • The Waterfall Technique: Sorting Extraction Results
  • Data Extraction for Structured Documents - Quiz 6
  • Data Sections
  • Intro to Data Sections & The Divider Method
  • Report Totals Section: Exercise
  • Multi-Instance Sections & The Simple Method
  • Other Data Section Extraction Methods
  • Data Extraction for Structured Documents - Quiz 7
  • Practical Example: The Party Info Section
  • Getting Started: Building a Section Extractor for a Single Document
  • Fleshing it Out: A Section Extractor for the Whole Document Set
  • Multi-Instance Data Field Extraction
  • Party Info: Last Name and First Name
  • Data Sections as an Organizational Tool & The VIN Data Field
  • Make and Model: Exercise
  • Plate Number and Plate State: Resolving Ambiguous Labels
  • Data Extraction for Structured Documents - Quiz 8
  • Finale
  • Finale
  • Data Extraction for Structured Documents - Course Survey
  • Data Extraction for Structured Documents - Exam Assessment
  • Data Extraction for Structured Documents - Lab Assessment
Completion rules
  • All units must be completed