Data Extraction Techniques • 2023 (2023.02.21)


Description
ATTENTION! This is a companion course to Data Extraction 101. Once you've completed Data Extraction 101, this course is available to you for free, and enrolling in it consumes no credits. If this course is listed as "Restricted", please complete Data Extraction 101 first.

This course covers additional data extraction topics in Grooper. Where Data Extraction 101 was a broad overview of Grooper's extraction tools, this course delves into techniques you'll want to know when crafting extraction logic. Topics range from data extraction tips and tricks to details on properties and functionality that Data Extraction 101 did not cover.

Content
  • Data Extraction Techniques
  • Files To Download
  • Introduction
  • RegEx Techniques
  • Whitespace And Anchors
  • Intro To Whitespace
  • Whitespace And Negated Character Sets
  • Whitespace As Character Anchors
  • Beginning Of String And End Of String
  • Form Feed
  • Tab Marking
  • Tab Marking - Practical Example
  • Generic Segments
  • Greedy VS Lazy Quantifiers
  • Greedy Vs Lazy Basics
  • Infinite Quantifiers
  • Revisiting Lazy Quantifiers in Pattern-Based Collation
  • Another Lazy VS Greedy Example
  • Alternation (Or-Piping) Issues
  • Grouped VS Ungrouped Alternation Lists
  • Match Execution Order
  • Alternation and List Match Issues
  • List Match RegEx Syntax
  • Issues With Merge Variables
  • Custom Merge Variables
  • Custom Merge Variables
  • Merge Variables And Lexicon Names
  • Practical Example - Using Value Lists I
  • Practical Example - Using Value Lists II
  • Snippet Libraries
  • Fuzzy RegEx
  • Fuzzy RegEx Basics
  • The FRX Visualizer
  • Intro To Fuzzy Weightings
  • Fuzzy Examples
  • Fuzzy In Pattern Match Extractors
  • Fuzzy Weighting Lexicons
  • Fuzzy False Positives
  • Required Mode
  • Immutable Weightings
  • Match Modes
  • Input Filters, Exclusions, and Subtractions
  • Input Filters
  • Input Filter Basics
  • Input Filter Example
  • Input Filters And Classification
  • Input Filters And Separation
  • Working In Instances
  • Exclusion Extractors
  • Exclusion Basics
  • Practical Example - Name Exclusions
  • Subtraction Extractors
  • Subtraction Basics
  • Practical Example - Header Removal
  • Text Preprocessing
  • Tab Marking
  • Tab Marking
  • Minimum Tab Width
  • Character Size Ratio
  • Detect Lines
  • Underline Detection
  • Ignore Control Characters
  • Ignoring Space Characters
  • Ignoring New Lines
  • Paragraph Marking
  • Intro to "Paragraph Detection"
  • Better Detection With Paragraph Marking
  • Practical Example - Paragraph Marking
  • Vertical and Constrained Wrap
  • Vertical Wrap
  • Intro To Constrained Wrap
  • Constrained Wrap In Pattern Match
  • Further Troubleshooting
  • Result Postprocessing
  • Named Groups and Named Instances
  • Named Group Basics
  • Named Groups And Data Sections
  • Named Groups Or Named Instances
  • Output Formats
  • Formatting Results With Output Formats
  • Output Formats Example I
  • Output Formats Example II
  • Result Filters and Result Options
  • Result Filters
  • Result Options - Value Types
  • Other Result Options
  • Sort Order & Waterfall Extraction
  • Intro To Sort Order
  • Sort By Extractor And Waterfall Extraction
  • Waterfall Classification
Completion rules
  • All units must be completed
  • Leads to a certificate valid for 2 years
Prerequisites