
Image Processing and OCR • 2021 (B2021.02.23)


Description
Scanned images present unique challenges to getting good data from documents. To extract that data, images must be run through Optical Character Recognition (OCR), which converts the pixels on the page into machine-readable text. Poor image quality and the limitations of standard OCR processes can produce less-than-desirable results.

This course teaches the methods available in Grooper for improving image quality through image processing and shows how to leverage Grooper’s unique approach to OCR to get the best text data from documents. Students will gain a practical understanding of how to build image processing and OCR profiles that work together to improve on the accuracy of standard OCR. Students will also learn how Grooper’s image processing provides additional visual information (such as table lines, barcodes, and checkboxes) and how to make use of it.
Content
  • INTRODUCTION
  • Files to Download
  • What is OCR? Native Text vs OCR Text sample
  • Image Processing and OCR - Quiz 1
  • Batch Processing Example
  • An Example Batch Process with OCR sample
  • The First Part of the Story - Image Processing
  • Why is OCR Important? - How Grooper Uses OCR Text Data
  • Image Processing and OCR - Quiz 2
  • The OCR Journey
  • The OCR Journey and OCR Profiles
  • How OCR Engines Work Part 1: Pre-Processing and Segmenting
  • How OCR Engines Work Part 2: Character Recognition and Post-Processing
  • Image Processing and OCR - Quiz 3
  • Evaluating "OCR Engine Only" Results
  • Getting Some Testing Results
  • Manipulating OCR Engine Properties in an OCR Profile
  • Comparing and Contrasting Different OCR Engines: Transym vs Tesseract
  • PERMANENT IMAGE PROCESSING
  • Introduction to Image Processing
  • Creating IP Profiles and Adding IP Steps
  • IP Step Configuration - Auto Deskew
  • Adding Our Next Step - Auto Orient
  • More on IP for OCR - Border Cleanup
  • IP Considerations - What's Good For One May Not Be Good For All (or OCR)
  • Image Processing and OCR - Quiz 4
  • Evaluating OCR Results After Permanent IP
  • Applying a Permanent IP Profile and Evaluating Its OCR Impact
  • Exercise - Build and Apply Your Own Permanent IP Profile
  • Permanent vs Temporary Image Processing
  • Permanent vs Temporary Image Processing
  • IP Primitives
  • Thresholding
  • Blob Removal & Hole Punch Removal
  • Blob Removal & Speck Removal
  • Blob Removal & Check Box Removal
  • Blob Removal & Lines
  • The Line Removal Command
  • Shape Detection & Shape Removal
  • Dilate/Erode
  • Image Processing and OCR - Quiz 5
  • Temporary Image Processing
  • Building an IP Profile for Temporary Image Processing
  • Advanced Image Processing: Conditional IP Commands
  • Evaluating OCR Results After Temporary IP
  • How to Apply a Temporary IP Profile for OCR
  • Exercise - Build And Apply Your Own Temporary IP Profile
  • Image Processing and OCR - Quiz 6
  • OCR Synthesis
  • Font Pitch Detection
  • Segment Reprocessing
  • Image Segmentation
  • Iterative OCR
  • Cell Validation
  • Image Processing and OCR - Quiz 7
  • Evaluating OCR Results After Synthesis
  • Applying Synthesis Settings to Our Demo Batch
  • Exercise - Experiment With Synthesis Settings
  • Extraction Cleanup - Fuzzy Matching
  • Introduction to Fuzzy Regular Expression
  • Mismatches, Required Mode, and Immutable Characters
  • Fuzzy List Matching
  • Image Processing and OCR - Quiz 8
  • Advanced OCR Techniques
  • Layered OCR
  • The Correct Activity
  • Layout Data
  • Lines
  • Check Boxes and OMR
  • Barcodes
  • Layout Data and Tables - The Grid Layout Method
  • Signature Detection
  • Exercise - Use Layout Data to Extract Data
  • Image Processing and OCR - Quiz 9
  • Wrap Up & Exam
  • Wrap Up
  • Image Processing and OCR Course Survey
  • Image Processing and OCR - Exam Assessment
  • Image Processing and OCR - Lab Assessment - Part 1
  • Image Processing and OCR - Lab Assessment - Part 2
Completion rules
  • All units must be completed