Structured vs Unstructured Documents

Welcome to the Grooper A.C.E. Consultant level course Data Extraction for Structured Documents!

Documents are often described as structured, semi-structured, or unstructured.  Structured documents are those whose format remains fixed from one document to the next (or nearly fixed, in the case of semi-structured documents).  Think about fillable forms.  The only thing that changes from one document to the next is the information entered in the fields.  The rest of the document, including the labels identifying each field and the instructions describing how to fill out the form, remains the same.  Unstructured documents, by contrast, present information in paragraphs of free-flowing text.  Contracts and correspondence are common examples.

This course focuses on data extraction techniques relevant to structured and semi-structured documents.  In this video, we will introduce structured and semi-structured documents and explain what distinguishes them from unstructured documents.