ROW MATCH

Intro to Row Match

Row Match is one of the oldest Table Extraction methods in Grooper.  It relies on regular expressions and data extraction (via Extractor Types such as Pattern Match and extraction objects such as Data Types and Value Readers) to match each table row on the document.

In one way or another, all Table Extraction methods work by mapping or modeling a table's structure.  If you can figure out where the table is, how many rows there are, and how the columns divide each row into table cells, you can extract the value from each individual cell.  Row Match does this by matching each row of the table.  The idea is, if you can return each full row in the table, you can then extract each column's value from each individual row.
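The row-matching idea can be sketched outside of Grooper with an ordinary regular expression. The Python below is only an analogy, not Grooper's API: the sample text, the row pattern, and the match_rows helper are all assumptions made up for illustration.

```python
import re

# Hypothetical sample text: two table rows surrounded by non-table lines.
DOCUMENT_TEXT = """Production Report
Well    Date        Barrels
A-101   01/05/2023  1200
B-202   01/06/2023  950
Total barrels: 2150
"""

# Assumed row pattern: an ID, a date, and a whole number on a single line.
ROW_PATTERN = re.compile(
    r"^[A-Z]-\d{3}\s+\d{2}/\d{2}/\d{4}\s+\d+$",
    re.MULTILINE,
)


def match_rows(text):
    """Return the full text of every line that matches the row pattern."""
    return [m.group(0) for m in ROW_PATTERN.finditer(text)]


rows = match_rows(DOCUMENT_TEXT)
for row in rows:
    print(row)
```

The header line and the total line do not fit the pattern, so only the two data rows come back. Each returned row can then be parsed for its individual column values.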

Row Instances

In Grooper, a "data instance" is a representation of part of a document's text data.  The largest instance would be the full document.  Everything else can be conceptualized as smaller "chunks" of the document. Each page would represent a smaller, page-sized sub-instance of a multi-page document.  A field result for a "Report Number" listed on the document would be an even smaller instance, representing that value's text data and its location on the document.

Data extractors essentially parse the document's full text, matching and returning text data as a smaller instance of the full document. All extraction results are sub-instances of a larger or parent level data instance, containing the text data and location of whatever it is you want to find.

The Data Table object's first job is to establish the "row instances" of a table.  These are the data instances that represent each row of the table on the document. Each Extract Method does this a little differently, mapping the table's structure with varying techniques.  Row Match will return one row instance for each result from the Row Extractor.
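As a rough mental model, a data instance can be pictured as a piece of text plus its location and a link to its parent. The Python sketch below is an assumption-laden illustration, not how Grooper stores instances: the Instance class, its fields, the row_instances helper, and the sample pattern are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    """Hypothetical data instance: text, its location, and its parent."""
    text: str
    start: int  # character offset in the full document text
    end: int
    parent: Optional["Instance"] = None


def row_instances(document: Instance, row_pattern: re.Pattern) -> List[Instance]:
    """Create one row instance for each match of the row pattern."""
    return [
        Instance(
            text=m.group(0),
            start=document.start + m.start(),
            end=document.start + m.end(),
            parent=document,
        )
        for m in row_pattern.finditer(document.text)
    ]


# Hypothetical document text and row pattern (name, quantity, price).
full_text = "Invoice 42\nWidget 2 9.99\nGadget 1 24.50\nThank you\n"
document = Instance(text=full_text, start=0, end=len(full_text))
pattern = re.compile(r"^\w+ \d+ \d+\.\d{2}$", re.MULTILINE)

row_results = row_instances(document, pattern)
for r in row_results:
    print(r.start, r.end, r.text)
```

Each row instance keeps both its text and its offsets, so later steps can work on the row alone while still knowing where it sits in the parent document.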

Column Extractors

Once the Data Table has established the row instances, its job is only halfway done.  It then needs to return data for each of its Data Column child objects.

One way to do this using the Row Match method is through the Data Column's Value Extractor property.  This will allow you to execute an extractor for each Data Column on each row instance.
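Running one extractor per column against each row can be pictured with the Python sketch below. It is an analogy only and does not use Grooper's API: the column names, their patterns, the sample rows, and the extract_cells helper are assumptions chosen for illustration.

```python
import re

# Hypothetical per-column extractors, one pattern per column.
COLUMN_EXTRACTORS = {
    "Item": re.compile(r"^[A-Za-z]+"),           # leading word
    "Quantity": re.compile(r"(?<=\s)\d+(?=\s)"),  # whole number between spaces
    "Price": re.compile(r"\d+\.\d{2}$"),          # trailing decimal amount
}


def extract_cells(row_texts, column_extractors):
    """Run every column extractor against every row's text."""
    table = []
    for row in row_texts:
        cells = {}
        for name, pattern in column_extractors.items():
            match = pattern.search(row)
            # A column with no match on this row yields an empty cell.
            cells[name] = match.group(0) if match else None
        table.append(cells)
    return table


# Hypothetical row texts, as already returned by a row-matching step.
ROWS = ["Widget 2 9.99", "Gadget 1 24.50"]
TABLE = extract_cells(ROWS, COLUMN_EXTRACTORS)
for cells in TABLE:
    print(cells)
```

Because each pattern only ever sees one row's text, a column extractor can be simple: it does not need to rule out matches elsewhere on the document.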