Tips: How does Despeckle interact with other OCR settings like Deskew?

Despeckle and Deskew are part of a "pre-processing pipeline" that prepares an image for the OCR engine. They perform different tasks, but they are highly complementary: using them together produces a cleaner, straighter image, which substantially increases the probability of a successful sensitive data match.

Here is how they interact and work together:

1. The Sequential Workflow

When Spirion encounters an image, it doesn't "read" it immediately. It applies the enabled settings in a specific order before character recognition begins:

  1. Deskew (The "Straightener"): First, the agent detects whether the page is tilted (common in faxes or manual scans) and rotates the image so the lines of text are horizontal.
  2. Despeckle (The "Cleaner"): Once the image is straight, the agent identifies and removes "noise" (stray pixels/dots).
  3. OCR Engine (The "Reader"): Finally, the engine reads the straightened, cleaned image to identify letters and numbers.
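
Spirion's internal algorithms are not described here, so the following is only a minimal pure-Python sketch of the same ordering on a tiny black-and-white image. The functions deskew, despeckle, preprocess_and_read and inked_rows are hypothetical. Deskew is approximated with the projection-profile method plus a vertical shear (a small-angle stand-in for rotation), and despeckle simply clears isolated ink pixels; production OCR engines use more sophisticated rotation and filtering.

```python
from typing import Callable, List

Image = List[List[int]]  # binary raster: 1 = ink, 0 = background


def deskew(img: Image) -> Image:
    """Estimate skew with a projection profile, then undo it with a vertical shear.

    For small angles, shifting each column up by slope * column approximates a
    rotation. The slope whose corrected row histogram is most sharply peaked
    (largest sum of squared row counts) is taken as the skew.
    """
    h, w = len(img), len(img[0])
    ink = [(r, c) for r in range(h) for c in range(w) if img[r][c]]

    def score(slope: float) -> int:
        counts: dict = {}
        for r, c in ink:
            key = r - round(slope * c)
            counts[key] = counts.get(key, 0) + 1
        return sum(n * n for n in counts.values())

    best = max((s / 100 for s in range(-30, 31)), key=score)  # slopes -0.30..0.30
    out = [[0] * w for _ in range(h)]
    for r, c in ink:
        nr = r - round(best * c)
        if 0 <= nr < h:
            out[nr][c] = 1
    return out


def despeckle(img: Image) -> Image:
    """Clear ink pixels that have no inked 8-neighbour (isolated specks)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(h):
        for c in range(w):
            if img[r][c] and not any(
                img[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, h))
                for cc in range(max(c - 1, 0), min(c + 2, w))
                if (rr, cc) != (r, c)
            ):
                out[r][c] = 0
    return out


def preprocess_and_read(img: Image, ocr: Callable[[Image], str]) -> str:
    """Straighten first, then clean, then hand the image to the OCR step."""
    return ocr(despeckle(deskew(img)))


def inked_rows(img: Image) -> str:
    """Stand-in for the OCR engine: report which rows still contain ink."""
    return ",".join(str(r) for r, row in enumerate(img) if any(row))


# Demo: a text line drifting down one row every ~5 columns, plus one stray speck.
demo = [[0] * 20 for _ in range(10)]
for c in range(20):
    demo[2 + round(0.2 * c)][c] = 1
demo[8][2] = 1

print(inked_rows(demo))                       # 2,3,4,5,6,8
print(preprocess_and_read(demo, inked_rows))  # 2
```

After the pipeline the tilted line sits on a single row and the speck is gone, which is the "straight and clean" input the OCR step expects.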

2. Synergistic Benefits

  • Improved Character Alignment: If an image is both crooked (skewed) and grainy (speckled), the OCR engine might see a stray dot near a tilted letter "L" and misinterpret it as a "b" or an "h". Straightening the "L" first and then removing the dot leaves a clean vertical stroke, so the engine can identify the letter correctly.
  • Validation Accuracy: For Credit Card numbers, a single misread digit (caused by a speckle or a tilt) will cause the Luhn check to fail. SSNs carry no check digit, but a misread character can still break the pattern match. Using both settings keeps the digits clear and aligned, which significantly reduces "False Negatives" (missing data that was actually there).
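
To illustrate why one misread digit matters, here is a minimal sketch of the standard, public Luhn (mod 10) check. It is not Spirion's own validator; the function name luhn_valid is hypothetical, and 4111 1111 1111 1111 is a widely published test card number.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn (mod 10) check."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) > 0 and total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True  (clean read)
print(luhn_valid("4111 1111 1111 1117"))  # False (last "1" misread as "7")
```

The Luhn check detects every single-digit substitution, so a single speckle-induced misread is enough to make a genuine card number fail validation and go unreported.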

3. Performance Trade-offs

  • Cumulative CPU Load: Each pre-processing step (Deskew, Despeckle, and others like Fax Correction) adds its own image-processing computation.
    • Enabling only one might add 5-10% overhead to the OCR process.
    • Enabling both can increase the time spent on each image page by 15-25%.
  • Memory Usage: The agent must hold the original image, the deskewed version, and the despeckled version in memory simultaneously while it moves between steps. For very high-resolution images, this can noticeably increase the memory footprint of the SpirionAgent.exe process.
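
As a back-of-the-envelope sketch of these trade-offs, the helpers below estimate raster memory and batch time. Several assumptions are not from the source and are labeled in the code: each copy is an uncompressed 24-bit RGB raster of an A4 page (210 x 297 mm), three full copies are held at once, and the 2 s/page baseline and 10,000-page batch are hypothetical. The function names page_bytes and scan_hours are illustrative.

```python
MM_PER_INCH = 25.4


def page_bytes(width_mm: float, height_mm: float, dpi: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed page raster (default assumption: 24-bit RGB)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


def scan_hours(pages: int, base_seconds_per_page: float, overhead: float) -> float:
    """Wall-clock hours for a batch when pre-processing adds `overhead` (0.25 = 25%)."""
    return pages * base_seconds_per_page * (1 + overhead) / 3600


# Assumption: original + deskewed + despeckled copies of an A4 page held at once.
for dpi in (300, 600):
    one_copy = page_bytes(210, 297, dpi)
    print(f"A4 @ {dpi} dpi: {one_copy / 2**20:.1f} MiB per copy, "
          f"{3 * one_copy / 2**20:.1f} MiB for three copies")

# Hypothetical 2 s/page baseline for 10,000 pages, at the 15-25% overhead quoted above.
for overhead in (0.0, 0.15, 0.25):
    print(f"overhead {overhead:.0%}: {scan_hours(10_000, 2.0, overhead):.1f} h")
```

Under these assumptions a single 300 dpi A4 copy is about 25 MiB, so three copies approach 75 MiB per page in flight, and doubling the resolution to 600 dpi roughly quadruples that.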

4. Interaction with "Recognition Mode"

Spirion has a Recognition Mode setting (Favor Speed vs. Favor Accuracy):

  • Favor Accuracy + Deskew + Despeckle: This is the "Gold Standard" for high-risk discovery. It takes the longest but provides the highest possible detection rate for messy, scanned documents.
  • Favor Speed + Deskew + Despeckle: This is a middle ground. The pre-processing cleans the image, but the engine uses a faster, less intensive algorithm to read the characters.

Summary Table: When to use them together

Document Condition        Use Deskew?   Use Despeckle?   Result
------------------------  ------------  ---------------  ------------------------------------------------------------
Digital Screenshots       No            No               Fastest scan; high accuracy.
Clean Scans (Straight)    No            Yes              Removes "dust" artifacts; prevents misreads.
Crooked Faxes             Yes           Yes              Essential. Fixes alignment and removes transmission noise.
Old Photocopies           Yes           Yes              Essential. Fixes "page creep" and paper grain noise.

SME Recommendation: If you are enabling OCR for anything other than high-quality digital screenshots, enable both Deskew and Despeckle. The performance hit is usually outweighed by the substantial improvement in data discovery accuracy.
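
The summary table, the SME recommendation, and the Recognition Mode guidance can be encoded as a small decision helper. This is a hypothetical sketch: OcrProfile, SUMMARY_TABLE, recommend and the condition keys are illustrative names, not Spirion configuration identifiers, and mapping "high risk" to Favor Accuracy is an assumption drawn from the "Gold Standard" description above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class OcrProfile:
    """Hypothetical bundle of the three settings discussed in this article."""
    deskew: bool
    despeckle: bool
    recognition_mode: str  # "Favor Speed" or "Favor Accuracy"


# Summary table rows: condition -> (use Deskew?, use Despeckle?)
SUMMARY_TABLE = {
    "digital screenshot": (False, False),
    "clean scan": (False, True),
    "crooked fax": (True, True),
    "old photocopy": (True, True),
}


def recommend(conditions: Iterable[str], high_risk: bool,
              follow_sme_default: bool = True) -> OcrProfile:
    """Choose settings for a batch containing the given document conditions."""
    conds = set(conditions)
    unknown = conds - set(SUMMARY_TABLE)
    if unknown:
        raise ValueError(f"unknown document condition(s): {sorted(unknown)}")
    if follow_sme_default:
        # SME rule: anything other than digital screenshots gets both filters.
        deskew = despeckle = any(c != "digital screenshot" for c in conds)
    else:
        # Strict table lookup: a mixed batch needs every filter any of its rows calls for.
        deskew = any(SUMMARY_TABLE[c][0] for c in conds)
        despeckle = any(SUMMARY_TABLE[c][1] for c in conds)
    # Assumption: high-risk discovery justifies the slower Favor Accuracy mode.
    mode = "Favor Accuracy" if high_risk else "Favor Speed"
    return OcrProfile(deskew, despeckle, mode)


print(recommend(["digital screenshot"], high_risk=False))
print(recommend(["clean scan", "crooked fax"], high_risk=True))
```

The follow_sme_default switch makes the difference between the two sources of guidance explicit: the strict table leaves Deskew off for straight clean scans, while the SME rule turns both filters on for any scanned content.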