How do I test my custom RegEx before deployment?
Testing your custom regex before a full-scale deployment is a critical "Best Practice" to prevent "result floods" (too many false positives) or "silent failures" (missing real data).
Here are two ways to test your RegEx, ranging from a real-world environment test to a quick syntax check.
The "Single-File" Agent Test (Highly Recommended)
This is the most accurate test because it uses the actual Spirion Search Engine on a real machine.
- Step A: Create a "Golden Set" test file (a Word or Excel doc) containing:
  - Valid MRNs in various formats.
  - MRNs buried in paragraphs of text.
  - "Near-miss" data (numbers that are almost MRNs but shouldn't match).
- Step B: Place this file in a specific folder on a test endpoint (e.g., C:\Spirion_Test\).
- Step C: Create a Test Policy in the console:
  - Data Types: Select only your new Custom Regex.
  - Locations: Target only that specific folder (C:\Spirion_Test\).
- Step D: Run the scan and review the Match Evidence. This confirms the agent can read the pattern within the file structure.
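A "Golden Set" is easier to review when the expected results are written down next to the test data. The sketch below is one possible way to generate a plain-text starting point and list what should match. It assumes an MRN is exactly 8 digits (as in the \b\d{8}\b example), uses Python's re module rather than Spirion's engine, and the names GOLDEN_SET, write_golden_set, and golden_set.txt are hypothetical. The text can then be pasted into the Word or Excel document used for the agent scan.

```python
import re
import tempfile
from pathlib import Path

# Assumption: an MRN is exactly 8 digits. Adjust to your real format.
MRN_PATTERN = re.compile(r"\b\d{8}\b")

# Each entry: (line of test text, MRNs the pattern is expected to find in it).
GOLDEN_SET = [
    ("12345678", ["12345678"]),                                  # bare valid MRN
    ("Patient ID: 87654321 admitted on Monday.", ["87654321"]),  # buried in prose
    ("Order number 1234567 shipped.", []),                       # near miss: 7 digits
    ("Tracking code 123456789 scanned.", []),                    # near miss: 9 digits
    ("Ref A12345678 closed.", []),                               # near miss: letter prefix
]


def write_golden_set(folder: Path) -> Path:
    """Write every test line to golden_set.txt inside the given folder."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "golden_set.txt"
    text = "\n".join(line for line, _ in GOLDEN_SET) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def expected_matches() -> list:
    """Flatten the expected MRNs, in file order."""
    return [mrn for _, mrns in GOLDEN_SET for mrn in mrns]


def local_matches(path: Path) -> list:
    """Run the pattern locally as a rough pre-check before the agent scan."""
    return MRN_PATTERN.findall(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # A temporary folder is used here; on a test endpoint you would point
    # this at your scan folder instead, e.g. Path(r"C:\Spirion_Test").
    with tempfile.TemporaryDirectory() as tmp:
        golden_file = write_golden_set(Path(tmp))
        print("Expected:", expected_matches())
        print("Local pre-check:", local_matches(golden_file))
```

After the scan in Step D, compare the Match Evidence against the expected list. Any difference points either to the pattern or to how the agent extracted text from the file.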
Use External Regex Testers (For Syntax Only)
If you are struggling with the regex logic itself, use a tool like Regex101.com.
- Configuration: Set the flavor to PCRE or Python (which most closely match Spirion's engine).
- The Test: Use the "Unit Test" feature to ensure your boundaries (\b) and quantifiers ({8}) are working as expected.
- Warning: External tools only test the pattern. They cannot test Spirion-specific features like Keyword Proximity or File Type Decoders. Always follow up with the "Single-File" Agent Test (the first method above).
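The same kind of boundary and quantifier check can also be scripted so it is repeatable. This is a minimal sketch using Python's re module as a stand-in for an online tester. It assumes the 8-digit pattern \b\d{8}\b, the sample strings and the check_pattern helper are illustrative, and like any external tool it says nothing about how the Spirion agent will behave.

```python
import re

# Assumption: the pattern under test is the 8-digit example from this article.
PATTERN = re.compile(r"\b\d{8}\b")

# Strings in which the pattern should find a match.
SHOULD_MATCH = [
    "12345678",
    "MRN: 12345678.",
    "(12345678)",
]

# Strings in which the pattern should find nothing.
SHOULD_NOT_MATCH = [
    "1234567",       # too short: {8} requires exactly eight digits
    "123456789",     # too long: the trailing \b fails before the ninth digit
    "12345678A",     # a letter directly after the digits removes the boundary
    "ID_12345678",   # underscore counts as a word character, so \b fails
]


def check_pattern() -> list:
    """Return a list of human-readable failures (empty means all cases pass)."""
    failures = []
    for text in SHOULD_MATCH:
        if PATTERN.search(text) is None:
            failures.append(f"expected a match in {text!r}")
    for text in SHOULD_NOT_MATCH:
        if PATTERN.search(text) is not None:
            failures.append(f"unexpected match in {text!r}")
    return failures


if __name__ == "__main__":
    problems = check_pattern()
    print("All cases pass" if not problems else "\n".join(problems))
```

The underscore case is worth keeping in the list: \b treats the underscore as a word character, so an MRN that follows an underscore is not matched by this pattern.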
Troubleshooting Checklist
- Check your Boundaries: If your regex is matching parts of longer numbers, ensure you have \b at the start and end (e.g., \b\d{8}\b).
- Check for Case Sensitivity: If your MRN has letters (e.g., MRN-123), ensure your regex accounts for case (e.g., [Mm][Rr][Nn]-\d{3}) or that the "Case Insensitive" box is checked in the console.
- Test Different File Types: A regex that works in a .txt file might behave differently in a complex .pdf or .xlsx due to how the agent extracts text. Include multiple file formats in your "Golden Set."
- Verify Keyword Proximity: If you added keywords (like "Patient ID"), make sure your test file actually includes those words within the character distance you specified (usually 50-100 characters).
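The case-sensitivity and keyword-proximity items can be sanity-checked on the test file's text before running the scan. The sketch below is only an approximation: it assumes an MRN-123 style format, treats proximity as a simple character window around each match, and the matches_with_keyword helper is hypothetical. Spirion's own proximity logic may count distance differently.

```python
import re

# Assumption: MRNs look like "MRN-123". IGNORECASE plays the role of the
# "Case Insensitive" option, so "mrn-123" is matched as well.
MRN = re.compile(r"\bMRN-\d{3}\b", re.IGNORECASE)


def matches_with_keyword(text, keywords, max_distance=100):
    """Return matches that have any keyword within max_distance characters.

    Distance is measured as a character window before and after the match,
    which is a simplification and not Spirion's actual algorithm.
    """
    lowered = text.lower()
    hits = []
    for m in MRN.finditer(text):
        start = max(0, m.start() - max_distance)
        end = min(len(text), m.end() + max_distance)
        window = lowered[start:end]
        if any(keyword.lower() in window for keyword in keywords):
            hits.append(m.group())
    return hits


if __name__ == "__main__":
    near = "Patient ID: mrn-123"
    far = "Patient ID" + " " * 200 + "MRN-456"
    print(matches_with_keyword(near, ["Patient ID"]))
    print(matches_with_keyword(far, ["Patient ID"]))
```

If a line in the test file is reported by this check but not by the agent scan (or the reverse), review the proximity distance configured in the console first.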
Summary: Start with the Single-File Agent Test using a "Golden Set" of data to verify real-world performance before enabling the policy for the whole company.