Are There RegEx Examples of Common CMMC Data Types?
Below are examples of Regex patterns for common CUI (Controlled Unclassified Information) categories.
1. CUI Marking Identifiers
CMMC requires the identification of files explicitly marked as CUI. While keyword searches are common, Regex can match the standardized marking strings themselves.
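- Pattern: `\bCUI\/\/[A-Z0-7\s\/]+\b`
- What it finds: Standardized CUI banners such as `CUI//SP-ITAR` or `CUI//BASIC`. The character class has no hyphen, so for `CUI//SP-ITAR` the match stops at `CUI//SP`. That is enough to flag the file, but add `\-` to the class if you need to capture the full banner.
- Tip: Use this as an "Anchor" in a Sensitive Data Definition (SDD) to find any file that has been officially marked by a government agency.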
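Before loading a pattern into Spirion, it can help to exercise it against sample text. The sketch below does that with Python's standard `re` module, assuming Spirion's regex engine treats `\b`, `\s`, and character classes the same way (engines differ, so treat this as a sanity check rather than a guarantee). `find_cui_banners` and the sample page are illustrative only. Note the second result: because the class has no hyphen, `CUI//SP-ITAR` yields only `CUI//SP`.

```python
import re

# Pattern copied verbatim from the article; the raw string keeps every backslash.
CUI_BANNER = re.compile(r"\bCUI\/\/[A-Z0-7\s\/]+\b")


def find_cui_banners(text: str) -> list[str]:
    """Return every CUI banner match in `text`, with surrounding whitespace removed."""
    # The class contains \s, so a match can run past a line break; strip() trims it.
    return [m.group(0).strip() for m in CUI_BANNER.finditer(text)]


page = "CUI//BASIC\nDrawing 42 rev. b\nCUI//SP-ITAR"
print(find_cui_banners(page))  # -> ['CUI//BASIC', 'CUI//SP']
```

The first raw match is actually `CUI//BASIC\n`, because `\s` lets the class absorb the line break before the next uppercase word; that is why the helper strips each result.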
2. ITAR / Export Control Numbers
Technical data subject to ITAR often carries specific license or category numbers.
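- Pattern: `\b(ITAR|EAR)\s?Category\s?[IVX]{1,5}\b`
- What it finds: References to specific export categories, such as `ITAR Category IV` or `EAR Category II`. The numeral class accepts Roman numerals built from I, V, and X, up to five characters long, which covers United States Munitions List (USML) Categories I through XXI.
- Tip: This is highly effective for identifying technical manuals and engineering specifications that must remain within a US-only enclave.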
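A similar check for export-control references, assuming category numerals are written using only the letters I, V, and X. A character range such as `[I-X]` would also accept letters like N, O, and T (so "Category NOT" would match), which is why this sketch spells the set out. `export_categories` is a hypothetical helper.

```python
import re

# Export-control pattern, assuming numerals use only the letters I, V, and X.
EXPORT_CATEGORY = re.compile(r"\b(ITAR|EAR)\s?Category\s?[IVX]{1,5}\b")


def export_categories(text: str) -> list[str]:
    """Return the full text of each export-category reference in `text`."""
    # finditer() + group(0) yields the whole match; findall() would return only
    # the (ITAR|EAR) capture group because the pattern contains a group.
    return [m.group(0) for m in EXPORT_CATEGORY.finditer(text)]


spec = "Covered by ITAR Category XVIII and EAR Category II; ITAR Category NOT applicable."
print(export_categories(spec))  # -> ['ITAR Category XVIII', 'EAR Category II']
```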
3. National Stock Numbers (NSN)
If you manufacture parts for the DoD, your files likely contain 13-digit National Stock Numbers.
- Pattern: `\b\d{4}-\d{2}-\d{3}-\d{4}\b`
- What it finds: Standardized NSNs like `5962-01-345-6789`.
- Tip: NSNs are often public, so link this Regex to a Keyword List (for example, "Proprietary" or "CUI") in an SDD to reduce false positives from public catalogs.
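The Keyword List pairing can be approximated as a document-level co-occurrence check. This is a simplified sketch: it assumes a keyword anywhere in the same text is enough, whereas Spirion's own SDD logic may scope or weight matches differently. The keyword list, the case-insensitive match, and the `nsn_hits` helper are assumptions for illustration.

```python
import re

NSN = re.compile(r"\b\d{4}-\d{2}-\d{3}-\d{4}\b")
# Hypothetical keyword list; word boundaries stop "CUI" from matching inside "circuit".
KEYWORDS = re.compile(r"\b(?:Proprietary|CUI)\b", re.IGNORECASE)


def nsn_hits(text: str) -> list[str]:
    """Return NSNs only when a keyword from the list also appears in the same text."""
    if KEYWORDS.search(text) is None:
        return []  # an NSN with no keyword is treated as a likely public-catalog entry
    return NSN.findall(text)


print(nsn_hits("Part 5962-01-345-6789, PROPRIETARY"))  # -> ['5962-01-345-6789']
print(nsn_hits("Catalog listing: 5962-01-345-6789"))  # -> []
print(nsn_hits("CUI serial 15962-01-345-67890"))  # -> [] (digits run past the boundaries)
```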
4. Commercial and Government Entity (CAGE) Codes
CAGE codes are unique identifiers for federal contractors.
- Pattern: `\b[0-9A-Z]{5}\b`
- What it finds: 5-character codes like `1AB23`.
- Tip: Because this is a simple 5-character string, it causes many false positives. Always use it with a Proximity Rule requiring the word "CAGE" or "Entity" to appear within 20 characters of the code.
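The Proximity Rule can be sketched by measuring the gap between each candidate code and the nearest context keyword. The 20-character window comes from the tip above; counting it as the number of characters between the two matches, and matching "CAGE" and "Entity" case-insensitively, are assumptions about how such a rule might behave, not a description of Spirion's implementation. `gap` and `cage_codes` are hypothetical helpers.

```python
import re

CAGE = re.compile(r"\b[0-9A-Z]{5}\b")
# Hypothetical context keywords for the proximity rule.
CONTEXT = re.compile(r"\b(?:CAGE|Entity)\b", re.IGNORECASE)


def gap(a: re.Match, b: re.Match) -> int:
    """Characters between two matches; 0 if they touch or overlap."""
    return max(b.start() - a.end(), a.start() - b.end(), 0)


def cage_codes(text: str, window: int = 20) -> list[str]:
    """Return 5-character codes with a context keyword within `window` characters."""
    keywords = list(CONTEXT.finditer(text))
    return [
        m.group(0)
        for m in CAGE.finditer(text)
        if any(gap(m, k) <= window for k in keywords)
    ]


sample = "Vendor CAGE code: 1AB23. Invoice total: 58291 USD."
print(cage_codes(sample))  # -> ['1AB23']
```

`58291` also fits the 5-character pattern, but it sits 29 characters after "CAGE", so the proximity check drops it.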
5. Distribution Statements
DoD technical documents must carry a Distribution Statement (A through F).
- Pattern: `\bDistribution\sStatement\s[A-F]\b`
- What it finds: Strings like `Distribution Statement B` or `Distribution Statement D`.
- Tip: Statements B through F generally indicate CUI. You can use this Regex to trigger an automated Classification playbook that labels the file as "Restricted."
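A minimal sketch of the labeling decision: extract the statement letter and return the "Restricted" label when it is B through F. The capture group around `[A-F]`, the `label_for` helper, and returning `None` for everything else are illustrative assumptions; the actual playbook is configured in Spirion rather than written in code.

```python
import re
from typing import Optional

# The article's pattern, with a capture group added around the letter.
DIST_STATEMENT = re.compile(r"\bDistribution\sStatement\s([A-F])\b")
RESTRICTED_LETTERS = set("BCDEF")


def label_for(text: str) -> Optional[str]:
    """Return "Restricted" if any statement B through F appears, else None."""
    letters = set(DIST_STATEMENT.findall(text))
    return "Restricted" if letters & RESTRICTED_LETTERS else None


print(label_for("Distribution Statement A. Approved for public release."))  # -> None
print(label_for("Distribution Statement D. Authorized to DoD and DoD contractors only."))  # -> Restricted
```

The pattern is case-sensitive, so an all-caps banner such as `DISTRIBUTION STATEMENT D` is not matched; add `re.IGNORECASE` and normalize the letter with `.upper()` if your documents use that form.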
6. Federal Contract Numbers (PIID)
The Procurement Instrument Identifier (PIID) is the unique number assigned to a federal contract.
- Pattern: `\b[A-Z0-9]{4,6}-[0-9]{2}-[A-Z]-[0-9]{4}\b`
- What it finds: Standard contract formats like `N00014-21-C-1234`.
- SME Tip: This is the "Gold Standard" for CMMC discovery. Finding a contract number on an unauthorized device is a clear indicator of a CUI spill.
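A spill check for contract numbers can be as simple as scanning text for the pattern. The `find_contract_numbers` helper and the sample memo are hypothetical. The sample also shows two limits of the pattern: a trailing extra digit defeats the closing `\b`, and a contract number written without hyphens is not matched at all.

```python
import re

PIID = re.compile(r"\b[A-Z0-9]{4,6}-[0-9]{2}-[A-Z]-[0-9]{4}\b")


def find_contract_numbers(text: str) -> list[str]:
    """Return every hyphenated contract number that matches the PIID pattern."""
    return PIID.findall(text)


memo = "Ref N00014-21-C-1234; draft N00014-21-C-12345; compact N0001421C1234."
print(find_contract_numbers(memo))  # -> ['N00014-21-C-1234']
```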
How to Implement These in Spirion Sensitive Data Platform
- Create the Data Type: Go to Configuration > Data Types > Add Regex.
- Add Validation: For patterns like NSNs, ensure you use the `\b` (word boundary) anchor to prevent the Regex from matching a random string of numbers inside a larger block.
- Combine into an SDD:
  - Example: Create an SDD called `CMMC_Contract_File`.
  - Logic: Match if Regex (Contract Number) is within 50 characters of Keyword (Confidential).
- Enable Redaction: In your Agent Policy, ensure these new Regex types are set to Partial Redaction (for example, `N00014-XX-X-XXXX`) so you can identify the contract without creating a data spill in the console.
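To see the SDD logic and the redaction mask working together, the sketch below approximates `CMMC_Contract_File` in Python: a contract number counts only when "Confidential" appears within 50 characters, and each reported value is masked like `N00014-XX-X-XXXX`. In Spirion these behaviors are configured in the product; the character-gap measure, the case-insensitive keyword match, and the keep-the-first-segment masking rule here are assumptions chosen to reproduce the example mask, and both helper functions are hypothetical.

```python
import re

PIID = re.compile(r"\b[A-Z0-9]{4,6}-[0-9]{2}-[A-Z]-[0-9]{4}\b")
KEYWORD = re.compile(r"\bConfidential\b", re.IGNORECASE)


def partial_redact(piid: str) -> str:
    """Keep the first segment and mask the rest, e.g. N00014-XX-X-XXXX."""
    head, _, tail = piid.partition("-")
    return head + "-" + re.sub(r"[A-Z0-9]", "X", tail)


def cmmc_contract_file(text: str, window: int = 50) -> list[str]:
    """Hypothetical stand-in for the CMMC_Contract_File SDD.

    Returns partially redacted contract numbers that sit within `window`
    characters of the keyword "Confidential".
    """
    keywords = list(KEYWORD.finditer(text))
    hits = []
    for m in PIID.finditer(text):
        # Gap between the two matches; 0 if they touch or overlap.
        near = any(
            max(k.start() - m.end(), m.start() - k.end(), 0) <= window
            for k in keywords
        )
        if near:
            hits.append(partial_redact(m.group(0)))
    return hits


print(cmmc_contract_file("CONFIDENTIAL - see contract N00014-21-C-1234 for terms."))
# -> ['N00014-XX-X-XXXX']
```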
Summary Table
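| Data Category | Regex Pattern | Use Case |
|---|---|---|
| CUI Banners | `\bCUI\/\/[A-Z0-7\s\/]+\b` | Finding officially marked docs |
| Stock Numbers | `\b\d{4}-\d{2}-\d{3}-\d{4}\b` | Defense manufacturing parts |
| Contract IDs | `\b[A-Z0-9]{4,6}-[0-9]{2}-[A-Z]-[0-9]{4}\b` | Identifying specific CUI projects |
| Distribution Statements | `\bDistribution\sStatement\s[A-F]\b` | Access control & labeling |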