Are There Examples of RegEx for Common Health Information Data Types?

When building custom Data Types for Health Information (PHI under HIPAA), focus your regex patterns on identifiers that link an individual to medical services, insurance, or clinical records.

Below are examples of Regex patterns for common health-related identifiers used in the US healthcare system.

1. National Provider Identifier (NPI)

The NPI is a unique 10-digit identification number issued to health care providers in the United States.

  • Pattern: \b[1-9]\d{9}\b
  • What it finds: 10-digit numbers starting with 1-9 (e.g., 1234567890).
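  • Tip: Any 10-digit number matches this pattern (phone numbers, for example), so used alone it produces many false positives. Use it in an SDD (Sensitive Data Definition) that also requires a keyword such as "NPI", "Provider", or "Physician" within 20 characters. NPIs also carry a Luhn check digit (computed with the prefix 80840), which can be used to validate matches where your tooling supports it.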
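
As a rough illustration of how check-digit validation cuts down on false positives, the sketch below applies the NPI pattern above and then keeps only matches that pass the Luhn check with the 80840 prefix. It is a standalone Python example, not Spirion functionality; the function names luhn_valid and find_npis are made up for this sketch.

```python
import re

# Same pattern as above: 10 digits, first digit 1-9.
NPI_PATTERN = re.compile(r"\b[1-9]\d{9}\b")

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:        # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_npis(text: str) -> list:
    """Return regex matches that also pass the NPI check-digit test."""
    # The NPI check digit is computed over the constant prefix 80840 plus the NPI.
    return [m for m in NPI_PATTERN.findall(text) if luhn_valid("80840" + m)]

print(find_npis("NPI: 1234567893, callback 1234567890"))
```

Here the second number matches the regex but fails the check digit, so only the first is kept.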

2. Health Insurance Claim Number (HICN)

Used by Medicare to identify beneficiaries. Although HICNs have been replaced by MBIs (see below), they still exist in millions of legacy records.

  • Pattern: \b\d{9}[A-Z][0-9A-Z]?\b
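  • What it finds: 9 digits followed by a letter and an optional trailing letter or digit (e.g., 123456789A).
  • Tip: The trailing letter makes this more specific than a bare 9-digit number, so it works well for identifying legacy Medicare data. Because the 9 digits of a HICN are typically a Social Security Number, treat any match as highly sensitive.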

3. Medicare Beneficiary Identifier (MBI)

The modern replacement for the HICN. It is designed to be non-intelligent (no SSN included) and uses a specific character set.

  • Pattern: \b[1-9][A-Z][0-9A-Z]\d[A-Z][0-9A-Z]\d[A-Z]{2}\d{2}\b
  • What it finds: The complex 11-character alphanumeric string used on modern Medicare cards.
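  • Tip: This is a "High Fidelity" pattern: an 11-character match in this layout is a strong indicator of Medicare data. MBIs never use the letters S, L, O, I, B, or Z, so you can tighten the pattern by replacing each [A-Z] with [AC-HJKMNP-RT-Y] and each [0-9A-Z] with [0-9AC-HJKMNP-RT-Y]. On Medicare cards the MBI is printed with dashes (e.g., 1EG4-TE5-MK73), which this pattern does not match as written.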
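
The following is a minimal Python sketch of a tighter MBI pattern, assuming you want to exclude the six letters MBIs never use (S, L, O, I, B, Z) and also accept the dashed form printed on cards. The variable names are illustrative only, and the pattern should be tested against your own data before use.

```python
import re

# Letters allowed in an MBI: A-Z minus S, L, O, I, B, Z.
ALPHA = "[AC-HJKMNP-RT-Y]"
ALNUM = "[0-9AC-HJKMNP-RT-Y]"

# Layout: digit 1-9, letter, letter-or-digit, digit, letter, letter-or-digit,
# digit, letter, letter, digit, digit. Dashes after positions 4 and 7 are optional.
MBI_PATTERN = re.compile(
    rf"\b[1-9]{ALPHA}{ALNUM}\d-?{ALPHA}{ALNUM}\d-?{ALPHA}{ALPHA}\d\d\b"
)

for candidate in ["1EG4TE5MK73", "1EG4-TE5-MK73", "1SG4TE5MK73"]:
    print(candidate, bool(MBI_PATTERN.search(candidate)))
```

The third candidate has an S in position 2, so the tightened pattern rejects it even though the broader [A-Z] version would accept it.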

4. Medical Record Number (MRN)

MRNs vary by hospital system, but many use a standardized prefix or a specific length (often 7-10 digits).

  • Pattern (Example): \bMRN[-:\s]?\d{6,10}\b
  • What it finds: The literal "MRN" followed by 6 to 10 digits (e.g., MRN-1234567).
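  • Tip: If your hospital uses a specific prefix (like "HC-"), update the regex to \bHC-\d{7}\b to cut false positives. The example pattern allows only one separator character, so "MRN: 1234567" (colon plus space) will not match unless you widen [-:\s]? to [-:\s]{0,2}.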
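
Since MRN formats differ between hospital systems, it can help to generate the pattern from a prefix and a digit range before pasting it into a Data Type. The helper below is a hypothetical Python sketch (mrn_pattern is not a Spirion function) that builds patterns shaped like the example above.

```python
import re

def mrn_pattern(prefix: str = "MRN", min_digits: int = 6, max_digits: int = 10):
    """Build a compiled MRN regex: prefix, one optional separator, then digits."""
    # re.escape keeps characters such as "." or "+" in the prefix literal.
    return re.compile(
        rf"\b{re.escape(prefix)}[-:\s]?\d{{{min_digits},{max_digits}}}\b"
    )

default = mrn_pattern()
print(default.pattern)                        # the generic pattern from above
print(bool(default.search("MRN-1234567")))    # single separator: matches
print(bool(default.search("MRN: 1234567")))   # colon plus space: no match

# A site-specific variant: "HC" prefix and exactly 7 digits.
site = mrn_pattern("HC", 7, 7)
print(bool(site.search("HC-1234567")))
```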

5. ICD-10 Diagnosis Codes

ICD-10 codes classify diseases, symptoms, and injuries for clinical documentation and billing.

  • Pattern: \b[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?\b
  • What it finds: Codes like E11.9 (Type 2 Diabetes) or S82.101A (Fracture).
  • Tip: These codes are short and appear in many non-medical contexts. Only use this in an SDD that requires a Patient Name or SSN to be nearby to avoid flagging random alphanumeric strings.

6. Drug Enforcement Administration (DEA) Number

A DEA number is assigned to a healthcare provider and allows them to write prescriptions for controlled substances.

  • Pattern: \b[A-Z]{2}\d{7}\b
  • What it finds: Two letters followed by 7 digits (e.g., AB1234567).
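  • Tip: The second letter is usually the first letter of the provider's last name. You can use this for advanced validation if you are scanning a specific provider's files. The seventh digit is a check digit derived from the first six, so matches can also be validated arithmetically.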
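
The two validation ideas for DEA numbers (the check digit and the last-name initial) can be sketched in Python as follows. This is an illustrative example rather than anything built into Spirion; the function names and the sample last names are invented for the sketch. The check digit rule used here is: add the 1st, 3rd, and 5th digits, add twice the sum of the 2nd, 4th, and 6th digits, and compare the last digit of the total to the 7th digit.

```python
import re

DEA_PATTERN = re.compile(r"\b[A-Z]{2}\d{7}\b")

def dea_checksum_ok(candidate: str) -> bool:
    """Validate the 7th digit of a DEA number against the first six."""
    d = [int(c) for c in candidate[2:]]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]

def find_dea_numbers(text: str, last_name: str = None) -> list:
    """Return pattern matches that pass the checksum and, optionally,
    whose second letter equals the first letter of last_name."""
    hits = []
    for match in DEA_PATTERN.findall(text):
        if not dea_checksum_ok(match):
            continue
        if last_name and match[1] != last_name[0].upper():
            continue
        hits.append(match)
    return hits

sample = "Rx signed under AB1234563; ref code AB1234567"
print(find_dea_numbers(sample))            # checksum filter only
print(find_dea_numbers(sample, "Baker"))   # checksum plus last-name initial
```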


How to Implement These in Spirion Sensitive Data Platform

  1. Create the Data Type: Go to Configuration > Data Types > Add Regex.
  2. Use Proximity Logic (The "HIPAA Rule"):
    • A Medical Record Number by itself is low risk.
    • A Medical Record Number next to a Name identifies a patient, which makes it PHI; exposing it can be a HIPAA violation.
    • Logic: Match if Regex (MRN) is within 50 characters of Data Type (Person Name).
  3. Enable Agent-Side Redaction: In your Agent Policy, ensure these types are set to Full Redaction for the console.
    • Why: You do not want your Spirion Console to become a repository of PHI, which would put the console in scope for HIPAA audits.
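
To make the proximity rule in step 2 concrete, here is a small Python sketch of "match only if a second pattern occurs within N characters". It approximates the idea for testing patterns offline and is not how Spirion implements it; the NAME regex is a deliberately naive stand-in for the Person Name data type, and near_matches is a hypothetical helper.

```python
import re

MRN = re.compile(r"\bMRN[-:\s]?\d{6,10}\b")
# Naive stand-in for a person-name data type: two capitalized words.
NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def near_matches(text, primary, secondary, window=50):
    """Return primary matches with a secondary match within `window` characters."""
    secondary_spans = [m.span() for m in secondary.finditer(text)]
    hits = []
    for m in primary.finditer(text):
        start, end = m.span()
        for s_start, s_end in secondary_spans:
            # Character gap between the two matches; 0 if they touch or overlap.
            gap = max(s_start - end, start - s_end, 0)
            if gap <= window:
                hits.append(m.group())
                break
    return hits

close = "Patient: Jane Doe, MRN-1234567"
far = "MRN-7654321" + "." * 60 + "Jane Doe"
print(near_matches(close, MRN, NAME))   # name is a few characters away
print(near_matches(far, MRN, NAME))     # name is 60 characters away
```

Only the first text produces a hit, because in the second the name falls outside the 50-character window.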

Summary Table

| Data Category | Regex Pattern | Use Case |
| --- | --- | --- |
| Medicare (MBI) | \b[1-9][A-Z][0-9A-Z]\d[A-Z][0-9A-Z]\d[A-Z]{2}\d{2}\b | Modern Medicare identification |
| Provider (NPI) | \b[1-9]\d{9}\b | Identifying physician/clinic data |
| Diagnosis (ICD-10) | \b[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?\b | Clinical research/billing audits |
| Prescription (DEA) | \b[A-Z]{2}\d{7}\b | Pharmacy and prescription logs |