Are There Examples of RegEx for Common Health Information Data Types?
Below are examples of Regex patterns for common health-related identifiers used in the US healthcare system.
1. National Provider Identifier (NPI)
The NPI is a unique 10-digit identification number issued to health care providers in the United States.
- Pattern: `\b[1-9]\d{9}\b`
- What it finds: 10-digit numbers starting with 1-9 (e.g., `1234567890`).
- Tip: This pattern matches any 10-digit number that does not start with 0, including phone and account numbers, so it will cause false positives. Always use it in an SDD with keywords like "NPI", "Provider", or "Physician" within 20 characters.
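The regex only checks shape. The 10th digit of an NPI is a check digit, chosen so that the constant prefix `80840` followed by the full NPI passes the standard Luhn check. Validating it discards most random 10-digit matches. This is a minimal Python sketch that assumes you post-filter matches outside Spirion. The function names are hypothetical.

```python
import re

# Pattern from the NPI section above.
NPI_PATTERN = re.compile(r"\b[1-9]\d{9}\b")

def luhn_ok(digits: str) -> bool:
    """Standard Luhn check: double every second digit, counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # every second digit, starting left of the check digit
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def is_valid_npi(candidate: str) -> bool:
    """Shape check plus the NPI check digit (Luhn over '80840' + NPI)."""
    if not re.fullmatch(r"[1-9]\d{9}", candidate):
        return False
    return luhn_ok("80840" + candidate)

def find_npis(text: str) -> list[str]:
    """Return regex matches that also pass the check digit."""
    return [m.group() for m in NPI_PATTERN.finditer(text) if is_valid_npi(m.group())]

print(is_valid_npi("1234567893"))  # True
print(is_valid_npi("1234567890"))  # False
```

The illustrative `1234567890` has the right shape but fails the check digit. `1234567893` passes.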
2. Health Insurance Claim Number (HICN)
Used by Medicare to identify beneficiaries. HICNs have been phased out in favor of MBIs (see below), but they still exist in millions of legacy records.
- Pattern: `\b\d{9}[A-Z][0-9A-Z]?\b`
- What it finds: 9 digits followed by a letter and an optional second letter or digit (e.g., `123456789A`).
- Tip: This pattern is very specific and highly effective for identifying legacy Medicare data.
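Running the HICN pattern over sample text is a quick way to see what it accepts. In this Python sketch, all values are made up and `find_hicns` is a hypothetical helper. The pattern is case-sensitive, and the word boundaries reject a tenth digit before the letter.

```python
import re

# HICN pattern from this section: 9 digits, a letter, then an optional letter/digit.
HICN_PATTERN = re.compile(r"\b\d{9}[A-Z][0-9A-Z]?\b")

def find_hicns(text: str) -> list[str]:
    """Return every HICN-shaped token in the text."""
    return HICN_PATTERN.findall(text)

sample = "Member 123456789A; ref 1234567890B; claim 123456789C1; old 123456789a"
print(find_hicns(sample))  # ['123456789A', '123456789C1']
```

If your records may contain lowercase identifiers, compile with `re.IGNORECASE`. This catches more of them at some cost in precision.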
3. Medicare Beneficiary Identifier (MBI)
The modern replacement for the HICN. It is designed to be non-intelligent (no SSN included) and uses a specific character set.
- Pattern: `\b[1-9][A-Z][0-9A-Z]\d[A-Z][0-9A-Z]\d[A-Z]{2}\d{2}\b`
- What it finds: The 11-character alphanumeric string used on modern Medicare cards.
- Tip: This is a "High Fidelity" pattern. If you find this, it is almost certainly a Medicare record.
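The pattern above accepts any uppercase letter. However, the MBI character set excludes S, L, O, I, B, and Z so those letters are not confused with digits. Below is a stricter variant, sketched in Python with hypothetical names, that rejects strings using those letters. The sample values are invented.

```python
import re

# Pattern from this section (any uppercase letter allowed).
MBI_LOOSE = re.compile(r"\b[1-9][A-Z][0-9A-Z]\d[A-Z][0-9A-Z]\d[A-Z]{2}\d{2}\b")

# Letters permitted in an MBI: A-Z minus S, L, O, I, B, Z.
ALPHA = "[AC-HJKMNP-RT-Y]"
ALNUM = "[0-9AC-HJKMNP-RT-Y]"

# Same 11-position layout, built from the restricted character classes.
MBI_STRICT = re.compile(
    rf"\b[1-9]{ALPHA}{ALNUM}\d{ALPHA}{ALNUM}\d{ALPHA}{ALPHA}\d\d\b"
)

print(bool(MBI_LOOSE.fullmatch("1SG4TE5MK73")))   # True
print(bool(MBI_STRICT.fullmatch("1SG4TE5MK73")))  # False (contains S)
print(bool(MBI_STRICT.fullmatch("1EG4TE5MK73")))  # True
```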
4. Medical Record Number (MRN)
MRNs vary by hospital system, but many use a standardized prefix or a specific length (often 7-10 digits).
- Pattern (Example): `\bMRN[-:\s]?\d{6,10}\b`
- What it finds: The literal "MRN", an optional single separator (hyphen, colon, or whitespace), then 6 to 10 digits (e.g., `MRN-1234567`).
- Tip: If your hospital uses a specific prefix (like "HC-"), update the regex to `\bHC-\d{7}\b` for much higher precision.
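`[-:\s]?` allows at most one separator character, so the generic pattern misses common forms such as `MRN: 7654321`, which uses a colon plus a space. The Python sketch below compares it with a hypothetical looser variant and with the site-specific `HC-` pattern. All sample values are invented.

```python
import re

# Generic pattern from this section: at most ONE separator character.
MRN_GENERIC = re.compile(r"\bMRN[-:\s]?\d{6,10}\b")
# Hypothetical variant: up to two separator characters, e.g. "MRN: 7654321".
MRN_LOOSER = re.compile(r"\bMRN[-:#\s]{0,2}\d{6,10}\b")
# Site-specific pattern for a hospital that uses the "HC-" prefix (example only).
MRN_SITE = re.compile(r"\bHC-\d{7}\b")

note = "MRN-1234567 admitted; prior chart MRN: 7654321; legacy id HC-0012345"
print(MRN_GENERIC.findall(note))  # ['MRN-1234567']
print(MRN_LOOSER.findall(note))   # ['MRN-1234567', 'MRN: 7654321']
print(MRN_SITE.findall(note))     # ['HC-0012345']
```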
5. ICD-10 Diagnosis Codes
Used to classify diseases, symptoms, and injuries.
- Pattern: `\b[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?\b`
- What it finds: Codes like `E11.9` (Type 2 Diabetes) or `S82.101A` (Fracture).
- Tip: These codes are short, and strings with the same shape appear in many non-medical contexts. Only use this in an SDD that requires a Patient Name or SSN to be nearby to avoid flagging random alphanumeric strings.
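The following Python sketch shows why the tip matters. The pattern finds real codes, but it also matches short tokens that are not diagnosis codes in context. The sample text and the helper name are invented.

```python
import re

# ICD-10 pattern from this section (kept verbatim, including its capture group).
ICD10_PATTERN = re.compile(r"\b[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?\b")

def find_icd10(text: str) -> list[str]:
    """Return full matches; finditer avoids findall returning only the group."""
    return [m.group() for m in ICD10_PATTERN.finditer(text)]

text = "Dx E11.9, S82.101A; take vitamin B12; park in lot A1C"
print(find_icd10(text))  # ['E11.9', 'S82.101A', 'B12', 'A1C']
```

`B12` and `A1C` are false positives here, which is what the proximity requirement is meant to suppress.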
6. Drug Enforcement Administration (DEA) Number
Assigned to healthcare providers to allow them to write prescriptions for controlled substances.
- Pattern: `\b[A-Z]{2}\d{7}\b`
- What it finds: Two letters followed by 7 digits (e.g., `AB1234567`).
- Tip: The second letter is usually the first letter of the provider's last name. You can use this for advanced validation if you are scanning a specific provider's files.
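DEA numbers also carry a check digit. Add the 1st, 3rd, and 5th digits, then add twice the sum of the 2nd, 4th, and 6th digits. The last digit of the total should equal the 7th digit. The Python sketch below combines that check with the optional last-name test from the tip. The function names are hypothetical. The last-name test is off by default because the rule only usually holds.

```python
import re
from typing import Optional

# Pattern from this section.
DEA_PATTERN = re.compile(r"\b[A-Z]{2}\d{7}\b")

def dea_checksum_ok(dea: str) -> bool:
    """Compare the 7th digit with the checksum computed from digits 1-6."""
    d = [int(c) for c in dea[2:]]
    total = (d[0] + d[2] + d[4]) + 2 * (d[1] + d[3] + d[5])
    return total % 10 == d[6]

def is_valid_dea(candidate: str, last_name: Optional[str] = None) -> bool:
    """Shape + checksum; optionally require the 2nd letter to match last_name."""
    if not DEA_PATTERN.fullmatch(candidate):
        return False
    if last_name and candidate[1] != last_name[0].upper():
        return False
    return dea_checksum_ok(candidate)

print(is_valid_dea("AB1234563"))                     # True
print(is_valid_dea("AB1234567"))                     # False (check digit should be 3)
print(is_valid_dea("AB1234563", last_name="Brown"))  # True
print(is_valid_dea("AB1234563", last_name="Smith"))  # False
```

The illustrative `AB1234567` has the correct format but fails the check digit.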
How to Implement These in Spirion Sensitive Data Platform
- Create the Data Type: Go to Configuration > Data Types > Add Regex.
- Use Proximity Logic (The "HIPAA Rule"):
  - A Medical Record Number by itself is relatively low risk.
  - A Medical Record Number next to a Name is identifiable PHI, and exposing it can be a HIPAA violation.
  - Logic: Match if Regex (MRN) is within 50 characters of Data Type (Person Name).
- Enable Agent-Side Redaction: In your Agent Policy, ensure these types are set to Full Redaction for the console.
  - Why: You do not want your Spirion Console to become a repository of PHI, which would put the console in scope for HIPAA audits.
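Spirion evaluates the proximity rule internally. The Python sketch below only illustrates the idea of "within N characters," so you can reason about what a rule will and will not catch. The name regex is a crude, hypothetical stand-in for the Person Name data type. Measuring distance as the gap between the end of one match and the start of the other is an assumption, not necessarily how Spirion measures it.

```python
import re

MRN = re.compile(r"\bMRN[-:\s]?\d{6,10}\b")
# Hypothetical stand-in for a Person Name detector: two capitalized words.
NAME = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")

def within_window(text: str, target: re.Pattern, context: re.Pattern, window: int) -> list[str]:
    """Return target matches that have a context match at most `window` characters away."""
    context_spans = [m.span() for m in context.finditer(text)]
    hits = []
    for m in target.finditer(text):
        start, end = m.span()
        for c_start, c_end in context_spans:
            # Characters between the two matches; 0 if they touch or overlap.
            gap = max(c_start - end, start - c_end, 0)
            if gap <= window:
                hits.append(m.group())
                break
    return hits

record = "Jane Doe, MRN-1234567, seen today." + " " * 80 + "Batch total MRN-7654321."
print(within_window(record, MRN, NAME, 50))  # ['MRN-1234567']
```

The same helper can express the NPI tip by passing a keyword pattern such as `\b(?:NPI|Provider|Physician)\b` and a window of 20.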
Summary Table
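| Data Category | Regex Pattern | Use Case |
|---|---|---|
| Medicare (MBI) | `\b[1-9][A-Z][0-9A-Z]\d[A-Z][0-9A-Z]\d[A-Z]{2}\d{2}\b` | Modern Medicare identification |
| Provider (NPI) | `\b[1-9]\d{9}\b` | Identifying physician/clinic data |
| Diagnosis (ICD-10) | `\b[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?\b` | Clinical research/billing audits |
| Prescription (DEA) | `\b[A-Z]{2}\d{7}\b` | Pharmacy and prescription logs |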