What common errors can occur during custom data type import?

Common errors during custom data type imports usually fall into four categories: CSV formatting issues, RegEx syntax errors, logic and dependency conflicts, and dictionary file problems.

Here are the most frequent pitfalls to watch for:

CSV Formatting & Schema Errors

These occur when the console cannot "read" the file you uploaded.

  • Header Mismatch: If the column names in your CSV (for example, Name, Pattern, Type) do not exactly match what the console expects, the import fails or skips columns.
    • The Fix: Export a single custom data type first to use as a "perfect" template.
  • Encoding Issues: If your CSV contains special characters (common in medical terms or complex RegEx) and is saved in an encoding other than UTF-8, the characters may become garbled, causing the RegEx to fail.
    • The Fix: Always save your CSV as CSV UTF-8 (Comma delimited).
  • Hidden Commas: If your RegEx pattern contains a comma (e.g., \d{3,5}) and the field is not quoted, a CSV parser may treat that comma as a column separator, breaking the row.
    • The Fix: Ensure your CSV editor wraps the "Definition" column in double quotes.
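
As a rough illustration of the two fixes above, this Python sketch builds an import file with every field double-quoted and saves it as UTF-8, so a comma inside a pattern stays in one column. The column names (Name, Pattern, Type) and the "RegEx" type value are assumptions borrowed from the example headers; use the headers from your own exported template instead.

```python
import csv
import io

# Hypothetical column names taken from the example headers above.
FIELDNAMES = ["Name", "Pattern", "Type"]


def build_import_csv(rows, fieldnames=FIELDNAMES):
    """Return CSV text with a header row and every field wrapped in double quotes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def save_import_csv(text, path):
    """Save the CSV text as UTF-8; newline="" keeps the csv module's line endings."""
    # Use encoding="utf-8-sig" instead if your tooling expects a byte order mark.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


# Example row: the pattern contains a comma, which quoting keeps in one field.
example_rows = [{"Name": "Badge ID", "Pattern": r"\b\d{3,5}\b", "Type": "RegEx"}]
csv_text = build_import_csv(example_rows)
```

Calling save_import_csv(csv_text, "custom_types.csv") would then write the file; the file name is only a placeholder.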

RegEx Syntax & "Flavor" Errors

Spirion's engine has specific requirements for how patterns are written.

  • Invalid Escape Characters: Forgetting to escape special characters (like . or ?) or using an unsupported escape sequence will cause the agent to error out during the scan.
  • Unsupported "Flavors": RegEx syntax that works in a web browser (JavaScript) may not be supported by the Spirion engine (which is closer to PCRE/Python).
  • Missing Boundaries: Importing a RegEx like \d{8} without word boundaries (\b) won't cause an import error, but it will cause a "Result Flood" (false positive error) during the first scan.
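
The effect of missing boundaries can be seen with a small Python sketch. Python's re module is not the Spirion engine, so this is only a rough pre-check under that assumption; the sample text is made up for illustration.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern (a rough syntax check only)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Made-up sample: a 13-digit tracking number and a genuine 8-digit ID.
sample = "Tracking 9900112233445 shipped; employee 12345678 approved."

# Without boundaries, the pattern also matches inside the longer number.
unbounded = re.findall(r"\d{8}", sample)

# With \b on both sides, only the standalone 8-digit value matches.
bounded = re.findall(r"\b\d{8}\b", sample)

print(unbounded)  # ['99001122', '12345678']
print(bounded)    # ['12345678']
```

A pattern that passes compiles() here can still behave differently in another engine, so treat a True result as a first filter, not a guarantee.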

Logic & Proximity Errors

These errors happen when the settings within an XML or JSON import conflict with the environment.

  • SearchAPI Script Missing: If you import a JSON object that references a SearchAPI script that hasn't been uploaded to your console's Script Repository yet, the Data Type will be "broken" and won't return results.
  • Invalid Proximity Values: Setting a keyword proximity distance that is too large (for example, 10,000 characters) can cause performance degradation or "Timeout" errors on the Agent side.
  • Duplicate Names: Attempting to import a Data Type with a name that already exists. Depending on your version, this either fails or silently overwrites the existing one, which can break existing Scan Policies.
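
One way to catch duplicate names before importing is to compare the names in the import file against a list of names already in the console. The sketch below assumes you have both lists available as plain strings (for instance, from an earlier export) and that names are compared exactly; find_name_conflicts is a hypothetical helper, not part of any Spirion tooling.

```python
from collections import Counter


def find_name_conflicts(import_names, existing_names):
    """Return (names already in the console, names repeated within the import file)."""
    existing = set(existing_names)
    counts = Counter(import_names)
    already_present = sorted(name for name in counts if name in existing)
    repeated_in_file = sorted(name for name, n in counts.items() if n > 1)
    return already_present, repeated_in_file


# Illustrative names only.
already_present, repeated_in_file = find_name_conflicts(
    ["Badge ID", "MRN", "MRN"],
    ["Badge ID", "SSN"],
)
print(already_present)   # ['Badge ID']
print(repeated_in_file)  # ['MRN']
```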

Dictionary-Specific Errors

  • File Size Limits: Attempting to import a Dictionary (text file) that is too large for the Agent to load into memory.
  • Empty Lines/Whitespace: Dictionaries with trailing spaces or empty lines can sometimes cause the Agent to match "nothing" (which effectively matches everything), leading to a scan crash.
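
A dictionary file can be cleaned before import with a few lines of Python. This is a minimal sketch that assumes one term per line; it trims surrounding whitespace and drops lines that end up empty.

```python
def clean_dictionary_lines(lines):
    """Strip leading/trailing whitespace from each term and drop empty lines."""
    cleaned = []
    for line in lines:
        term = line.strip()
        if term:  # skip lines that were blank or whitespace-only
            cleaned.append(term)
    return cleaned


# Illustrative input with trailing spaces, a tab, and blank lines.
raw_lines = ["aspirin  \n", "\n", "   \n", "\tibuprofen\n"]
print(clean_dictionary_lines(raw_lines))  # ['aspirin', 'ibuprofen']
```

To apply it to a file, read the lines with open(path, encoding="utf-8") and write the cleaned terms back joined by newlines.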


Troubleshooting Checklist

  1. Check the Logs: If an import fails, check the Console Audit Logs. They often provide a specific reason (e.g., "Column 'Definition' not found").
  2. The "One-Row" Test: If a large CSV is failing, try importing a file with just one row. If that works, the issue is likely a specific character or formatting error in one of the other rows.
  3. Validate Regex Externally: Use a tool like Regex101 (set to PCRE) to ensure your pattern is valid before putting it in the CSV.
  4. Verify Script Dependencies: If importing a JSON/XML, ensure any referenced scripts are already in the Settings > Script Repository.
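
As a complement to the one-row test, a short Python sketch can point to the rows most likely to be broken by comparing each row's column count with the header. It assumes a comma-delimited file with a header row; the column names in the sample are illustrative.

```python
import csv
import io


def find_bad_rows(csv_text):
    """Return (line number, column count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    bad = []
    for row in reader:
        if len(row) != len(header):
            # reader.line_num is the source line where this row ended.
            bad.append((reader.line_num, len(row)))
    return bad


# The third line has an unquoted comma inside the pattern, so it splits into 4 columns.
sample_csv = (
    "Name,Pattern,Type\n"
    'Good,"\\d{3,5}",RegEx\n'
    "Bad,\\d{3,5},RegEx\n"
)
print(find_bad_rows(sample_csv))  # [(3, 4)]
```

Rows flagged this way are good candidates to quote or fix first; a matching column count does not prove the row's contents are valid.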

Summary

Most errors are caused by incorrect CSV headers or unquoted commas in RegEx patterns.

  • Use an exported template and UTF-8 encoding to avoid most import failures.