What common errors can occur during custom data type import?
Common errors during custom data type imports usually fall into four categories: CSV formatting and schema issues, RegEx syntax errors, logic and proximity conflicts, and dictionary-specific problems.
Here are the most frequent pitfalls to watch for:
CSV Formatting & Schema Errors
These occur when the console cannot "read" the file you uploaded.
- Header Mismatch: If the column names in your CSV (for example, Name,Pattern,Type) do not exactly match what the console expects, the import fails or skips columns.
- The Fix: Export a single custom data type first to use as a "perfect" template.
- Encoding Issues: If your CSV contains special characters (common in medical terms or complex RegEx) and is saved in a non-UTF-8 format, the characters may become "garbled," causing the regex to fail.
- The Fix: Always save your CSV as CSV UTF-8 (Comma delimited).
- Hidden Commas: If your RegEx pattern contains a comma (e.g., \d{3,5}), a standard CSV parser might interpret that as a "new column," breaking the row.
- The Fix: Ensure your CSV editor wraps the "Definition" column in double quotes.
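The encoding and quoting fixes above can be sketched in Python. This is a minimal illustration, not the console's own tooling: the column names and sample rows are hypothetical placeholders, so substitute the headers from your own exported template. Spreadsheet "CSV UTF-8" exports typically add a byte order mark; whether the console expects one is not stated here, so the plain utf-8 encoding used below is an assumption.

```python
import csv

# Hypothetical headers and rows; copy the real headers from an exported template.
HEADERS = ["Name", "Pattern", "Type"]
ROWS = [
    ["Employee ID", r"\b\d{3,5}\b", "RegEx"],
    ["Patient Note", "café naïve", "Keyword"],
]


def write_import_csv(path, headers, rows):
    """Write rows as UTF-8 CSV with every field wrapped in double quotes."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
        writer.writerow(headers)
        writer.writerows(rows)


def read_import_csv(path):
    """Read the file back so the round trip can be checked before uploading."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

Calling write_import_csv("types.csv", HEADERS, ROWS) and then read_import_csv("types.csv") should return the same rows, with the comma inside \d{3,5} kept within a single column because the field is quoted.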
RegEx Syntax & "Flavor" Errors
Spirion's engine has specific requirements for how patterns are written.
- Invalid Escape Characters: Forgetting to escape special characters (like . or ?) or using an unsupported escape sequence will cause the agent to error out during the scan.
- Unsupported "Flavors": Using RegEx syntax that works in a web browser (JavaScript) but isn't supported by the Spirion engine (which is closer to PCRE/Python).
- Missing Boundaries: Importing a RegEx like \d{8} without word boundaries (\b) won't cause an import error, but it will cause a "Result Flood" (false positive error) during the first scan.
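The effect of missing boundaries and unescaped characters can be seen in a short Python sketch. Python's re module is used here only as a stand-in; the scanning engine may behave differently in edge cases, and the sample text is invented for illustration.

```python
import re

# Invented sample: a 13-digit order number and a genuine 8-digit ID.
sample = "Order 9876543210987 shipped; member ID 12345678 verified."

# Without boundaries, the first 8 digits of the longer number also match.
unbounded_hits = re.findall(r"\d{8}", sample)
# With \b on both sides, only a standalone 8-digit run matches.
bounded_hits = re.findall(r"\b\d{8}\b", sample)

print(unbounded_hits)  # ['98765432', '12345678']
print(bounded_hits)    # ['12345678']

# An unescaped dot matches any character; an escaped dot matches only ".".
dot_unescaped = re.fullmatch(r"1.5", "1x5") is not None
dot_escaped = re.fullmatch(r"1\.5", "1x5") is not None

print(dot_unescaped)  # True
print(dot_escaped)    # False
```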
Logic & Proximity Errors
These errors happen when the settings within an XML or JSON import conflict with the environment.
- SearchAPI Script Missing: If you import a JSON object that references a SearchAPI script that hasn't been uploaded to your console's Script Repository yet, the Data Type will be "broken" and won't return results.
- Invalid Proximity Values: Setting a keyword proximity distance that is too large (for example, 10,000 characters) can cause performance degradation or "Timeout" errors on the Agent side.
- Duplicate Names: Attempting to import a Data Type with a name that already exists. Depending on your version, this either fails or silently overwrites the existing one, which can break existing Scan Policies.
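A duplicate-name check can be run before importing. The sketch below is a hypothetical helper, not a console feature: it assumes you have already gathered the existing Data Type names into a list (for example, from an export), and it assumes names should be compared case-insensitively, which may not match how your version compares them.

```python
def find_name_conflicts(existing_names, incoming_names):
    """Return incoming names that collide with existing ones or repeat within the import."""
    # Assumption: names are compared ignoring case and surrounding whitespace.
    seen = {name.strip().casefold() for name in existing_names}
    conflicts = []
    for name in incoming_names:
        key = name.strip().casefold()
        if key in seen:
            conflicts.append(name)
        seen.add(key)
    return conflicts
```

For example, find_name_conflicts(["SSN", "Employee ID"], ["employee id", "Passport", "Passport"]) flags "employee id" (already present) and the second "Passport" (repeated within the import).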
Dictionary-Specific Errors
- File Size Limits: Attempting to import a Dictionary (text file) that is too large for the Agent to load into memory.
- Empty Lines/Whitespace: Dictionaries with trailing spaces or empty lines can sometimes cause the Agent to match "nothing" (which effectively matches everything), leading to a scan crash.
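Cleaning a dictionary before import removes the whitespace and empty-line problems described above. This is a minimal sketch with hypothetical function names; dropping exact duplicates is an optional extra that the checks above do not require, and the file is assumed to be UTF-8 text with one term per line.

```python
def clean_dictionary_lines(lines):
    """Strip surrounding whitespace, drop empty entries, and remove exact duplicates."""
    cleaned = []
    seen = set()
    for line in lines:
        term = line.strip()
        # Skip blank lines and terms already kept.
        if not term or term in seen:
            continue
        seen.add(term)
        cleaned.append(term)
    return cleaned


def clean_dictionary_file(source_path, target_path):
    """Write a cleaned copy of the dictionary and return the number of terms kept."""
    with open(source_path, encoding="utf-8") as source:
        terms = clean_dictionary_lines(source)
    with open(target_path, "w", encoding="utf-8", newline="\n") as target:
        target.write("\n".join(terms))
    return len(terms)
```

The returned term count also gives a rough sense of dictionary size before the Agent has to load it.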
Troubleshooting Checklist
- Check the Logs: If an import fails, check the Console Audit Logs. They often provide a specific reason (e.g., "Column 'Definition' not found").
- The "One-Row" Test: If a large CSV is failing, try importing a file with just one row. If that works, the issue is likely a specific character or formatting error in one of the other rows.
- Validate Regex Externally: Use a tool like Regex101 (set to PCRE) to ensure your pattern is valid before putting it in the CSV.
- Verify Script Dependencies: If importing a JSON/XML, ensure any referenced scripts are already in the Settings > Script Repository.
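Several of these checks can be approximated locally before uploading. The sketch below is a hypothetical pre-flight helper: the expected headers and the name of the pattern column are parameters you supply from your own exported template, every row is assumed to hold a RegEx in that column, and Python's re module stands in for the scanning engine, so a pattern that compiles here is not guaranteed to be accepted there.

```python
import csv
import io
import re


def preflight(csv_text, expected_headers, pattern_column):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    if not rows:
        return ["file is empty"]
    headers = rows[0]
    if headers != expected_headers:
        return [f"header mismatch: expected {expected_headers}, found {headers}"]
    problems = []
    index = headers.index(pattern_column)
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(headers):
            # Usually an unquoted comma inside a pattern.
            problems.append(f"row {row_no}: expected {len(headers)} columns, found {len(row)}")
            continue
        try:
            re.compile(row[index])
        except re.error as exc:
            problems.append(f"row {row_no}: pattern does not compile ({exc})")
    return problems
```

An empty result means none of these particular checks failed. A reported row number points to the row to isolate, in the same spirit as the "One-Row" test.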
Summary
Most errors are caused by incorrect CSV headers or unquoted commas in RegEx patterns.
- Use an exported template and UTF-8 encoding to avoid 90% of import failures.