Tips: What are best practices for scanning large (Exchange) mailboxes?
Here are the established best practices for managing large-scale email scans:
1. Prioritize "Cached Mode" for Routine Scans
For standard, recurring scans, always set "Search only Cached Exchange Stores" to Yes.
- Why: This limits the scan to the data already on the user's hard drive. It eliminates network latency and prevents the "Outlook is retrieving data" hang that frustrates users.
- Trade-off: You may miss older data not synced to the local machine, but you gain significant stability and speed.
2. Use "Discovery-Only" Scans to Identify Hotspots
Before running a full sensitive data scan on thousands of large mailboxes, run a Discovery Scan ("Data Types" unchecked).
- Why: This enables you to see which mailboxes are the largest and which folders contain the most items without the overhead of deep content analysis.
- Action: Use the results to identify "problem" mailboxes that might need to be moved to a dedicated, off-hours scan group.
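As a rough illustration of this triage step, the sketch below ranks mailboxes by total item count from a discovery export. The CSV layout and the column names (mailbox, folder, item_count) are assumptions made for the example, not the actual Spirion report format, so adapt them to whatever your console exports.

```python
import csv
import io
from collections import defaultdict

def rank_mailboxes(csv_text, top_n=3):
    """Return the top_n mailboxes by total item count, largest first."""
    totals = defaultdict(int)
    # Column names are hypothetical; adjust to the real export.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["mailbox"]] += int(row["item_count"])
    # Sort by count descending, then by name for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]

# Made-up sample data for illustration only.
SAMPLE = """mailbox,folder,item_count
alice@example.com,Inbox,120000
alice@example.com,Sent Items,30000
bob@example.com,Inbox,4000
carol@example.com,Inbox,60000
"""

if __name__ == "__main__":
    for mailbox, count in rank_mailboxes(SAMPLE):
        print(mailbox, count)
```

The mailboxes at the top of the list are the candidates for a dedicated, off-hours scan group.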
3. Leverage Folder GUIDs for Targeted Scanning
If you know that sensitive data is likely stored in specific areas (for example, "Finance" or "HR" folders), use the "Search Selected Outlook Folders" option with GUIDs.
- Why: Instead of scanning a 100 GB mailbox, the Agent can jump directly to the 2 GB folder that actually matters. This can cut scan time from hours to minutes.
4. Exclude "Noise" Folders
Large mailboxes are often inflated by folders that rarely contain actionable sensitive data. Use the Exclude options in the wizard for:
- Deleted Items / Dumpster: Unless required for legal reasons, excluding these can significantly reduce the item count.
- Junk Email: High volume, low value.
- RSS Feeds / Sync Issues: These folders often contain thousands of small system messages that slow down the MAPI crawler.
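The exclusion logic can be pictured with a minimal sketch like the one below. It is not how the Agent implements exclusions; the function name, the folder list, and the path format are hypothetical and only show the idea of dropping a folder (and everything under it) before any content is read.

```python
# Hypothetical exclusion list, based on the folders named above.
EXCLUDED_FOLDERS = {"deleted items", "junk email", "rss feeds", "sync issues"}

def should_scan(folder_path, excluded=EXCLUDED_FOLDERS):
    """Return False if any component of the folder path is excluded."""
    normalized = folder_path.replace("\\", "/")
    parts = [p.strip().lower() for p in normalized.split("/") if p.strip()]
    # A match on any level also excludes its sub-folders.
    return not any(part in excluded for part in parts)

if __name__ == "__main__":
    for path in ["Inbox/Finance", "Sync Issues/Conflicts", "Junk Email"]:
        print(path, should_scan(path))
```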
5. Manage Attachment Scanning
Attachments are the primary cause of "slow" email scans.
- Limit by Size: Configure the policy to skip attachments larger than a certain size (for example, 20 MB). Large attachments often contain video or encrypted installers that Spirion cannot scan anyway.
- Limit by Type: Exclude known "safe" or "unscannable" extensions (for example, .exe, .zip, .mp4) to keep the agent focused on documents and spreadsheets.
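The two limits combine into a simple decision per attachment. The sketch below uses the example values from this tip (20 MB, and the .exe, .zip, .mp4 extensions); the function and constant names are hypothetical, and the real limits are set in the policy rather than in code.

```python
import os

MAX_ATTACHMENT_BYTES = 20 * 1024 * 1024  # 20 MB, the example threshold above
SKIPPED_EXTENSIONS = {".exe", ".zip", ".mp4"}  # example extensions from above

def should_scan_attachment(filename, size_bytes):
    """Return True if the attachment passes both the type and size limits."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in SKIPPED_EXTENSIONS:
        return False  # excluded by type
    return size_bytes <= MAX_ATTACHMENT_BYTES  # excluded if over the size cap

if __name__ == "__main__":
    print(should_scan_attachment("budget.xlsx", 1_000_000))
    print(should_scan_attachment("setup.EXE", 10))
    print(should_scan_attachment("report.pdf", 25 * 1024 * 1024))
```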
6. Throttle the Scan (CPU and Network)
For large mailboxes, the Agent is active for a longer duration.
- CPU Throttling: Set the agent to "Low" or "Below Normal" priority in the policy settings. This ensures that even if the scan takes a long time, the user can still use Word, Excel, and Outlook without lag.
- Staggered Start: Do not start scans for 5,000 users at the same time. Use the "Random Delay" feature in the Spirion Console to spread the start times over several hours.
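To see why a random delay helps, the sketch below assigns each of 5,000 users a uniformly random start offset inside a four-hour window. This only illustrates the idea; it does not reproduce how the Spirion Console computes its delay, and the function name and window length are assumptions.

```python
import random

def staggered_delays(user_count, window_hours, seed=None):
    """Return one random start delay (in seconds) per user within the window."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    window_seconds = window_hours * 3600
    return [rng.uniform(0, window_seconds) for _ in range(user_count)]

if __name__ == "__main__":
    delays = staggered_delays(5000, window_hours=4, seed=1)
    # Count how many scans would start in the first 10 minutes.
    early = sum(1 for d in delays if d < 600)
    print(f"{early} of {len(delays)} scans start in the first 10 minutes")
```

With a uniform spread, only a small fraction of the scans begins in any given ten-minute slice, instead of all 5,000 at once.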
7. Handle "Online Archives" Separately
If your organization uses Exchange Online Archives, do not include them in the same policy as the primary mailbox scan.
- Why: Archives are almost always "Online Only." Scanning them requires setting "Search only Cached Exchange Stores" to No, which is slow.
- Best Practice: Create a separate, quarterly policy specifically for Archives and run it during low-traffic periods (for example, weekends).
8. Monitor for "MAPI_E_TABLE_TOO_BIG"
In extremely large folders (for example, an Inbox with 100,000+ items), the MAPI subsystem itself may fail.
- The Fix: If you see this error in the logs, instruct the user to archive their data into sub-folders. Spirion and Outlook both perform much better with 10 folders of 10,000 items than with 1 folder of 100,000 items.
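If you want to check logs for this error in bulk, a minimal sketch follows. It only searches for the literal error name; the sample log lines are invented for illustration, and the real Agent log format and location may differ.

```python
ERROR_CODE = "MAPI_E_TABLE_TOO_BIG"

def find_table_too_big(log_lines):
    """Return (line_number, line) pairs that mention the error, 1-indexed."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if ERROR_CODE in line
    ]

# Invented sample lines; not the actual Agent log format.
SAMPLE_LOG = [
    "scan started for mailbox A\n",
    "folder Inbox failed: MAPI_E_TABLE_TOO_BIG\n",
    "scan finished for mailbox A\n",
]

if __name__ == "__main__":
    for number, line in find_table_too_big(SAMPLE_LOG):
        print(number, line)
```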
Summary
For large mailboxes, speed is achieved through exclusion. By focusing on Cached data, using GUIDs for targeting, and excluding high-volume/low-value folders, you can maintain a strong security posture without disrupting the business.