Tips: How does scanning Online Archives affect scan duration?
Scanning Online Archives usually increases scan duration considerably. The primary reason is that Online Archives are, by definition, not local.
Here is a breakdown of how they affect performance and duration:
1. Forced "Online" Mode (No Caching)
Standard mailboxes are typically scanned in "Cached Mode," where the Spirion Agent reads data directly from the local .ost file on the hard drive.
- The Difference: Online Archives do not have a local cache. To scan them, the "Search only Cached Exchange Stores" setting must be set to No.
- The Impact: The Agent must request every single email and attachment from the Exchange server over the network. This introduces network latency for every item scanned, which drastically slows down the "items per minute" rate.
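A rough way to see the effect of per-item latency is to multiply the item count by the time spent on each item. The sketch below is only a back-of-the-envelope model: the item count and both per-item timings are hypothetical figures chosen for illustration, not measured Spirion or Exchange values.

```python
def estimate_scan_minutes(item_count: int, per_item_seconds: float) -> float:
    """Estimate scan duration in minutes when items are processed one at a time."""
    return item_count * per_item_seconds / 60


# Hypothetical figures, for illustration only.
ITEMS = 50_000
LOCAL_READ_SECONDS = 0.005     # assumed cost of reading an item from a local .ost
NETWORK_FETCH_SECONDS = 0.150  # assumed round trip to the Exchange server per item

cached_minutes = estimate_scan_minutes(ITEMS, LOCAL_READ_SECONDS)
online_minutes = estimate_scan_minutes(ITEMS, NETWORK_FETCH_SECONDS)

print(f"Cached: {cached_minutes:.1f} min, Online: {online_minutes:.1f} min")
```

Under these assumed numbers the same mailbox takes minutes when cached and hours when fetched item by item, even before throttling is considered.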
2. Throttling by Microsoft 365 / Exchange
Because the Agent is making thousands of rapid-fire requests to the server to pull archive data:
- The Impact: Microsoft 365 (EWS/Graph) or on-premises Exchange throttling policies are likely to take effect. When the server detects a high request volume, it intentionally slows its responses to the Spirion Agent to protect server health.
- Result: The scan may appear to "hang" or move at a crawl as the Agent waits for the server's "cool-down" period to end.
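To illustrate why a throttled scan can look stalled, the sketch below adds up the waiting time of a client that doubles its delay after each consecutive throttled response. Exponential backoff is a common client-side pattern, used here as an assumption; this text does not say how the Spirion Agent actually waits, and the function name and delay values are hypothetical.

```python
def total_backoff_seconds(throttled_responses: int,
                          base_delay: float = 1.0,
                          max_delay: float = 60.0) -> float:
    """Sum the waits of a client that doubles its delay after each
    consecutive throttled response, capped at max_delay."""
    total = 0.0
    delay = base_delay
    for _ in range(throttled_responses):
        total += delay
        delay = min(delay * 2, max_delay)
    return total


# Eight throttled responses in a row (hypothetical scenario).
idle_seconds = total_backoff_seconds(8)
print(f"Time spent waiting: {idle_seconds:.0f} s")
```

In this model a short burst of throttled responses already costs several minutes during which no items are scanned.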
3. Massive Item Counts
Online Archives are often used as "dumping grounds" for years of historical data.
- The Impact: It is common for an Online Archive to contain 200,000+ items, even if the primary mailbox is small. Since Spirion must open and inspect every item, scan time grows roughly in proportion to the archive's item count.
4. Deep Folder Hierarchies
Archives often contain complex, nested folder structures from years of organization.
- The Impact: The MAPI/CAPI crawler used by the Agent must traverse every folder. Deeply nested structures require more overhead for the Agent to track and navigate, adding to the total duration.
5. Attachment Overhead
Archives often contain older, larger attachments (PowerPoints, PDFs, ZIPs) that users didn't want to delete.
- The Impact: Each attachment must be downloaded from the server, decrypted (if necessary), and extracted before it can be scanned. If an archive is full of 20 MB attachments, download and extraction time can come to dominate the scan duration.
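The download portion of that overhead can be approximated from the attachment count, average size, and available bandwidth. This is a minimal sketch under stated assumptions: the counts, sizes, and link speed are hypothetical, and decryption and extraction time are ignored.

```python
def transfer_seconds(attachment_count: int,
                     avg_size_mb: float,
                     bandwidth_mbps: float) -> float:
    """Time to download all attachments, ignoring latency and throttling.

    Sizes are in megabytes, bandwidth in megabits per second,
    so sizes are multiplied by 8 to convert bytes to bits.
    """
    total_megabits = attachment_count * avg_size_mb * 8
    return total_megabits / bandwidth_mbps


# Hypothetical archive: 1,000 attachments of 20 MB over a 100 Mbps link.
download_seconds = transfer_seconds(1_000, 20, 100)
print(f"Download time: {download_seconds / 60:.1f} min")
```

Even on an uncontended link, this hypothetical archive spends close to half an hour on downloads alone.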
Best Practices to Mitigate Duration
If you must scan Online Archives, do not include them in your daily or weekly "Standard" scans. Instead:
- Separate the Policies: Create a dedicated "Archive Scan" policy that is separate from the "Primary Mailbox" policy.
- Run Off-Hours: Schedule Archive scans for weekends or late nights to avoid competing with user traffic and to minimize the impact of network latency.
- Use "Discovery" First: Run a Discovery-only scan on the archives first to identify which ones are actually large. This allows you to target your deep scans only where the risk is highest.
- Filter by Date: If possible, configure the policy to only scan items in the archive that were modified in the last 1-2 years, rather than scanning 15 years of history every time.
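The effect of a date filter can be sketched as a simple cutoff applied to item metadata. The `ArchiveItem` type and `items_in_window` helper below are hypothetical and only model the idea; they are not Spirion policy settings or API calls.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ArchiveItem:
    subject: str
    modified: datetime


def items_in_window(items: list[ArchiveItem], now: datetime,
                    years: int = 2) -> list[ArchiveItem]:
    """Keep only items modified within the last `years` years
    (a year is approximated as 365 days)."""
    cutoff = now - timedelta(days=365 * years)
    return [item for item in items if item.modified >= cutoff]


# Hypothetical archive contents.
now = datetime(2024, 1, 1)
archive = [
    ArchiveItem("Q2 forecast", datetime(2023, 6, 1)),
    ArchiveItem("Old payroll export", datetime(2015, 3, 1)),
    ArchiveItem("Vendor contract", datetime(2022, 9, 15)),
]

recent = items_in_window(archive, now)
print(f"Scanning {len(recent)} of {len(archive)} items")
```

The older the archive, the larger the share of items a cutoff like this excludes from each recurring scan.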
Summary
Scanning Online Archives is a network-intensive task.
It is significantly slower than a local scan because every item is read over the network, subject to network latency and the mail server's throttling limits, rather than from the local hard drive.