What are Best Practices for Managing Large Log Exports?

Agent Logs and Search Results can be extremely high-volume. The practices below help keep your exports reliable and your SIEM performant.

Managing large log exports in the Spirion Sensitive Data Platform requires a balance between data completeness and system performance.

1. Use Incremental Exports (The "Checkpoint" Method)

Never attempt to export your entire log history in a single request. Instead, use a "checkpoint" or "watermark" strategy.

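How it works: Store the timestamp of the last record you successfully exported. In your next API call, use that timestamp as your startDate. Update the stored value only after the records have been written to their destination, so that a failed run is retried from the same point.
Benefit: Each export job only handles the "delta" (new data) since the last run. If startDate is treated as inclusive, the record at the boundary can be returned twice, so deduplicate on a unique record field if your destination is sensitive to duplicates.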
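A minimal sketch of the checkpoint idea, assuming the watermark is kept in a small local JSON file. The file layout, the last_timestamp key, and the DEFAULT_START fallback are illustrative choices, not part of the Spirion platform.

```python
import json
import os
import tempfile
from pathlib import Path

# Assumed fallback for the very first run, when no checkpoint exists yet.
DEFAULT_START = "1970-01-01T00:00:00Z"


def load_checkpoint(path: Path) -> str:
    """Return the last exported timestamp, or DEFAULT_START if none is stored."""
    try:
        return json.loads(path.read_text())["last_timestamp"]
    except FileNotFoundError:
        return DEFAULT_START


def save_checkpoint(path: Path, last_timestamp: str) -> None:
    """Persist the checkpoint via a temp file so a crash cannot leave it half-written."""
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    with os.fdopen(fd, "w") as tmp:
        json.dump({"last_timestamp": last_timestamp}, tmp)
    os.replace(tmp_name, path)  # atomic when both paths are on the same filesystem
```

Call load_checkpoint at the start of a run to obtain startDate, and call save_checkpoint only after the exported records have reached their destination.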

2. Implement Robust Pagination

For large datasets, the Spirion API returns results in pages (typically 500–1,000 records at a time).

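Best Practice: Your export script should loop until the API returns a short page. If the number of records returned equals your limit, assume there is more data and request the next page using the last record's timestamp as the new starting point. Because several records can share the same timestamp, skip any records you already received at that boundary rather than assuming each page starts with new data.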
Memory Management: Write data to your destination (file or SIEM) inside the loop for each page. Do not store all pages in a single Python list, as this can lead to memory exhaustion.
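The loop described above could look roughly like the sketch below. fetch_page is a hypothetical callable standing in for the real API request, and the sketch assumes that records arrive sorted by ascending timestamp, that startDate is inclusive, and that each record carries a unique "id" field. None of these is confirmed by the Spirion documentation, so verify them against the actual API.

```python
from typing import Callable, Iterator

Record = dict
# Hypothetical signature: (start_date, limit) -> list of records.
FetchPage = Callable[[str, int], list]


def iter_pages(fetch_page: FetchPage, start_date: str, limit: int = 500) -> Iterator[list]:
    """Yield one page of new records at a time so the caller can write and discard it."""
    seen_ids: set = set()  # ids already received at the current boundary timestamp
    while True:
        page = fetch_page(start_date, limit)
        fresh = [r for r in page if r["id"] not in seen_ids]
        if fresh:
            yield fresh
        if len(page) < limit:
            break  # short page: no more data
        if not fresh:
            # A full page with nothing new means more than `limit` records
            # share one timestamp, which timestamp-based paging cannot cross.
            raise RuntimeError("pagination stalled; increase the page limit")
        last_ts = page[-1]["timestamp"]
        seen_ids = {r["id"] for r in page if r["timestamp"] == last_ts}
        start_date = last_ts
```

Because iter_pages is a generator, a caller can write each page to a file or SIEM inside a for loop and never hold more than one page in memory.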

3. Filter at the Source

Reduce the "noise" before the data even leaves the Spirion platform.

Log Families: If you are troubleshooting connectivity, only export EPS logs. If you are auditing search accuracy, only export IDF logs.
Severity Levels: For SIEM alerting, you may only want to export logs with a severity of Error or Critical, while ignoring Info or Debug messages.
Agent Groups: If you have a massive fleet, consider running separate export jobs for different agent groups (e.g., "Servers" vs. "Workstations") to keep the payload sizes manageable.
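As an illustration of splitting work by filter, the sketch below builds one set of query parameters per log family and agent group. Only startDate comes from the text above. The parameter names logFamily, severity, and agentGroup are hypothetical placeholders, so substitute the names the API actually accepts.

```python
from itertools import product


def build_export_jobs(start_date, families, severities, agent_groups):
    """Return one query-parameter dict per (log family, agent group) pair."""
    return [
        {
            "startDate": start_date,
            "logFamily": family,                # hypothetical parameter name
            "severity": ",".join(severities),   # hypothetical parameter name
            "agentGroup": group,                # hypothetical parameter name
        }
        for family, group in product(families, agent_groups)
    ]


jobs = build_export_jobs(
    "2024-01-01T00:00:00Z",
    families=["EPS"],
    severities=["Error", "Critical"],
    agent_groups=["Servers", "Workstations"],
)
```

Each dict can then drive its own export run, which keeps individual payloads small and lets a failure in one group be retried without repeating the others.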

4. Optimize for SIEM Ingestion

If you are pushing logs to a SIEM like Splunk or Microsoft Sentinel:

JSONL Format: Export data in "JSON Lines" format (one JSON object per line). Line-delimited JSON can be parsed as a stream, one record at a time, and is widely supported by SIEM ingestion pipelines.
Batch Uploads: Instead of sending one log at a time, batch your logs into groups of 100–500 before sending them to your SIEM's ingestion endpoint (for example, Splunk's HTTP Event Collector, or HEC).
Field Mapping: Ensure your SIEM is configured to parse Spirion's standard fields (e.g., correlationId, agentId, timestamp) so you can immediately begin building dashboards.
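The JSONL and batching points above can be sketched with the standard library alone. The helper names batched and to_jsonl are illustrative, and the actual upload call is left out because it depends on your SIEM.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` records without loading them all."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def to_jsonl(batch: Iterable[dict]) -> str:
    """Serialize records as JSON Lines: one compact JSON object per line."""
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in batch)
```

A caller would iterate over batched(records, 500) and send to_jsonl(batch) as the body of one upload request per batch.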

5. Security and Governance

Redaction: Ensure that Agent-Side Redaction is enabled if you are exporting Search Results. This prevents full sensitive values (like complete SSNs) from being stored in your external log management system.
API Key Rotation: Treat your API Bearer Tokens as highly privileged credentials. Rotate them regularly and store them in a secure vault (like AWS Secrets Manager) rather than hardcoding them in scripts.
Audit the Exporter: Monitor the Audit Log in Spirion to ensure your export service account is logging in successfully and not generating "Unauthorized" errors.
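To keep tokens out of source code, a script can read the token at run time from an environment variable that your secrets manager or scheduler populates. The variable name SPIRION_API_TOKEN below is an assumption chosen for illustration.

```python
import os


def auth_headers(env_var: str = "SPIRION_API_TOKEN") -> dict:
    """Build the Authorization header from a token supplied via the environment."""
    token = os.environ.get(env_var)
    if not token:
        # Fail fast instead of sending unauthenticated requests.
        raise RuntimeError(f"{env_var} is not set; refusing to run without a token")
    return {"Authorization": f"Bearer {token}"}
```

Failing early when the variable is missing also avoids generating a stream of "Unauthorized" entries in the Audit Log.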

6. Performance and Rate Limiting

Polite Polling: Do not poll the API every few seconds. For most organizations, a 5-minute or 15-minute interval is sufficient for "near real-time" monitoring.
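Backoff Logic: Implement "exponential backoff" in your script. If the API returns a 429 Too Many Requests or a 503 Service Unavailable error, wait for a few seconds before retrying, and double the wait time with each consecutive failure up to a fixed maximum. If the response includes a Retry-After header, honor it, and give up after a bounded number of retries so a prolonged outage does not leave the job looping indefinitely.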
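A minimal sketch of exponential backoff with jitter. The request argument is a hypothetical zero-argument callable that performs one API call and returns a (status_code, body) tuple. The retry count and delay values are illustrative defaults, not recommendations from Spirion.

```python
import random
import time

RETRYABLE = {429, 503}  # the two status codes named above


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call request() until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each failure, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 50% random jitter so parallel jobs do not retry in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError(f"gave up after {max_retries} retries (last status {status})")
```

The sleep parameter exists so the function can be exercised in tests without real waiting. In production the default time.sleep is used.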

Summary Checklist

Incremental: Only pull new data since the last run.
Paginated: Loop through all available pages.
Filtered: Only export the log families and severities you need.
Redacted: Protect sensitive data in the export payload.
Batched: Send data to your SIEM in efficient chunks.
Throttled: Poll at a sensible interval and back off on 429 or 503 responses.

By following these practices, you can turn Spirion's high-volume technical data into a high-value security asset without impacting the stability of your platform.