Troubleshooting: "Unable to find job queue: search_queue"

If you receive the error "ERROR Unable to find job queue: search_queue", it typically points to a timing or connectivity issue within the PostgreSQL-backed agent queueing architecture.

Here is the breakdown of what is likely happening and how to address it:

1. The "False Positive" Scenario (Most Likely)

If your scan eventually completes and you see results, this error may be a known logging issue.

  • The Cause: In recent versions, the Endpoint Service (EPS) attempts to clean up old task data or check if a scan is complete. It calls a function to check the queue (_taskHasPendingJobs), but if the queue has already been de-provisioned because the scan is finishing, it logs an ERROR.
  • Impact: According to AL-35251 (Fix "Unable to find job queue" ERROR reported in EPS log), this error is often "incorrect/unnecessary" and occurs even though the system is successfully cleaning up task data.

2. The "Discovery Lag" Scenario

In a distributed scan, the Search Agents (workers) often start up and try to connect to the queue before the Discovery Agent has finished initializing the Postgres database and creating the scan-specific search_queue_[GUID] queue.

  • The Log: Attempting to connect to job queue... Job Queue connection failed, retrying...
  • The Behavior: Search agents will retry at a ~30-second cadence. If the Discovery Agent is slow to build the queue (due to resource constraints or a large number of initial locations to index), the search agents will log these errors until the queue becomes visible.
  • Resolution: This usually resolves itself once the Discovery Agent enters the "Searching" phase.
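
The retry behavior described above can be illustrated with a minimal Python sketch. This is not Spirion's implementation: wait_for_queue, queue_exists, and max_attempts are hypothetical names chosen for illustration, and only the ~30-second cadence comes from the text above.

```python
import time
from typing import Callable


def wait_for_queue(
    queue_exists: Callable[[], bool],
    interval_s: float = 30.0,   # ~30-second cadence, as described above
    max_attempts: int = 20,     # hypothetical limit, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until queue_exists() returns True.

    Returns the number of attempts used. Raises TimeoutError if the
    queue is still not visible after max_attempts checks.
    """
    for attempt in range(1, max_attempts + 1):
        if queue_exists():
            return attempt
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"queue not visible after {max_attempts} attempts")
```

Each failed check corresponds to one logged retry message, which is why a slow Discovery Agent produces a run of errors that stops once the queue becomes visible.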

3. Critical Connectivity Issues (If the scan stays stuck)

If the agents never connect and the scan fails, check the following configuration items:

  • Blocked Ports: Ensure ports 5433 (Postgres) and 6433 (pgBouncer) are open on the Discovery Agent's firewall. Search agents must be able to reach both ports on the Discovery machine to "see" the queue.
  • Discovery Agent Status: If the Discovery Agent machine has crashed or the Spirion Agent service on that specific machine has stopped, the job queue it hosts will disappear, causing all other search agents to log "Unable to find job queue."
  • Service Account Permissions: Ensure the service account has permission to create/modify files in C:\ProgramData\Spirion, as this is where the local Postgres instance manages the queue data.
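
The port requirement above can be checked from a Search Agent machine with a short script. The following is a minimal sketch using only the Python standard library; port_is_open and check_queue_ports are hypothetical helper names, and a successful TCP connection only shows that the port is reachable, not that the queue itself exists.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_queue_ports(host: str, ports=(5433, 6433)) -> dict:
    """Check the Postgres (5433) and pgBouncer (6433) ports named above."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_queue_ports with the Discovery Agent's address returns a mapping of each port to True or False. A False for either port suggests checking the firewall or the Discovery Agent's status.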

Troubleshooting Steps

  1. Check the Discovery Agent's log: Look at the AgentService.log on the machine designated as the "Discovery" node. Ensure it isn't showing database initialization errors.
  2. Wait 5-10 minutes: If the scan is large, the "Unable to find" errors are often just the workers waiting for the Discovery node to finish initializing the queue.
  3. Verify Port 6433: From a Search Agent machine, run tnc [Discovery_IP] -Port 6433 in PowerShell (tnc is an alias for Test-NetConnection) to confirm the network path is open.
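
When reviewing logs during these steps, it can help to count how often the queue-related messages appear. The sketch below is an assumption-laden illustration: it matches only the message fragments quoted in this article, assumes nothing about the rest of the log line format, and count_queue_messages and PATTERNS are hypothetical names.

```python
from collections import Counter
from typing import Iterable

# Message fragments quoted in this article; the surrounding log
# format (timestamps, levels, and so on) is not assumed.
PATTERNS = {
    "queue_not_found": "Unable to find job queue",
    "connection_retry": "Job Queue connection failed, retrying",
}


def count_queue_messages(lines: Iterable[str]) -> Counter:
    """Count lines containing each known queue-related message."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts
```

Pass an open file handle for the log under review. A count that stops growing after the first few minutes is consistent with the discovery lag scenario, while a count that keeps rising on a stalled scan points toward the connectivity checks above.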

Summary: If the scan is progressing and results are appearing in the console, you can likely ignore these errors as a known logging artifact. If the scan is stalled, the issue is likely the Discovery Agent failing to host the queue or a firewall blocking port 6433.
