Self-Healing And Sentinel
Self-healing helps IT operations teams keep endpoint recovery work moving without giving automation unlimited authority. Use it to see which endpoints are healthy, which ones need attention, what Pharaoh tried, and where a human approval or rejection is required before work continues.
The main operator jobs are:
- check fleet-level self-healing health and pending escalation volume
- open endpoint-specific Sentinel history before approving risky work
- review the automation request, policy context, and expiry time for an escalation
- approve only the bounded action Pharaoh asked for, or reject it with a reason
- confirm that the endpoint returned to a known-good or intentionally deferred state
Where It Lives
Use Self-Healing in the Operations navigation group.
The main routes in this product area are:
- /self-healing for the self-healing overview
- /self-healing/settings for organization-wide Sentinel and Autoheal policy defaults
- /self-healing/escalations for the escalation queue
- /self-healing/escalations/<escalation-id> for one escalation review
- /endpoints/<endpoint-id>/self-healing for the endpoint self-healing Status view inside the endpoint detail shell
- /endpoints/<endpoint-id>/self-healing/sessions for linked recovery sessions
- /endpoints/<endpoint-id>/self-healing/sentinel for active Sentinel provenance and source
- /endpoints/<endpoint-id>/self-healing/sentinel-runs for Sentinel run history
- /endpoints/<endpoint-id>/self-healing/activity for the paginated endpoint self-healing activity timeline
- /endpoints/<endpoint-id>/self-healing/knowledge for accepted endpoint knowledge and pending proposals
Endpoint details also include a Sentinel panel. Use Self Healing in the endpoint tabs to open the endpoint-scoped self-healing workspace. Older /self-healing/endpoints/<endpoint-id> links redirect into the endpoint detail shell.
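For teams that generate deep links from tickets or alerts, the route list above can be captured in a small helper. This is a minimal sketch, not part of Pharaoh: the function names and the subview keys are hypothetical, and the legacy rewrite assumes older links land on the Status view, which the text above does not state explicitly.

```python
from urllib.parse import quote

# Path suffixes for the endpoint-scoped subviews listed above.
# The dictionary keys are hypothetical labels chosen for this sketch.
ENDPOINT_SUBVIEWS = {
    "status": "",
    "sessions": "/sessions",
    "sentinel": "/sentinel",
    "run-history": "/sentinel-runs",
    "activity": "/activity",
    "knowledge": "/knowledge",
}


def endpoint_self_healing_path(endpoint_id: str, subview: str = "status") -> str:
    """Build the endpoint-scoped self-healing path for one subview."""
    if subview not in ENDPOINT_SUBVIEWS:
        raise ValueError(f"unknown subview: {subview!r}")
    encoded = quote(endpoint_id, safe="")
    return f"/endpoints/{encoded}/self-healing{ENDPOINT_SUBVIEWS[subview]}"


def escalation_review_path(escalation_id: str) -> str:
    """Build the path for one escalation review."""
    return f"/self-healing/escalations/{quote(escalation_id, safe='')}"


def rewrite_legacy_path(path: str) -> str:
    """Map an older /self-healing/endpoints/<endpoint-id> link to the endpoint
    detail shell. Assumption: the redirect target is the Status view."""
    prefix = "/self-healing/endpoints/"
    if path.startswith(prefix):
        endpoint_id = path[len(prefix):].strip("/")
        if endpoint_id and "/" not in endpoint_id:
            return f"/endpoints/{endpoint_id}/self-healing"
    return path
```

Prefix the returned path with your own Pharaoh base URL when building a full link.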
Self-Healing Overview
The Self-Healing page is the fleet-level entry point for daily triage. Start here when you need to know whether automation is waiting on people, whether a specific endpoint needs review, or whether the queue is clear.

The mobile overview keeps the same triage path available for on-call work: pending count, endpoint lookup, settings, and review rows.

Use the overview in this order:
- Check the pending escalation count to understand whether a human decision is blocking automation.
- Scan the pending rows for endpoint, category, requested action, and age.
- Use Refresh when you are taking over an active queue from another reviewer.
- Open a specific endpoint when the row does not provide enough context.
- Use Review only after you know which endpoint and action you are evaluating.
What you can do there:
- see the number of pending escalations currently loaded from the self-healing projection
- refresh the pending escalation list
- enter an Endpoint ID and select Open endpoint
- select Settings to edit shared Sentinel and Autoheal defaults
- inspect pending escalation rows and choose Review
Use the pending count as a workload signal, not as proof that the fleet is unhealthy. One endpoint can generate a high-priority approval request even when most endpoints are passing Sentinel checks. If there are no pending rows, Pharaoh shows “No pending self-healing escalations” instead of an empty table.
Self-Healing Settings
The Self-Healing Settings page controls the shared organization-wide config that endpoints use when they do not have an endpoint-specific override.
Settings include:
- Sentinel enabled
- Autoheal enabled
- Read-only policy template ID
- Autoheal policy template ID
- Sentinel cadence
- Execution interval seconds
- Regeneration interval seconds
Treat these as fleet policy defaults. Before changing them, check whether the issue is isolated to one endpoint, one policy template, or a broader operational rule. The config does not include a platform setting. Pharaoh uses the endpoint agent’s observed runtime platform when Sentinel generation or execution runs.
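The fallback from an endpoint-specific override to the organization-wide defaults can be pictured with a small data model. This is only an illustrative sketch: the class, field names, field types, and sample values are assumptions made for the example, and it assumes an override replaces the whole config rather than individual fields, which the settings page does not confirm.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SelfHealingSettings:
    """Hypothetical shape for the settings listed above.

    There is deliberately no platform field: the platform comes from the
    endpoint agent's observed runtime, not from this config.
    """
    sentinel_enabled: bool
    autoheal_enabled: bool
    read_only_policy_template_id: str
    autoheal_policy_template_id: str
    sentinel_cadence: str  # assumed to be a label; the real type is not documented here
    execution_interval_seconds: int
    regeneration_interval_seconds: int


def effective_settings(
    org_defaults: SelfHealingSettings,
    endpoint_override: Optional[SelfHealingSettings],
) -> SelfHealingSettings:
    """Use the endpoint override when one exists, else the fleet defaults."""
    return endpoint_override if endpoint_override is not None else org_defaults


# Sample values are invented for illustration only.
org_defaults = SelfHealingSettings(
    sentinel_enabled=True,
    autoheal_enabled=True,
    read_only_policy_template_id="template-read-only",
    autoheal_policy_template_id="template-autoheal",
    sentinel_cadence="hourly",
    execution_interval_seconds=300,
    regeneration_interval_seconds=86400,
)
endpoint_override = replace(org_defaults, autoheal_enabled=False)
```

Modeling the override separately makes the scope of a change visible: editing `org_defaults` affects every endpoint without an override, while editing `endpoint_override` affects one endpoint.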
Endpoint Sentinel Panel
The endpoint detail page shows a Sentinel panel between the endpoint identity summary and the endpoint detail tabs.
The panel can show:
- Sentinel status, such as Passed, Failed, Timeout, Invalid output, Policy denied, Runner error, Stale, or Not configured
- the latest summary text
- Last run, Duration, Version, and Policy
- links to an active self-healing session or agent thread when one is projected
- a pending escalation count
- accepted knowledge and pending proposal counts
Use this panel when you need a fast answer to whether the endpoint has recent Sentinel context before opening deeper self-healing history. For approval work, stale or missing Sentinel context is a reason to slow down and inspect the endpoint page instead of approving from the queue alone.

The panel is also usable on a phone during on-call review. The same evidence remains visible: status, summary, run time, policy, active session or thread, pending escalation count, and knowledge counts.

Before approving from this context, confirm:
- The endpoint identity matches the ticket, alert, or user report.
- Last run is newer than the incident context you are acting on.
- The summary explains why Pharaoh needs help.
- The policy and version match the expected guardrail boundary.
- Pending escalations and knowledge counts are consistent with the request.
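The confirmation list above can also be written down as an explicit checklist, which some teams find useful for on-call runbooks. The sketch below is an assumption-laden illustration: `SentinelPanel`, its fields, and `approval_blockers` are hypothetical names that mirror what the panel displays, not a Pharaoh API, and the consistency check on pending escalations and knowledge counts is left to the reviewer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SentinelPanel:
    """Hypothetical snapshot of the values shown in the Sentinel panel."""
    endpoint_id: str
    status: str
    summary: str
    last_run: Optional[datetime]
    policy: str
    version: str


def approval_blockers(
    panel: SentinelPanel,
    expected_endpoint_id: str,
    incident_time: datetime,
    expected_policy: str,
    expected_version: str,
) -> list[str]:
    """Return reasons to slow down; an empty list means the checks passed."""
    blockers = []
    if panel.endpoint_id != expected_endpoint_id:
        blockers.append("endpoint identity does not match the ticket, alert, or report")
    if panel.status in {"Stale", "Not configured"} or panel.last_run is None:
        blockers.append("Sentinel context is stale or missing")
    elif panel.last_run <= incident_time:
        blockers.append("Last run is not newer than the incident context")
    if not panel.summary.strip():
        blockers.append("summary does not explain why Pharaoh needs help")
    if (panel.policy, panel.version) != (expected_policy, expected_version):
        blockers.append("policy or version differs from the expected guardrail boundary")
    return blockers
```

Returning a list of reasons rather than a single boolean keeps the output usable as a rejection reason or a note on the ticket.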
Endpoint Self-Healing Detail
The Endpoint Self-Healing workspace is the operational record for one endpoint. It stays inside the endpoint detail page, so the endpoint header and endpoint tabs remain visible while you move through self-healing history. Open it before approving work when the request could change endpoint state, require elevated access, or affect a business-critical user.
The workspace has six endpoint-scoped subviews in the left rail.
Status
- current self-healing and Sentinel state
- operator-readable facts for policy, active Sentinel, latest run, accepted knowledge, pending proposals, and pending escalations
- recent activity preview
- links into the detailed subviews when a fact needs inspection
Use Status first. It answers whether the endpoint is healthy, whether automation needs action, and whether the current policy and Sentinel context are recognizable without exposing raw database ids as primary labels.

On mobile, Status keeps the same endpoint context, self-healing rail, facts, and activity preview without requiring horizontal scrolling.

Sessions
- linked recovery sessions for this endpoint
- session status and terminal outcome
- agent thread links when projected
- dates and friendly titles instead of primary ids
Use Sessions to understand what Pharaoh already attempted and whether the requested action is a continuation of a known recovery path or a new branch of work.


Sentinel
- active Sentinel version and provenance
- generation, validation, and activation context
- latest execution state
- full-width Sentinel source below the provenance and detail cards
- View full script when you need the complete source in a modal
Check Sentinel for recency and consistency. A recent Passed result can support approval for a narrow follow-up. Repeated Failed, Timeout, Policy denied, or Runner error results suggest you should inspect the session and policy context before deciding.


Run history
- historical Sentinel executions
- generated run titles, status, completed time, and duration
- output summaries and checks when available
- session links for runs that triggered recovery work
Use Run history when the latest result is not enough. A single failed run may be transient; repeated failed, timed out, or policy-denied runs are stronger evidence that approval should slow down.
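The distinction between a single transient failure and a repeated pattern can be made concrete by counting the streak of concerning results at the end of the run history. This is a rough sketch: the function names, the returned labels, and the threshold of two consecutive runs are assumptions for illustration, not thresholds that Pharaoh applies.

```python
# Statuses the text above treats as stronger evidence when they repeat.
CONCERNING = {"Failed", "Timeout", "Policy denied"}


def trailing_concerning_runs(statuses: list[str]) -> int:
    """Count consecutive concerning runs at the end of the history.

    `statuses` is ordered oldest to newest.
    """
    count = 0
    for status in reversed(statuses):
        if status not in CONCERNING:
            break
        count += 1
    return count


def evidence_strength(statuses: list[str], repeat_threshold: int = 2) -> str:
    """Classify the history; `repeat_threshold` is an assumed cutoff."""
    streak = trailing_concerning_runs(statuses)
    if streak == 0:
        return "latest run not concerning"
    if streak < repeat_threshold:
        return "possibly transient"
    return "repeated"
```

For example, a history of `["Passed", "Failed"]` is classified as possibly transient, while `["Failed", "Timeout", "Policy denied"]` is classified as repeated.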


Activity
- combined self-healing activity synthesized from existing Sentinel, session, knowledge, proposal, and escalation records
- event-specific labels, status, source, and time
- links to the relevant endpoint self-healing subview when available
- pagination for longer histories
Use Activity when you need the complete timeline rather than the short Status preview. The list is a projection for operator review, not a separate audit log or new persisted event store.


Knowledge
- accepted endpoint self-healing knowledge
- pending knowledge proposals
- proposal Approve and Reject controls when your role can review proposals
- a required rejection reason before Reject is enabled
Endpoint knowledge is endpoint-specific self-healing memory. Approve knowledge proposals only when they describe durable, endpoint-relevant facts that should help future recovery. Reject proposals that are speculative, temporary, user-specific, or better suited to organization-wide documentation.


When reviewing knowledge before an escalation decision, look for facts that explain the current failure: known service names, endpoint-specific maintenance windows, hardware limitations, or previously accepted false positives. Do not treat pending proposals as trusted evidence until a reviewer has approved them.
Organization-wide runbooks and imported support content still live in IT Knowledge Base.
Structured Outcome Cards In Agent Worklogs
Self-healing sessions write final structured outcomes into the same Agent Core worklog used by endpoint sessions. Pharaoh renders those outcomes as compact operational cards instead of treating the final answer as ordinary prose.
Card types you may see:
- Sentinel candidate when Sentinel generation or regeneration produced a candidate script, validation state, activation state, and endpoint update dispatch state.
- Self-healing investigation with outcome Fixed, False positive, Unable to fix escalated, or Ignored not applicable.
- Structured output validation failed when the assistant could not produce a valid final output after repair attempts.
- Unknown structured output contract when a future contract is visible before the local UI has a purpose-built renderer.
Use the cards as audit evidence. Check the status badge, summary, processing timeline, trace links, and any learning, escalation, or regeneration section before deciding that automation finished correctly. A false-positive card can include a separate regeneration recommendation; that does not mean the active Sentinel changed until validation and activation state confirm it.
Current screenshot replay page IDs for these card states are tracked in the screenshot manifest:
- self-healing-candidate-card
- self-healing-investigation-card-fixed
- self-healing-investigation-card-false-positive
- self-healing-investigation-card-escalated
- agent-core-structured-output-validation-failure
- agent-core-structured-output-unknown-fallback
Escalation Queue
Use escalation links from Self-Healing, endpoint self-healing pages, or endpoint Sentinel panels when you need to find, filter, or review escalation records.

On mobile, use the same queue checks before opening Review: current status, endpoint, category, requested action, and whether the record is still pending.

The queue includes:
- Search
- Endpoint
- Status
- Category
- Apply
- Refresh
- pagination controls when there are multiple pages
The status filter supports Pending, Approved, Rejected, and Expired. The category filter includes Policy override, Sentinel generation failure, Self-healing failure, Permission request, and Other.
Use filters to separate active decisions from audit review. Pending is the approval workload. Approved, Rejected, and Expired are useful when you need to understand prior handling or repeated endpoint behavior. Every row keeps Review available so approved, rejected, and expired escalations remain inspectable.
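The split between approval workload and audit review can be sketched as a filter over escalation records. The enum values below are the filter options named above; everything else (the `Escalation` record, its fields, and `filter_queue`) is hypothetical and only illustrates how the filters combine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "Pending"
    APPROVED = "Approved"
    REJECTED = "Rejected"
    EXPIRED = "Expired"


class Category(Enum):
    POLICY_OVERRIDE = "Policy override"
    SENTINEL_GENERATION_FAILURE = "Sentinel generation failure"
    SELF_HEALING_FAILURE = "Self-healing failure"
    PERMISSION_REQUEST = "Permission request"
    OTHER = "Other"


@dataclass
class Escalation:
    """Hypothetical queue row used only for this sketch."""
    escalation_id: str
    endpoint_id: str
    status: Status
    category: Category


def filter_queue(
    rows: list[Escalation],
    status: Optional[Status] = None,
    category: Optional[Category] = None,
    endpoint_id: Optional[str] = None,
) -> list[Escalation]:
    """Keep rows matching every filter that is set; unset filters match all rows."""
    return [
        row for row in rows
        if (status is None or row.status == status)
        and (category is None or row.category == category)
        and (endpoint_id is None or row.endpoint_id == endpoint_id)
    ]
```

Filtering by `Status.PENDING` gives the approval workload; filtering one endpoint across all statuses gives the history needed to spot repeated behavior.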
Escalation Review
The Self-Healing Escalation detail page shows the escalation id in the header, the current status badge, endpoint id, thread id, category, requested action, created and expiry times, policy snapshot, any recorded grants, and the decision area.

The mobile review page keeps the decision controls close to the request summary, so it is suitable for review but still requires the same evidence check before approval.

Before deciding, check:
- endpoint id: confirm the request is for the endpoint you intended to review
- category: understand whether this is a policy override, failure recovery, permission request, or other escalation
- requested action: approve only the specific continuation described, not a broader class of future work
- expiry time: reject or let stale requests expire instead of approving work whose context may no longer be valid
- policy snapshot and grants: confirm the requested action fits your organization’s guardrails
- endpoint context: open the endpoint self-healing page when Sentinel status, session history, or knowledge affects the decision
Approve when the request is specific, current, policy-compatible, and backed by endpoint context that explains why automation needs the escalation. Reject when the request is too broad, stale, unsafe for the endpoint state, unsupported by evidence, duplicates a failed pattern, or should be handled manually.
For pending escalations:
- operators without review permission can inspect the escalation but do not see approve or reject actions
- reviewers can select Approve
- reviewers can enter Rejection reason and then select Reject
After a decision, the page reports “Escalation approved.”, “Escalation rejected.”, or the backend error message returned by the API.
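The permission rules for pending escalations can be summarized as a small gating function. This is a sketch under stated assumptions: `available_actions` is a hypothetical name, it assumes Reject stays unavailable until a non-empty rejection reason is entered, and it assumes escalations that are no longer pending offer no decision actions.

```python
def available_actions(
    status: str,
    can_review: bool,
    rejection_reason: str = "",
) -> list[str]:
    """Return the decision actions a user could take on one escalation.

    Assumptions for this sketch: only Pending escalations are actionable,
    and Reject requires a non-empty rejection reason.
    """
    if status != "Pending" or not can_review:
        # Operators without review permission can still inspect the record.
        return []
    actions = ["Approve"]
    if rejection_reason.strip():
        actions.append("Reject")
    return actions
```

A reviewer with an empty reason field gets only `["Approve"]`; after entering a reason, the result is `["Approve", "Reject"]`.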
Success Checks
After approving or rejecting, confirm the operational outcome:
- refresh the escalation queue and verify the status changed or the pending item cleared
- reopen the endpoint self-healing page and check the latest Sentinel, session, and escalation history
- confirm the endpoint’s current state matches the reason for the decision
- look for repeated escalations from the same endpoint before treating the issue as resolved
- document any durable endpoint-specific learning through the endpoint knowledge review flow when a proposal is available