## predict_next_prompt
Predicts what the human would type in response to an AI agent's prompt, using their Clone (a personalized user model). Returns the top-K ranked candidate replies with calibrated confidence scores. Clients should auto-respond with the top candidate when its confidence ≥ threshold, and otherwise escalate to the human, surfacing the candidate as an autocomplete suggestion.
### Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `agent` | string | yes | — | Name of the AI agent asking the human a question ("Claude Code", "Codex", "Cursor"). |
| `agent_input` | string | yes | — | The exact prompt the agent is sending to the human. |
| `k` | integer (1–10) | no | 1 | Number of candidate replies to generate. |
| `threshold` | number (0–1) | no | 0.8 | Confidence threshold for auto vs. escalated status. |
| `session_id` | string | no | — | Optional grouping key for later prediction-history queries. |
### Output

```json
{
  "id": "<uuid>",
  "predicted_response": "<top-1 string>",
  "confidence": 0.42,
  "reasoning": "<why the top candidate was chosen>",
  "candidates": [
    { "response": "...", "confidence": 0.42, "reasoning": "..." },
    { "response": "...", "confidence": 0.28, "reasoning": "..." },
    { "response": "...", "confidence": 0.15, "reasoning": "..." }
  ],
  "k": 3,
  "status": "escalated",
  "threshold": 0.8,
  "model": "claude-sonnet-4-6",
  "latency_ms": 4827
}
```
`status` is decided server-side: `"auto"` if `candidates[0].confidence ≥ threshold`, otherwise `"escalated"`.
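A minimal sketch of how a client might act on `status`. The field names follow the Output schema above; `send_to_agent` and `ask_human` are hypothetical stand-ins for your own integration points, not part of this API:

```python
def send_to_agent(text: str) -> str:
    """Hypothetical: forward the reply straight to the agent."""
    return text

def ask_human(suggestion: str, reasoning: str) -> str:
    """Hypothetical: surface the suggestion for the human to confirm or edit."""
    print(f"Suggested reply: {suggestion}\n(Why: {reasoning})")
    return input("> ") or suggestion

def handle_prediction(prediction: dict) -> str:
    top = prediction["candidates"][0]
    if prediction["status"] == "auto":
        # Server already verified candidates[0].confidence >= threshold.
        return send_to_agent(top["response"])
    # Escalated: offer the top candidate as an autocomplete suggestion.
    return ask_human(top["response"], top["reasoning"])
```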
### Example

```sh
curl -sS -X POST https://api.clone.is/api/predictions/predict/ \
  -H "X-Clone-API-Key: $CLONE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": "Claude Code",
    "agent_input": "Test finished. What next?",
    "k": 3,
    "threshold": 0.8
  }'
```
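For programmatic use, the same call with Python's `requests`; the endpoint, header name, and env var are taken from the curl example above:

```python
import os
import requests

resp = requests.post(
    "https://api.clone.is/api/predictions/predict/",
    headers={"X-Clone-API-Key": os.environ["CLONE_API_TOKEN"]},
    json={
        "agent": "Claude Code",
        "agent_input": "Test finished. What next?",
        "k": 3,
        "threshold": 0.8,
    },
    timeout=30,  # predictions can take several seconds (see latency_ms)
)
resp.raise_for_status()
prediction = resp.json()
print(prediction["status"], prediction["predicted_response"])
```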
### What good calibration looks like

In a cold-start state (no Memory data accumulated for the user yet), expect top-1 confidence in the 0.3–0.5 band and `status: "escalated"`. As the Memory layer accumulates real history, top-1 confidence on familiar agent prompts moves toward 0.8+ and `status: "auto"` becomes common.
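One way to check calibration against your own traffic. The logging here is hypothetical (not produced by this API): each event records the top-1 `confidence` returned and `accepted`, meaning the human kept the predicted reply unchanged. Well-calibrated output means the accept rate in each band tracks the band's value:

```python
from collections import defaultdict

def calibration_report(events: list[dict]) -> dict[str, float]:
    """Empirical accept rate per 0.1-wide confidence band."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for e in events:
        band = f"{int(e['confidence'] * 10) / 10:.1f}"  # e.g. 0.42 -> "0.4"
        totals[band] += 1
        hits[band] += e["accepted"]  # bool counts as 0/1
    return {band: hits[band] / totals[band] for band in sorted(totals)}
```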
### Errors

| Upstream status | What you'll see at the MCP client |
|---|---|
| 503 (LLM key missing or upstream unreachable) | `network error … 503` |
| 429 (rate limited) | `network error … 429` |
| 502 (Anthropic upstream non-200) | `network error … 502` |
| 400 (`agent` or `agent_input` missing) | `network error … 400` |
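A retry sketch for the transient statuses in the table (429/502/503). The exponential backoff schedule is an illustrative assumption, not an API contract; 400 is a caller bug and is never retried:

```python
import time
import requests

TRANSIENT = {429, 502, 503}  # statuses from the table above

def predict_with_retry(url: str, headers: dict, body: dict,
                       attempts: int = 3) -> dict:
    """POST with retries on transient upstream failures (1s, 2s, 4s backoff)."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=body, timeout=30)
        if resp.status_code not in TRANSIENT:
            resp.raise_for_status()  # raises on 400 and other hard errors
            return resp.json()
        time.sleep(2 ** attempt)
    resp.raise_for_status()  # surface the last transient failure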
Underlying server logic lives in `apps/server/predictions/views.py:predict_view` and `apps/server/predictions/llm.py`.