## predict_next_prompt
Predicts what the human would type in response to an AI agent's prompt, using their Clone (a personalized user model). Returns the top-K ranked candidate replies with calibrated confidence scores. Clients should auto-respond with the top candidate when its confidence ≥ threshold, and otherwise escalate to the human, surfacing the candidate as an autocomplete suggestion.
### Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `agent` | string | yes | — | Name of the AI agent asking the human a question ("Claude Code", "Codex", "Cursor"). |
| `agent_input` | string | yes | — | The exact prompt the agent is sending to the human. |
| `k` | integer (1–10) | no | 1 | Number of candidate replies to generate. |
| `threshold` | number (0–1) | no | 0.8 | Confidence threshold for auto vs. escalated status. |
| `session_id` | string | no | — | Optional grouping key for later prediction-history queries. |
### Output

```json
{
  "id": "<uuid>",
  "predicted_response": "<top-1 string>",
  "confidence": 0.42,
  "reasoning": "<why the top candidate was chosen>",
  "candidates": [
    { "response": "...", "confidence": 0.42, "reasoning": "..." },
    { "response": "...", "confidence": 0.28, "reasoning": "..." },
    { "response": "...", "confidence": 0.15, "reasoning": "..." }
  ],
  "k": 3,
  "status": "escalated",
  "threshold": 0.8,
  "model": "claude-sonnet-4-6",
  "latency_ms": 4827
}
```
`status` is decided server-side: `"auto"` if `candidates[0].confidence ≥ threshold`, otherwise `"escalated"`.
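A minimal sketch of how a client might act on `status`. The field names follow the Output schema above; `send_to_agent` and `ask_human` are hypothetical stand-ins for your own integration points, not part of this API:

```python
def send_to_agent(text: str) -> str:
    """Hypothetical: forward the reply straight to the agent."""
    return text

def ask_human(suggestion: str, reasoning: str) -> str:
    """Hypothetical: surface the suggestion for the human to confirm or edit."""
    print(f"Suggested reply: {suggestion}\n(Why: {reasoning})")
    return input("> ") or suggestion

def handle_prediction(prediction: dict) -> str:
    top = prediction["candidates"][0]
    if prediction["status"] == "auto":
        # Server already verified candidates[0].confidence >= threshold.
        return send_to_agent(top["response"])
    # Escalated: offer the top candidate as an autocomplete suggestion.
    return ask_human(top["response"], top["reasoning"])
```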
### Example

```sh
curl -sS -X POST https://api.clone.is/api/predictions/predict/ \
  -H "X-Clone-API-Key: $CLONE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": "Claude Code",
    "agent_input": "Test finished. What next?",
    "k": 3,
    "threshold": 0.8
  }'
```
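For programmatic use, the same call with Python's `requests`; the endpoint, header name, and env var are taken from the curl example above:

```python
import os
import requests

resp = requests.post(
    "https://api.clone.is/api/predictions/predict/",
    headers={"X-Clone-API-Key": os.environ["CLONE_API_TOKEN"]},
    json={
        "agent": "Claude Code",
        "agent_input": "Test finished. What next?",
        "k": 3,
        "threshold": 0.8,
    },
    timeout=30,  # predictions can take several seconds (see latency_ms)
)
resp.raise_for_status()
prediction = resp.json()
print(prediction["status"], prediction["predicted_response"])
```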
### What good calibration looks like

In a cold-start state (no Memory data accumulated for the user yet), expect top-1 confidence in the 0.3–0.5 band and `status: "escalated"`. As the Memory layer accumulates real history, top-1 confidence on familiar agent prompts moves toward 0.8+ and `status: "auto"` becomes common.
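One way to check calibration against your own traffic. The logging here is hypothetical (not produced by this API): each event records the top-1 `confidence` returned and `accepted`, meaning the human kept the predicted reply unchanged. Well-calibrated output means the accept rate in each band tracks the band's value:

```python
from collections import defaultdict

def calibration_report(events: list[dict]) -> dict[str, float]:
    """Empirical accept rate per 0.1-wide confidence band."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for e in events:
        band = f"{int(e['confidence'] * 10) / 10:.1f}"  # e.g. 0.42 -> "0.4"
        totals[band] += 1
        hits[band] += e["accepted"]  # bool counts as 0/1
    return {band: hits[band] / totals[band] for band in sorted(totals)}
```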
### Errors

| Upstream status | What you'll see at the MCP client |
|---|---|
| 503 (LLM key missing or upstream unreachable) | `network error … 503` |
| 429 (rate limited) | `network error … 429` |
| 502 (Anthropic upstream non-200) | `network error … 502` |
| 400 (`agent` or `agent_input` missing) | `network error … 400` |
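A retry sketch for the transient statuses in the table (429/502/503). The exponential backoff schedule is an illustrative assumption, not an API contract; 400 is a caller bug and is never retried:

```python
import time
import requests

TRANSIENT = {429, 502, 503}  # statuses from the table above

def predict_with_retry(url: str, headers: dict, body: dict,
                       attempts: int = 3) -> dict:
    """POST with retries on transient upstream failures (1s, 2s, 4s backoff)."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=body, timeout=30)
        if resp.status_code not in TRANSIENT:
            resp.raise_for_status()  # raises on 400 and other hard errors
            return resp.json()
        time.sleep(2 ** attempt)
    resp.raise_for_status()  # surface the last transient failure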
Underlying server logic lives in `apps/server/predictions/views.py:predict_view` and `apps/server/predictions/llm.py`.