predict_next_prompt

Predict what the human would type in response to an AI agent's prompt, using their Clone (personalized user model). Returns the top-K ranked candidates with calibrated confidences. Clients should act on the top candidate directly when its confidence ≥ threshold (auto-respond), and otherwise escalate to the human, surfacing the prediction as an autocomplete suggestion.

Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `agent` | string | yes | | Name of the AI agent asking the human a question ("Claude Code", "Codex", "Cursor"). |
| `agent_input` | string | yes | | The exact prompt the agent is sending to the human. |
| `k` | integer (1–10) | no | 1 | Number of candidate replies to generate. |
| `threshold` | number (0–1) | no | 0.8 | Confidence threshold for auto vs. escalated status. |
| `session_id` | string | no | | Optional grouping for later prediction history queries. |

Output

```json
{
  "id": "<uuid>",
  "predicted_response": "<top-1 string>",
  "confidence": 0.42,
  "reasoning": "<why the top candidate was chosen>",
  "candidates": [
    { "response": "...", "confidence": 0.42, "reasoning": "..." },
    { "response": "...", "confidence": 0.28, "reasoning": "..." },
    { "response": "...", "confidence": 0.15, "reasoning": "..." }
  ],
  "k": 3,
  "status": "escalated",
  "threshold": 0.8,
  "model": "claude-sonnet-4-6",
  "latency_ms": 4827
}
```

`status` is server-decided: `"auto"` if `candidates[0].confidence ≥ threshold`, else `"escalated"`.
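Clients that want to mirror this decision locally can do so in a few lines. A minimal sketch (the helper name is illustrative, not part of the API):

```python
def decide_status(candidates: list[dict], threshold: float = 0.8) -> str:
    """Mirror of the server's rule: 'auto' when the top-ranked
    candidate clears the threshold, 'escalated' otherwise."""
    if not candidates:
        return "escalated"
    return "auto" if candidates[0]["confidence"] >= threshold else "escalated"

# With the sample output above: top-1 confidence 0.42 < 0.8
print(decide_status([{"confidence": 0.42}, {"confidence": 0.28}], 0.8))  # escalated
```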

Example

```shell
curl -sS -X POST https://api.clone.is/api/predictions/predict/ \
  -H "X-Clone-API-Key: $CLONE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "agent": "Claude Code",
    "agent_input": "Test finished. What next?",
    "k": 3,
    "threshold": 0.8
  }'
```
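The same call from Python, as a stdlib-only sketch. The endpoint and header names are taken from the curl example above; the validation ranges come from the Input table. This is illustrative, not an official client:

```python
import json
import os
import urllib.request

API_URL = "https://api.clone.is/api/predictions/predict/"

def build_payload(agent: str, agent_input: str,
                  k: int = 1, threshold: float = 0.8) -> dict:
    """Validate inputs against the documented ranges before sending."""
    if not (1 <= k <= 10):
        raise ValueError("k must be in 1-10")
    if not (0.0 <= threshold <= 1.0):
        raise ValueError("threshold must be in 0-1")
    return {"agent": agent, "agent_input": agent_input,
            "k": k, "threshold": threshold}

def predict(agent: str, agent_input: str, **kwargs) -> dict:
    """POST a prediction request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(agent, agent_input, **kwargs)).encode(),
        headers={"X-Clone-API-Key": os.environ["CLONE_API_TOKEN"],
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```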

What good calibration looks like

In a cold-start state (no Memory data accumulated for the user yet), expect confidence in the 0.3–0.5 band and status: "escalated". As the Memory layer accumulates real history, top-1 confidence on familiar agent prompts moves toward 0.8+ and status: "auto" becomes common.

Errors

| Upstream | What you'll see at the MCP client |
|---|---|
| 503 (LLM key missing or upstream unreachable) | network error … 503 |
| 429 | network error … 429 |
| 502 (Anthropic upstream non-200) | network error … 502 |
| 400 (`agent` or `agent_input` missing) | network error … 400 |
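Of these, only the transient ones are worth retrying; a 400 is a caller bug and will fail identically on retry. A sketch of one reasonable client-side policy (the retryable set and backoff are suggestions, not mandated by the server):

```python
# Transient failures: rate limiting and upstream trouble.
# 400 means the request itself is malformed, so never retry it.
RETRYABLE = {429, 502, 503}

def should_retry(status_code: int) -> bool:
    return status_code in RETRYABLE

def backoff_schedule(attempts: int = 3, base: float = 1.0) -> list[float]:
    """Exponential backoff delays in seconds: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(attempts)]
```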

Underlying server logic lives in `apps/server/predictions/views.py:predict_view` and `apps/server/predictions/llm.py`.