Architecture
Clone is a single Django service (apps/server) split into three layers. Every layer is owned by one Django app, persists its own tables under the user boundary, and exposes a small REST surface. Client surfaces (apps/web, apps/desktop, apps/cli, apps/mcp) talk to these REST endpoints over JWT or API-key auth — they hold no business logic of their own.
The three layers
┌──────────────────────────────────────────┐
│ Prediction   apps/server/predictions/    │
│ POST /predict/   POST /predict/batch/    │
│ GET  /           /<id>/feedback/         │
│ GET  /stats/                             │
└────────────▲─────────────────────────────┘
             │ context bundle (in-process)
┌────────────┴─────────────────────────────┐
│ Memory       apps/server/memories/       │
│ profile · facts · episodes · raw         │
│ POST /context/   POST /sync/             │
│ POST /promote/episodes/  /facts/         │
└────────────▲─────────────────────────────┘
             │ derived from
┌────────────┴─────────────────────────────┐
│ Recording    apps/server/recording/      │
│ POST /events/    GET /sessions/          │
│ idempotent on event.id                   │
└──────────────────────────────────────────┘
Recording
Every input that should ever influence Clone's behavior — a desktop frame, a terminal turn, an integration webhook, an agent prompt — lands here as a CloneEvent keyed by id. apps/server/recording/views.py validates the payload against packages/schema/events.schema.json (the canonical wire-shape definition) and persists rows under a RecordingSession. Re-posting an event with an existing id is a no-op, so producers can retry freely.
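The retry-safe contract can be sketched without Django: store events keyed by id, and treat a duplicate id as a successful no-op. The in-memory dict below stands in for the CloneEvent table, and the class and field names are illustrative, not the real models.

```python
# Minimal sketch of the Recording layer's idempotency contract:
# re-posting an event whose id is already stored does nothing, so
# producers can retry freely without creating duplicate rows.

class EventStore:
    def __init__(self):
        self._events = {}  # event id -> payload (stand-in for the DB table)

    def ingest(self, event: dict) -> bool:
        """Persist an event keyed by its id. Returns True if newly
        stored, False if the id was already seen (idempotent no-op)."""
        event_id = event["id"]
        if event_id in self._events:
            return False          # duplicate: do nothing, still succeed
        self._events[event_id] = event
        return True

store = EventStore()
first = store.ingest({"id": "evt-1", "type": "desktop.frame"})
retry = store.ingest({"id": "evt-1", "type": "desktop.frame"})  # safe retry
```

In the real service the same effect falls out of a unique constraint on the id column; the dict lookup here just makes the no-op behavior explicit.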
Memory
apps/server/memories/ distills the Recording stream into a four-tier store:
| Tier | Model | What it holds |
|---|---|---|
| Profile | UserProfile | A singleton "who is this person" body that reads like a system prompt. |
| Semantic facts | SemanticMemory | Atomic, reusable facts (kind, text, importance, tags). |
| Episodic | EpisodicMemory | Time-bounded summaries of related raw events. |
| Raw | RawMemory | One row per Recording event, normalized to a single sentence. |
Promotion (raw → episodes → facts) is LLM-driven and runs in memories/promotion.py. Reads cluster around POST /api/memory/context/, which assembles a single bundle the Prediction layer can hand to the LLM.
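The bundle shape can be sketched from the four tiers and the limits quoted in the lifecycle walkthrough (20 most-important facts, last 10 episodes, last 50 raw rows). The function signature and field names below are assumptions for illustration, not the actual memories/ API.

```python
# Illustrative assembly of one context bundle, the single object the
# Prediction layer hands to the LLM. Facts sort by importance, so
# decayed (zero-importance) facts fall to the bottom and get truncated.

def build_context(profile, facts, episodes, raw_rows):
    return {
        "profile": profile,
        # Most-important facts first; the importance score is what the
        # decay loop adjusts over time.
        "facts": sorted(facts, key=lambda f: -f["importance"])[:20],
        "episodes": episodes[-10:],   # most recent episodic summaries
        "raw": raw_rows[-50:],        # most recent one-sentence raw rows
    }

bundle = build_context(
    profile="Prefers terse commit messages.",
    facts=[{"text": "Works in UTC+2", "importance": 0.4},
           {"text": "Uses Django", "importance": 0.9}],
    episodes=["Refactored auth module"],
    raw_rows=["Opened settings.py"],
)
```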
Prediction
apps/server/predictions/views.py is the public face. POST /api/predictions/predict/ takes the agent's prompt, calls Memory in-process for a context bundle, sends a prediction-shaped prompt to Anthropic, applies per-user Platt scaling to the LLM's raw confidence, and returns the result with calibrated probabilities. The auto/escalated decision is made server-side by comparing the calibrated top candidate against the caller's threshold. Every call is persisted as a Prediction row (raw confidence kept for the fitter), so feedback (POST /predictions/<id>/feedback/) can later mark it accepted, edited, or rejected and feed both the calibration training set and the behavioral decay loop on cited facts.
/predict/batch/ shares one Memory bundle across up to 100 prompts; both endpoints return calibrated confidence plus raw_confidence.
Calibration (the daily loop)
apps/server/predictions/calibration.py fits a sigmoid σ(a·x + b) per user via Newton-Raphson on the (raw_confidence, label) pairs from labeled Prediction rows — accepted = positive, rejected = negative, edited = half-positive. Pure Python, no numpy/scipy. The daily fit_calibration management command writes the result to UserProfile.calibration (a JSON column); the predict view reads it on every call. Recency cap: 5000 most-recent labels per fit. Below MIN_SAMPLES_FOR_FIT = 5, the user gets identity calibration (raw == calibrated).
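A pure-Python Newton-Raphson fit of σ(a·x + b) on (raw_confidence, label) pairs can be sketched as below. This is not the project's calibration.py, just a minimal implementation of the stated technique: minimize log-loss over (a, b), with accepted = 1.0, rejected = 0.0, edited = 0.5, and identity fallback below MIN_SAMPLES_FOR_FIT.

```python
import math

MIN_SAMPLES_FOR_FIT = 5

def sigmoid(z):
    if z < -60.0:   # clamp to avoid math.exp overflow
        return 0.0
    if z > 60.0:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(pairs, iters=25):
    """Fit sigma(a*x + b) by Newton-Raphson on the log-loss.
    pairs: (raw_confidence, label) with label in {1.0, 0.5, 0.0}.
    Returns (a, b), or None when there are too few labels and the
    caller should fall back to identity calibration (raw == calibrated)."""
    if len(pairs) < MIN_SAMPLES_FOR_FIT:
        return None
    a, b = 1.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in pairs:
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)          # curvature weight
            g_a += (p - y) * x         # gradient wrt a
            g_b += (p - y)             # gradient wrt b
            h_aa += w * x * x          # Hessian entries
            h_ab += w * x
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        if abs(det) < 1e-12:
            break
        # Newton step: solve the 2x2 system H * step = g by hand.
        a -= (h_bb * g_a - h_ab * g_b) / det
        b -= (h_aa * g_b - h_ab * g_a) / det
    return a, b

# Mostly-monotone labeled history with some noise (illustrative data).
pairs = [(0.9, 1.0), (0.85, 1.0), (0.7, 1.0), (0.65, 0.5),
         (0.6, 0.0), (0.4, 0.0), (0.35, 1.0), (0.3, 0.0)]
a, b = fit_platt(pairs)
calibrated = sigmoid(a * 0.9 + b)   # calibrated probability for raw 0.9
```

The fitted (a, b) pair is exactly the shape you would serialize into a JSON column like UserProfile.calibration and re-apply on every predict call.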
Decay (the silent garbage collector)
apps/server/memories/decay.py adjusts SemanticMemory.importance whenever a prediction's outcome contradicts or confirms the facts it cited. Asymmetric: +0.04 per confirmation, -0.08 per contradiction. The transition is atomic (select_for_update inside transaction.atomic) and flip-aware — flipping accepted → rejected reverses the prior +0.04 and applies -0.08. Facts that decay to zero importance are not deleted (provenance preserved); they sort to the bottom of every future context bundle.
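The flip-aware arithmetic can be sketched in isolation: undo the previously applied delta, apply the new one, and floor at zero. The delta values (+0.04 / -0.08) come from the description above; treating an edited outcome as neutral for decay is an assumption of this sketch, and the locking/transaction machinery is omitted.

```python
# Sketch of the flip-aware importance update. A "flip" (e.g. accepted ->
# rejected) first reverses the earlier adjustment, then applies the new
# one, so importance never double-counts one piece of feedback.

DELTA = {
    "accepted": +0.04,   # confirmation
    "rejected": -0.08,   # contradiction (asymmetric, twice as heavy)
    "edited":    0.00,   # assumed neutral for decay in this sketch
}

def apply_feedback(importance, new_outcome, prior_outcome=None):
    if prior_outcome is not None:
        importance -= DELTA[prior_outcome]   # undo the earlier adjustment
    importance += DELTA[new_outcome]
    return max(0.0, importance)              # floor at zero, never delete

v1 = apply_feedback(0.50, "accepted")                          # -> 0.54
v2 = apply_feedback(v1, "rejected", prior_outcome="accepted")  # -> 0.42
```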
Lifecycle of one prediction
- Client POSTs `{ agent, agent_input, threshold, k }` to `/api/predictions/predict/`.
- `predict_view` calls `_build_context` to fetch profile + 20 most-important facts + last 10 episodes + last 50 raw rows for that user.
- `predictions/llm.py` calls `client.messages.create(...)` with the assembled context and a system prompt that returns top-K candidates with reasoning and confidence.
- The view writes a `Prediction` row, marks `status='auto'` if `top.confidence >= threshold` else `'escalated'`, and returns the JSON to the client.
- (Optional) The client calls `/predictions/<id>/feedback/` later with the user's actual reply for evaluation.
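The auto/escalated decision in this lifecycle is a single comparison that can be shown in isolation. The candidate dict shape below is an assumption; only the threshold comparison itself is taken from the description above.

```python
# Server-side routing decision: compare the calibrated top candidate
# against the caller's threshold. Candidate field names are illustrative.

def decide_status(candidates, threshold):
    """candidates: list of {"text": ..., "confidence": ...} dicts."""
    top = max(candidates, key=lambda c: c["confidence"])
    return "auto" if top["confidence"] >= threshold else "escalated"

candidates = [{"text": "LGTM, merging.", "confidence": 0.91},
              {"text": "Needs tests first.", "confidence": 0.42}]
status = decide_status(candidates, threshold=0.8)
```

A higher caller threshold shifts the same candidates from auto to escalated, which is why the threshold lives in the request body rather than in server config.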
Cross-stack contracts
- Schema: `packages/schema/events.schema.json` is the SSOT. Both the TypeScript clients (`packages/schema/events.ts`) and the Django server (loaded once at import time in `recording/views.py`) validate against the same JSON Schema. See Schema.
- Auth: every authenticated request carries either `Authorization: Bearer <jwt>` (60-min access token, refreshed every 50 min) or `X-Clone-API-Key: clone_…` (long-lived API key). Both are accepted by every endpoint listed above. See Authentication.
- Anthropic SDK: the Prediction and Memory promotion paths are the only places the server talks to Anthropic. Both share the same `SUPPORTED_MODELS` allow-list and the same key-missing/429/401 error mapping.
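The load-once, validate-per-request pattern can be sketched with the stdlib alone. This is only a stand-in: the real server validates the full JSON Schema from packages/schema/events.schema.json, whereas the toy schema and required-fields check here are illustrative.

```python
import json

# "Loaded once at import time": the schema is parsed a single time at
# module import and reused for every request thereafter.
EVENT_SCHEMA = json.loads('{"required": ["id", "type", "payload"]}')

def validate_event(event: dict) -> list:
    """Return the list of missing required fields (empty means valid).
    A real implementation would check types, formats, etc. against the
    full JSON Schema, not just required keys."""
    return [k for k in EVENT_SCHEMA["required"] if k not in event]

errors = validate_event({"id": "evt-1", "type": "terminal.turn"})
```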