Architecture

Clone is a single Django service (apps/server) split into three layers. Each layer is owned by one Django app, persists its own tables scoped to the authenticated user, and exposes a small REST surface. Client surfaces (apps/web, apps/desktop, apps/cli, apps/mcp) talk to these REST endpoints over JWT or API-key auth; they hold no business logic of their own.

The three layers

┌──────────────────────────────────────────────┐
│ Prediction         apps/server/predictions/  │
│   POST /predict/        POST /predict/batch/ │
│   GET /                 /<id>/feedback/      │
│   GET /stats/                                │
└───────────────▲──────────────────────────────┘
                │ context bundle (in-process)
┌───────────────┴──────────────────────────────┐
│ Memory             apps/server/memories/     │
│   profile · facts · episodes · raw           │
│   POST /context/        POST /sync/          │
│   POST /promote/episodes/   /facts/          │
└───────────────▲──────────────────────────────┘
                │ derived from
┌───────────────┴──────────────────────────────┐
│ Recording          apps/server/recording/    │
│   POST /events/         GET /sessions/       │
│   idempotent on event.id                     │
└──────────────────────────────────────────────┘

Recording

Every input that should ever influence Clone's behavior — a desktop frame, a terminal turn, an integration webhook, an agent prompt — lands here as a CloneEvent keyed by id. apps/server/recording/views.py validates the payload against packages/schema/events.schema.json (the canonical wire-shape definition) and persists rows under a RecordingSession. Re-posting an event with an existing id is a no-op, so producers can retry freely.
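A minimal sketch of that idempotent write (model names follow the prose; the real code in recording/views.py validates the payload first and may differ in shape):

```python
# Sketch only: field names on CloneEvent/RecordingSession are assumptions.
from django.db import transaction

from recording.models import CloneEvent, RecordingSession  # assumed import path

def persist_event(session: RecordingSession, payload: dict) -> CloneEvent:
    """Write one validated event; re-posting the same id is a no-op."""
    with transaction.atomic():
        event, _created = CloneEvent.objects.get_or_create(
            id=payload["id"],  # the idempotency key from the wire payload
            defaults={"session": session, "payload": payload},
        )
    return event  # same row whether this call inserted it or found it
```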

Memory

apps/server/memories/ distills the Recording stream into a four-tier store:

Tier            Model           What it holds
Profile         UserProfile     A singleton "who is this person" body that
                                reads like a system prompt.
Semantic facts  SemanticMemory  Atomic, reusable facts (kind, text,
                                importance, tags).
Episodic        EpisodicMemory  Time-bounded summaries of related raw events.
Raw             RawMemory       One row per Recording event, normalized to a
                                single sentence.

Promotion (raw → episodes → facts) is LLM-driven and runs in memories/promotion.py. Reads cluster around POST /api/memory/context/, which assembles a single bundle the Prediction layer can hand to the LLM.
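The bundle is assembled in-process and handed to the Prediction layer as plain data. A hypothetical shape consistent with the four tiers (field names are illustrative, not the wire contract):

```python
# Illustrative only: the real field names live in the memories/ serializers.
context_bundle = {
    "profile": "Staff engineer; replies tersely; reluctant about Friday deploys.",
    "facts": [       # SemanticMemory rows, ordered by importance
        {"kind": "preference", "text": "Dislikes meetings before 10:00",
         "importance": 0.9, "tags": ["calendar"]},
    ],
    "episodes": [    # EpisodicMemory: time-bounded summaries
        {"summary": "Paired with Sam on the staging outage",
         "started_at": "...", "ended_at": "..."},
    ],
    "raw": [         # RawMemory: one sentence per Recording event
        "Opened the on-call dashboard at 09:12.",
    ],
}
```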

Prediction

apps/server/predictions/views.py is the public face. POST /api/predictions/predict/ takes the agent's prompt, calls Memory in-process for a context bundle, sends a prediction-shaped prompt to Anthropic, applies per-user Platt scaling to the LLM's raw confidence, and returns the result with calibrated probabilities. The auto/escalated decision is made server-side by comparing the calibrated top candidate against the caller's threshold. Every call is persisted as a Prediction row, with the raw confidence kept for the fitter; feedback (POST /api/predictions/<id>/feedback/) can later mark it accepted, edited, or rejected, feeding both the calibration training set and the behavioral decay loop on cited facts.
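The apply step is small. A sketch, assuming UserProfile.calibration stores the two Platt parameters as {"a": ..., "b": ...}:

```python
import math

def calibrate(raw: float, calib: dict | None) -> float:
    """Map raw LLM confidence through the user's Platt parameters."""
    if not calib:  # no fit yet: identity calibration
        return raw
    z = calib["a"] * raw + calib["b"]
    return 1.0 / (1.0 + math.exp(-z))

def decide(calibrated_top: float, threshold: float) -> str:
    """Server-side routing: auto-handle vs. escalate to the user."""
    return "auto" if calibrated_top >= threshold else "escalated"
```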

/predict/batch/ shares one Memory bundle across up to 100 prompts; both endpoints return calibrated confidence plus raw_confidence.

Calibration (the daily loop)

apps/server/predictions/calibration.py fits a sigmoid σ(a·x + b) per user via Newton-Raphson on the (raw_confidence, label) pairs from labeled Prediction rows: accepted = positive, rejected = negative, edited = half-positive. It is pure Python, with no numpy or scipy. The daily fit_calibration management command writes the result to UserProfile.calibration (a JSON column), and the predict view reads it on every call. Each fit uses at most the 5000 most-recent labels; below MIN_SAMPLES_FOR_FIT = 5, the user keeps identity calibration (raw == calibrated).
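The fitter itself only has to recover two scalars. A pure-Python sketch consistent with the description above (function name, starting point, and iteration count are mine, not the repo's):

```python
import math

def _sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(pairs: list[tuple[float, float]], iters: int = 25) -> tuple[float, float]:
    """Fit sigma(a*x + b) by Newton-Raphson on (raw_confidence, label) pairs.

    Labels follow the text: accepted = 1.0, rejected = 0.0, edited = 0.5.
    The 2x2 Newton system is solved in closed form, so no numpy is needed.
    """
    a, b = 1.0, 0.0  # start near identity calibration
    for _ in range(iters):
        ga = gb = 0.0          # gradient of the negative log-likelihood
        haa = hab = hbb = 0.0  # Hessian entries
        for x, y in pairs:
            p = _sigmoid(a * x + b)
            d, w = p - y, p * (1.0 - p)
            ga += d * x
            gb += d
            haa += w * x * x
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        if abs(det) < 1e-12:
            break  # degenerate fit (e.g. all labels identical); keep current params
        a -= (hbb * ga - hab * gb) / det  # [a, b] -= H^-1 @ gradient
        b -= (haa * gb - hab * ga) / det
    return a, b
```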

Decay (the silent garbage collector)

apps/server/memories/decay.py adjusts SemanticMemory.importance whenever a prediction's outcome contradicts or confirms the facts it cited. Asymmetric: +0.04 per confirmation, -0.08 per contradiction. The transition is atomic (select_for_update inside transaction.atomic) and flip-aware — flipping accepted → rejected reverses the prior +0.04 and applies -0.08. Facts that decay to zero importance are not deleted (provenance preserved); they sort to the bottom of every future context bundle.
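A hypothetical helper illustrating that flip-aware, atomic update (the real signatures in memories/decay.py may differ):

```python
from django.db import transaction

from memories.models import SemanticMemory  # assumed import path

DELTAS = {"accepted": 0.04, "rejected": -0.08}  # asymmetric steps from the prose

def apply_outcome(fact_id: int, prior: str | None, outcome: str) -> None:
    """Adjust one cited fact; `prior` is the previously applied outcome, if any."""
    with transaction.atomic():
        fact = SemanticMemory.objects.select_for_update().get(pk=fact_id)
        # Flip-aware: reverse whatever the old outcome contributed, apply the new one.
        step = DELTAS.get(outcome, 0.0) - DELTAS.get(prior, 0.0)
        fact.importance = max(0.0, fact.importance + step)  # floor at zero, never delete
        fact.save(update_fields=["importance"])
```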

Lifecycle of one prediction

  1. Client POSTs { agent, agent_input, threshold, k } to /api/predictions/predict/.
  2. predict_view calls _build_context to fetch profile + 20 most-important facts + last 10 episodes + last 50 raw rows for that user.
  3. predictions/llm.py calls client.messages.create(...) with the assembled context and a system prompt that returns top-K candidates with reasoning and confidence.
  4. The view writes a Prediction row, marks status='auto' if top.confidence >= threshold else 'escalated', and returns the JSON to the client.
  5. (Optional) The client later calls /api/predictions/<id>/feedback/ with the user's actual reply for evaluation.
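Concretely, the exchange might look like this (host, header values, and any response fields beyond those documented above are hypothetical; the documented request fields are agent, agent_input, threshold, and k):

```python
import requests

BASE = "https://clone.example.com/api"  # hypothetical host
HEADERS = {"X-Clone-API-Key": "clone_..."}  # or Authorization: Bearer <jwt>

resp = requests.post(
    f"{BASE}/predictions/predict/",
    headers=HEADERS,
    json={"agent": "email-reply", "agent_input": "Re: launch date?",
          "threshold": 0.85, "k": 3},
)
body = resp.json()
# body carries status ("auto" or "escalated") and top-K candidates, each with
# reasoning, calibrated confidence, and raw_confidence.
prediction_id = body["id"]  # field name assumed for illustration

# Later, close the loop with the user's actual outcome:
requests.post(
    f"{BASE}/predictions/{prediction_id}/feedback/",
    headers=HEADERS,
    json={"outcome": "accepted"},  # or "edited" / "rejected"; payload shape assumed
)
```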

Cross-stack contracts

  • Schema: packages/schema/events.schema.json is the SSOT. Both TypeScript clients (packages/schema/events.ts) and the Django server (loaded once at import time in recording/views.py) validate against the same JSON Schema. See Schema.
  • Auth: every authenticated request carries either Authorization: Bearer <jwt> (60-min access token, refreshed every 50 min) or X-Clone-API-Key: clone_… (long-lived API key). Both are accepted by every endpoint listed above. See Authentication.
  • Anthropic SDK: the Prediction and Memory promotion paths are the only places the server talks to Anthropic. Both share the same SUPPORTED_MODELS allow-list and the same key-missing/429/401 error mapping.
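Both call sites can share a single wrapper. A sketch in which the exception classes are real anthropic SDK types, while SUPPORTED_MODELS and the status mapping are stand-ins:

```python
import anthropic

SUPPORTED_MODELS = {"claude-sonnet-4-5"}  # stand-in; the real allow-list is shared

class UpstreamError(Exception):
    """Carries an HTTP status for the view layer to return."""
    def __init__(self, status: int, detail: str):
        super().__init__(detail)
        self.status, self.detail = status, detail

def call_anthropic(client: anthropic.Anthropic, **kwargs):
    """One allow-list check and one error mapping for both calling paths."""
    if kwargs.get("model") not in SUPPORTED_MODELS:
        raise UpstreamError(400, "model not in SUPPORTED_MODELS")
    try:
        return client.messages.create(**kwargs)
    except anthropic.AuthenticationError as exc:  # missing/invalid key (401 upstream)
        raise UpstreamError(502, "Anthropic authentication failed") from exc
    except anthropic.RateLimitError as exc:       # 429 upstream
        raise UpstreamError(429, "Anthropic rate limited") from exc
```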