Architecture

Clone is a single Django service (apps/server) split into three layers. Each layer is owned by one Django app, persists its own tables scoped to the authenticated user, and exposes a small REST surface. Client surfaces (apps/web, apps/desktop, apps/cli, apps/mcp) talk to these REST endpoints over JWT or API-key auth; they hold no business logic of their own.

The three layers

┌──────────────────────────────────────────────┐
│ Prediction         apps/server/predictions/  │
│   POST /predict/        POST /predict/batch/ │
│   GET /                 /<id>/feedback/      │
│   GET /stats/                                │
└───────────────▲──────────────────────────────┘
                │ context bundle (in-process)
┌───────────────┴──────────────────────────────┐
│ Memory             apps/server/memories/     │
│   profile · facts · episodes · raw           │
│   POST /context/        POST /sync/          │
│   POST /promote/episodes/   /facts/          │
└───────────────▲──────────────────────────────┘
                │ derived from
┌───────────────┴──────────────────────────────┐
│ Recording          apps/server/recording/    │
│   POST /events/         GET /sessions/       │
│   idempotent on event.id                     │
└──────────────────────────────────────────────┘

Recording

Every input that should ever influence Clone's behavior — a desktop frame, a terminal turn, an integration webhook, an agent prompt — lands here as a CloneEvent keyed by id. apps/server/recording/views.py validates the payload against packages/schema/events.schema.json (the canonical wire-shape definition) and persists rows under a RecordingSession. Re-posting an event with an existing id is a no-op, so producers can retry freely.
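A minimal sketch of that idempotent write (model names follow the prose; the real code in recording/views.py validates the payload first and may differ in shape):

```python
# Sketch only: field names on CloneEvent/RecordingSession are assumptions.
from django.db import transaction

from recording.models import CloneEvent, RecordingSession  # assumed import path

def persist_event(session: RecordingSession, payload: dict) -> CloneEvent:
    """Write one validated event; re-posting the same id is a no-op."""
    with transaction.atomic():
        event, _created = CloneEvent.objects.get_or_create(
            id=payload["id"],  # the idempotency key from the wire payload
            defaults={"session": session, "payload": payload},
        )
    return event  # same row whether this call inserted it or found it
```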

Memory

apps/server/memories/ distills the Recording stream into a four-tier store:

Tier            Model           What it holds
Profile         UserProfile     A singleton "who is this person" body that
                                reads like a system prompt.
Semantic facts  SemanticMemory  Atomic, reusable facts (kind, text,
                                importance, tags).
Episodic        EpisodicMemory  Time-bounded summaries of related raw events.
Raw             RawMemory       One row per Recording event, normalized to a
                                single sentence.

Promotion (raw → episodes → facts) is LLM-driven and runs in memories/promotion.py. Reads cluster around POST /api/memory/context/, which assembles a single bundle the Prediction layer can hand to the LLM.
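The bundle is assembled in-process and handed to the Prediction layer as plain data. A hypothetical shape consistent with the four tiers (field names are illustrative, not the wire contract):

```python
# Illustrative only: the real field names live in the memories/ serializers.
context_bundle = {
    "profile": "Staff engineer; replies tersely; reluctant about Friday deploys.",
    "facts": [       # SemanticMemory rows, ordered by importance
        {"kind": "preference", "text": "Dislikes meetings before 10:00",
         "importance": 0.9, "tags": ["calendar"]},
    ],
    "episodes": [    # EpisodicMemory: time-bounded summaries
        {"summary": "Paired with Sam on the staging outage",
         "started_at": "...", "ended_at": "..."},
    ],
    "raw": [         # RawMemory: one sentence per Recording event
        "Opened the on-call dashboard at 09:12.",
    ],
}
```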

Prediction

apps/server/predictions/views.py is the public face. POST /api/predictions/predict/ takes the agent's prompt, calls Memory in-process for a context bundle, sends a prediction-shaped prompt to Anthropic, applies per-user Platt scaling to the LLM's raw confidence, and returns the result with calibrated probabilities. The auto/escalated decision is made server-side by comparing the calibrated top candidate against the caller's threshold. Every call is persisted as a Prediction row, with the raw confidence kept for the fitter; feedback (POST /api/predictions/<id>/feedback/) can later mark it accepted, edited, or rejected, feeding both the calibration training set and the behavioral decay loop on cited facts.
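The apply step is small. A sketch, assuming UserProfile.calibration stores the two Platt parameters as {"a": ..., "b": ...}:

```python
import math

def calibrate(raw: float, calib: dict | None) -> float:
    """Map raw LLM confidence through the user's Platt parameters."""
    if not calib:  # no fit yet: identity calibration
        return raw
    z = calib["a"] * raw + calib["b"]
    return 1.0 / (1.0 + math.exp(-z))

def decide(calibrated_top: float, threshold: float) -> str:
    """Server-side routing: auto-handle vs. escalate to the user."""
    return "auto" if calibrated_top >= threshold else "escalated"
```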

/predict/batch/ shares one Memory bundle across up to 100 prompts; both endpoints return calibrated confidence plus raw_confidence.

Calibration (the daily loop)

apps/server/predictions/calibration.py fits a sigmoid σ(a·x + b) per user via Newton-Raphson on the (raw_confidence, label) pairs from labeled Prediction rows: accepted = positive, rejected = negative, edited = half-positive. It is pure Python, with no numpy or scipy. The daily fit_calibration management command writes the result to UserProfile.calibration (a JSON column), and the predict view reads it on every call. Each fit uses at most the 5000 most-recent labels; below MIN_SAMPLES_FOR_FIT = 5, the user keeps identity calibration (raw == calibrated).
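The fitter itself only has to recover two scalars. A pure-Python sketch consistent with the description above (function name, starting point, and iteration count are mine, not the repo's):

```python
import math

def _sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(pairs: list[tuple[float, float]], iters: int = 25) -> tuple[float, float]:
    """Fit sigma(a*x + b) by Newton-Raphson on (raw_confidence, label) pairs.

    Labels follow the text: accepted = 1.0, rejected = 0.0, edited = 0.5.
    The 2x2 Newton system is solved in closed form, so no numpy is needed.
    """
    a, b = 1.0, 0.0  # start near identity calibration
    for _ in range(iters):
        ga = gb = 0.0          # gradient of the negative log-likelihood
        haa = hab = hbb = 0.0  # Hessian entries
        for x, y in pairs:
            p = _sigmoid(a * x + b)
            d, w = p - y, p * (1.0 - p)
            ga += d * x
            gb += d
            haa += w * x * x
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        if abs(det) < 1e-12:
            break  # degenerate fit (e.g. all labels identical); keep current params
        a -= (hbb * ga - hab * gb) / det  # [a, b] -= H^-1 @ gradient
        b -= (haa * gb - hab * ga) / det
    return a, b
```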

Decay (the silent garbage collector)

apps/server/memories/decay.py adjusts SemanticMemory.importance whenever a prediction's outcome contradicts or confirms the facts it cited. Asymmetric: +0.04 per confirmation, -0.08 per contradiction. The transition is atomic (select_for_update inside transaction.atomic) and flip-aware — flipping accepted → rejected reverses the prior +0.04 and applies -0.08. Facts that decay to zero importance are not deleted (provenance preserved); they sort to the bottom of every future context bundle.
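A hypothetical helper illustrating that flip-aware, atomic update (the real signatures in memories/decay.py may differ):

```python
from django.db import transaction

from memories.models import SemanticMemory  # assumed import path

DELTAS = {"accepted": 0.04, "rejected": -0.08}  # asymmetric steps from the prose

def apply_outcome(fact_id: int, prior: str | None, outcome: str) -> None:
    """Adjust one cited fact; `prior` is the previously applied outcome, if any."""
    with transaction.atomic():
        fact = SemanticMemory.objects.select_for_update().get(pk=fact_id)
        # Flip-aware: reverse whatever the old outcome contributed, apply the new one.
        step = DELTAS.get(outcome, 0.0) - DELTAS.get(prior, 0.0)
        fact.importance = max(0.0, fact.importance + step)  # floor at zero, never delete
        fact.save(update_fields=["importance"])
```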

Lifecycle of one prediction

  1. Client POSTs { agent, agent_input, threshold, k } to /api/predictions/predict/.
  2. predict_view calls _build_context to fetch profile + 20 most-important facts + last 10 episodes + last 50 raw rows for that user.
  3. predictions/llm.py calls client.messages.create(...) with the assembled context and a system prompt that returns top-K candidates with reasoning and confidence.
  4. The view writes a Prediction row, marks status='auto' if top.confidence >= threshold else 'escalated', and returns the JSON to the client.
  5. (Optional) The client later calls /api/predictions/<id>/feedback/ with the user's actual reply for evaluation.
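Concretely, the exchange might look like this (host, header values, and any response fields beyond those documented above are hypothetical; the documented request fields are agent, agent_input, threshold, and k):

```python
import requests

BASE = "https://clone.example.com/api"  # hypothetical host
HEADERS = {"X-Clone-API-Key": "clone_..."}  # or Authorization: Bearer <jwt>

resp = requests.post(
    f"{BASE}/predictions/predict/",
    headers=HEADERS,
    json={"agent": "email-reply", "agent_input": "Re: launch date?",
          "threshold": 0.85, "k": 3},
)
body = resp.json()
# body carries status ("auto" or "escalated") and top-K candidates, each with
# reasoning, calibrated confidence, and raw_confidence.
prediction_id = body["id"]  # field name assumed for illustration

# Later, close the loop with the user's actual outcome:
requests.post(
    f"{BASE}/predictions/{prediction_id}/feedback/",
    headers=HEADERS,
    json={"outcome": "accepted"},  # or "edited" / "rejected"; payload shape assumed
)
```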

Cross-stack contracts

  • Schema: packages/schema/events.schema.json is the SSOT. Both TypeScript clients (packages/schema/events.ts) and the Django server (loaded once at import time in recording/views.py) validate against the same JSON Schema. See Schema.
  • Auth: every authenticated request carries either Authorization: Bearer <jwt> (60-min access token, refreshed every 50 min) or X-Clone-API-Key: clone_… (long-lived API key). Both are accepted by every endpoint listed above. See Authentication.
  • Anthropic SDK: the Prediction and Memory promotion paths are the only places the server talks to Anthropic. Both share the same SUPPORTED_MODELS allow-list and the same key-missing/429/401 error mapping.
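Both call sites can share a single wrapper. A sketch in which the exception classes are real anthropic SDK types, while SUPPORTED_MODELS and the status mapping are stand-ins:

```python
import anthropic

SUPPORTED_MODELS = {"claude-sonnet-4-5"}  # stand-in; the real allow-list is shared

class UpstreamError(Exception):
    """Carries an HTTP status for the view layer to return."""
    def __init__(self, status: int, detail: str):
        super().__init__(detail)
        self.status, self.detail = status, detail

def call_anthropic(client: anthropic.Anthropic, **kwargs):
    """One allow-list check and one error mapping for both calling paths."""
    if kwargs.get("model") not in SUPPORTED_MODELS:
        raise UpstreamError(400, "model not in SUPPORTED_MODELS")
    try:
        return client.messages.create(**kwargs)
    except anthropic.AuthenticationError as exc:  # missing/invalid key (401 upstream)
        raise UpstreamError(502, "Anthropic authentication failed") from exc
    except anthropic.RateLimitError as exc:       # 429 upstream
        raise UpstreamError(429, "Anthropic rate limited") from exc
```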