Every local AI tool loads a model, answers, and forgets who it was. commonllama keeps weights resident and swaps contexts in milliseconds — so your AI remembers its identity without reloading the whole thing every time you switch tasks.
Models load into memory and stay there. No cold starts between conversations. No 30-second reload when you switch from writing to research. The weights are already resident — the response starts immediately.
Run a narrator, a fact-checker, and a classifier simultaneously on the same hardware. Each role keeps its own identity pinned in memory. When one role needs to think, the others don't lose their place.
Your application calls roles, not models. The same code runs on a Raspberry Pi with one shared model and a Mac Studio with three dedicated ones. No conditionals. No platform checks.
The AI's persona and system prompt are computed once and pinned. When the task changes, only the working context swaps — in place, in milliseconds. No model reload. No identity recomputation. The switch from "research mode" to "writing mode" is instant.
System prompt and persona. Computed once, never recomputed while the model stays loaded.
Current task context. Swaps when attention shifts. The expensive part happens once; switching is cheap.
Your messages, building up turn by turn. When it fills, the engine tells you — nothing is silently dropped.
Every local AI tool treats your models like strangers — load, answer, forget. commonllama keeps them warm and remembers who they are. Nobody else is building this piece, so I'm building it.
For developers building local AI applications who need more than a chat wrapper. MIT licensed — use it, fork it, embed it, ship products on top of it.