Memory & Context
Keeping long stories coherent past the context window.
Rolling summary
With memory enabled, the oldest chunk of a long chat is folded into a running story-so-far summary, and the summarized messages are cut from the prompt. The story keeps its shape while prompt size stays roughly flat instead of growing with every message.
Summarization runs on fast, free models in the background, so you can keep chatting while it folds. The current summary rides along in a system block.
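As a rough illustration of the folding step, here is a minimal Python sketch. It is not the app's actual implementation: the names (Conversation, fold_oldest, build_prompt), the thresholds, and the toy summarizer are all hypothetical, and the real summarizer is a model call that runs in the background rather than inline.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    summary: str = ""                                   # running story-so-far
    messages: List[str] = field(default_factory=list)   # messages still sent verbatim


def fold_oldest(
    convo: Conversation,
    summarize: Callable[[str, List[str]], str],
    max_messages: int = 6,   # hypothetical threshold
    chunk_size: int = 3,     # hypothetical chunk size
) -> None:
    """Fold the oldest chunk into the summary while the chat is too long."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    while len(convo.messages) > max_messages:
        chunk = convo.messages[:chunk_size]
        convo.summary = summarize(convo.summary, chunk)
        # The summarized messages are cut from what goes into the prompt.
        convo.messages = convo.messages[chunk_size:]


def build_prompt(convo: Conversation) -> List[str]:
    """Put the current summary in a system block ahead of the live messages."""
    prompt = []
    if convo.summary:
        prompt.append("[system] Story so far: " + convo.summary)
    prompt.extend(convo.messages)
    return prompt


def toy_summarize(previous: str, chunk: List[str]) -> str:
    # Stand-in for a real model call: it only concatenates text.
    return (previous + " " + " | ".join(chunk)).strip()


convo = Conversation(messages=[f"m{i}" for i in range(1, 9)])
fold_oldest(convo, toy_summarize)
prompt = build_prompt(convo)
```

In this sketch, eight messages with a limit of six fold the first three into the summary, so the prompt carries one system block plus the five newest messages.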
Semantic retrieval
Alongside the summary, recent chat is embedded and compared against older messages and lore candidates; the closest matches are injected as relevant background. Old details resurface exactly when the scene touches them.
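The matching step can be sketched as a similarity ranking. This is an assumption-heavy toy, not the app's retrieval code: embed here is a bag-of-words stand-in for a real embedding model, and retrieve, top_k, and min_score are hypothetical names and values.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    recent_chat: str,
    candidates: List[str],   # older messages and lore entries
    top_k: int = 2,          # hypothetical cap on injected items
    min_score: float = 0.3,  # hypothetical relevance floor
) -> List[str]:
    """Return the candidates closest to the recent chat, best first."""
    query = embed(recent_chat)
    scored = [(cosine(query, embed(c)), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


lore = [
    "Silver key opens cellar door",
    "Mira fears deep water",
    "Innkeeper owes Mira money",
]
background = retrieve("Mira turns silver key", lore)
```

Here only the entry about the silver key clears the relevance floor, so it alone would be injected as background. The other two entries share a single word with the scene and are left out.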
Web search
Per-conversation opt-in. A tiny classifier first decides whether your message actually needs fresh facts; only then does a real search run and its results join the context.
Search engine and context size are configurable in the settings drawer. Web search is available on paid accounts.
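The two-stage flow (cheap gate first, real search only when needed) might look like the following sketch. Everything here is illustrative: maybe_search, toy_classifier, the cue words, and max_results are hypothetical, and the real classifier is a model rather than a keyword check.

```python
from typing import Callable, List


def toy_classifier(message: str) -> bool:
    # Stand-in for the classifier: keyword cues for "needs fresh facts".
    cues = ("today", "latest", "current", "news", "price")
    lowered = message.lower()
    return any(cue in lowered for cue in cues)


def maybe_search(
    message: str,
    search_enabled: bool,                      # the per-conversation opt-in
    needs_fresh_facts: Callable[[str], bool],  # the cheap gate
    run_search: Callable[[str], List[str]],    # the expensive call
    max_results: int = 3,                      # hypothetical context-size limit
) -> List[str]:
    """Run a search only if the chat opted in and the gate says it is needed."""
    if not search_enabled:
        return []
    if not needs_fresh_facts(message):
        return []
    return run_search(message)[:max_results]


search_calls: List[str] = []


def fake_search(query: str) -> List[str]:
    # Records each call so it is visible when a real search would have run.
    search_calls.append(query)
    return ["result 1", "result 2", "result 3", "result 4"]


story_turn = maybe_search("She draws her sword.", True, toy_classifier, fake_search)
fact_turn = maybe_search("What is the latest news?", True, toy_classifier, fake_search)
opted_out = maybe_search("What is the latest news?", False, toy_classifier, fake_search)
```

The point of the ordering is cost: a story turn never reaches the search engine, and a conversation that has not opted in skips even the classifier.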
Where the toggles live
Memory and web search are per-conversation switches in the settings drawer; presets carry defaults for new chats. The utility model setting picks which model does the folding and background work.
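One way to picture the relationship between presets and per-conversation switches is the sketch below. The field names, default values, and model name are placeholders, not the app's real settings schema.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChatSettings:
    memory: bool = False                           # placeholder default
    web_search: bool = False                       # placeholder default
    utility_model: str = "example-utility-model"   # placeholder model name


def settings_for_new_chat(preset: ChatSettings) -> ChatSettings:
    """A new chat starts from a copy of the preset's defaults."""
    return replace(preset)


preset = ChatSettings(memory=True)

chat_a = settings_for_new_chat(preset)
chat_b = settings_for_new_chat(preset)

# Flipping a switch in one conversation leaves the preset and other chats alone.
chat_b = replace(chat_b, web_search=True)
```

Because each conversation holds its own copy, turning on web search in one chat does not change the preset or any other conversation.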