Search docs...

Start typing to search documentation

Chat Guide

Memory & Context

Keeping long stories coherent past the context window.

Rolling summary

With memory enabled, the oldest chunk of a long chat is folded into a running story-so-far summary and the summarized messages are cut from the prompt. The story keeps its shape while token use stays flat.

Summarization runs on fast free models in the background; you keep chatting while it folds. The current summary rides along in a system block.

Semantic retrieval

Alongside the summary, recent chat is embedded and compared against older messages and lore candidates; the closest matches are injected as relevant background. Old details resurface exactly when the scene touches them.

Per-conversation opt-in. A tiny classifier first decides whether your message actually needs fresh facts; only then does a real search run and its results join the context.

Search engine and context size are configurable in the settings drawer. Web search is available on paid accounts.

Where the toggles live

Memory and web search are per-conversation switches in the settings drawer; presets carry defaults for new chats. The utility model setting picks which model does the folding and background work.

Memory, Retrieval, and Web Search in UnoRouter