Platform Guide

Models & Pricing

What the :free suffix means, how usage is priced and why a model sometimes disappears.

The catalog

The Models page lists every model with its context window, per-token prices, supported endpoints and capability filters like tool calling and image input. Each model has its own page with copy-paste code examples. Not every free model supports tool calls or vision, so check the capability badges before wiring one into a coding agent.
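If you keep model metadata in your own tooling, you can run the same capability check in code. The following is a minimal sketch. It assumes a hypothetical `ModelInfo` record with `supports_tools` and `supports_images` fields, and the model IDs are made up. The platform's real catalog fields may be named differently.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    # Hypothetical metadata record; the field names are illustrative,
    # not the platform's actual schema.
    id: str
    supports_tools: bool
    supports_images: bool


def agent_candidates(models, need_images=False):
    """Return IDs of :free models that support tool calling (and image input if required)."""
    picked = []
    for m in models:
        if not m.id.endswith(":free"):
            continue  # only consider free variants
        if not m.supports_tools:
            continue  # a coding agent cannot work without tool calls
        if need_images and not m.supports_images:
            continue
        picked.append(m.id)
    return picked


catalog = [
    ModelInfo("example-a:free", supports_tools=True, supports_images=False),
    ModelInfo("example-b:free", supports_tools=False, supports_images=True),
    ModelInfo("example-c:free", supports_tools=True, supports_images=True),
]
print(agent_candidates(catalog))                    # ['example-a:free', 'example-c:free']
print(agent_candidates(catalog, need_images=True))  # ['example-c:free']
```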

The Rankings page shows which free models perform well in practice, based on community test results. The Status page tracks live provider health.

Free vs paid models

Free models carry a :free suffix, for example gpt-oss-120b:free.

A :free model routes only to free upstream providers and never touches your balance. The same base name without the suffix is the paid version: stable, uncapped and billed per token. Both can exist side by side, so switching from free to paid is a one-string change.
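The two variants differ only by the suffix, so your own code can switch between them with a plain string operation. The following is a minimal sketch. The helper names `to_paid` and `to_free` are hypothetical, and it assumes that both variants exist for the base model you pass in.

```python
FREE_SUFFIX = ":free"


def to_paid(model: str) -> str:
    """Drop the :free suffix to get the paid variant of the same base model."""
    return model.removesuffix(FREE_SUFFIX)  # str.removesuffix needs Python 3.9+


def to_free(model: str) -> str:
    """Add the :free suffix unless it is already present."""
    return model if model.endswith(FREE_SUFFIX) else model + FREE_SUFFIX


print(to_paid("gpt-oss-120b:free"))  # gpt-oss-120b
print(to_free("gpt-oss-120b"))       # gpt-oss-120b:free
```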

Free models are free for a reason: the upstream providers set the rate limits, not us. Expect HTTP 429 (Too Many Requests) responses at peak times, and use a paid model when you need reliability.
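On the client side, a common pattern is to retry a free model with exponential backoff and then switch to the paid variant. The following is a minimal sketch. It assumes a hypothetical `send(model, prompt)` function that raises a `RateLimited` exception whenever the API returns 429. The retry count and delays are arbitrary, and the platform does not prescribe them. Once the code falls back, that request is billed to your balance.

```python
import time


class RateLimited(Exception):
    """Raised by the hypothetical `send` function when the API answers HTTP 429."""


def complete_with_fallback(send, model, prompt, retries=3, base_delay=1.0,
                           sleep=time.sleep):
    """Retry a model with exponential backoff on 429, then fall back to its paid variant."""
    for attempt in range(retries):
        try:
            return send(model, prompt)
        except RateLimited:
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, then 2s with the defaults
    paid = model.removesuffix(":free")
    if paid == model:
        # Already a paid model: there is nothing left to fall back to.
        raise RateLimited(f"{model} still rate limited after {retries} attempts")
    return send(paid, prompt)  # this request is billed to your balance


# Usage with a fake sender that always rate-limits free models.
def fake_send(model, prompt):
    if model.endswith(":free"):
        raise RateLimited()
    return f"{model} answered: {prompt}"


print(complete_with_fallback(fake_send, "gpt-oss-120b:free", "hi", sleep=lambda s: None))
# gpt-oss-120b answered: hi
```

Passing `sleep` as a parameter lets you test the retry logic without waiting for real delays.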

How pricing works

Most models bill per token, with separate input and output prices. A few models (mostly image and video) bill a flat price per call instead. What you see on the model page is what you pay: no subscriptions, no hidden fees, your balance simply decreases per request.
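The arithmetic behind a single request is simple. The following is a minimal sketch with made-up prices, each expressed per single token. It uses `Decimal` so that small per-token amounts do not pick up floating-point rounding errors. Take the real prices from the model page.

```python
from decimal import Decimal


def per_token_cost(input_tokens: int, output_tokens: int,
                   input_price: Decimal, output_price: Decimal) -> Decimal:
    """Per-token billing: input and output tokens are priced separately."""
    return input_tokens * input_price + output_tokens * output_price


def per_call_cost(calls: int, price_per_call: Decimal) -> Decimal:
    """Flat billing (mostly image and video models): a fixed price per call."""
    return calls * price_per_call


# Made-up prices: $0.0000005 per input token, $0.000002 per output token.
cost = per_token_cost(1200, 300, Decimal("0.0000005"), Decimal("0.000002"))
print(cost.normalize())  # 0.0012
```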

The Pricing page has current top-up options; every model page shows live per-token prices.

Prompt caching

For models that support prompt caching (Claude and others), repeated prompt prefixes are billed at a reduced cached-input rate, while writing a new cache entry costs slightly more than a normal input token (about 1.25x).
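To see why caching pays off despite the write premium, compare costs across repeated requests. The following is a minimal sketch of the input side only. The normal and cached input rates are made up. The 1.25x write factor follows this guide, but the exact factor may vary by model. The sketch also simplifies by assuming that the first request writes the cache and every later request hits it.

```python
from decimal import Decimal

# "About 1.25x" per this guide; the exact factor may vary by model.
CACHE_WRITE_MULTIPLIER = Decimal("1.25")


def prompt_input_cost(prefix_tokens: int, new_tokens: int, input_price: Decimal,
                      cached_input_price: Decimal, cache_hit: bool) -> Decimal:
    """Input-side cost of one request whose prompt starts with a cacheable prefix."""
    if cache_hit:
        prefix = prefix_tokens * cached_input_price  # read from cache at the reduced rate
    else:
        prefix = prefix_tokens * input_price * CACHE_WRITE_MULTIPLIER  # writes a new cache entry
    return prefix + new_tokens * input_price  # the non-repeated part bills as normal input


# Made-up rates: normal input $0.000003/token, cached input $0.0000003/token.
p, c = Decimal("0.000003"), Decimal("0.0000003")
# Ten requests sharing a 4,000-token system prompt, 200 new tokens each.
cached = prompt_input_cost(4000, 200, p, c, cache_hit=False) + sum(
    prompt_input_cost(4000, 200, p, c, cache_hit=True) for _ in range(9))
uncached = 10 * (4000 + 200) * p
print(cached.normalize(), uncached.normalize())  # 0.0318 0.126
```

With these rates, the first request costs a little more than it would without caching. The nine cache hits more than make up for it.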

Caching is automatic, so there is nothing to configure. Workloads with long, stable system prompts (agents, RP presets) benefit the most.

Availability & failover

When a free model has several upstream providers, requests automatically fail over to the next one if a provider hits its rate limit. Single-provider models cannot fail over, so they stall until the limit resets.
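You can picture the failover rule as a loop over providers. The following sketch only illustrates the behavior described above and is not the platform's actual router. The names `route` and `RateLimited` are hypothetical, and so are the provider callables. With a single provider, the loop has nothing to fall back to and raises immediately.

```python
class RateLimited(Exception):
    """Raised by a hypothetical provider call that answers HTTP 429."""


def route(providers, request):
    """Send the request to each provider in turn until one is not rate limited."""
    for provider in providers:
        try:
            return provider(request)
        except RateLimited:
            continue  # fail over to the next upstream provider
    # Every provider (or the only one) is rate limited: nothing left to try.
    raise RateLimited(f"all {len(providers)} provider(s) rate limited")


def limited(request):
    raise RateLimited()


def healthy(request):
    return f"ok: {request}"


print(route([limited, healthy], "hello"))  # ok: hello
```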

Each model runs on one or more provider channels. A channel that hits its rate limit is temporarily disabled and hidden: it is not offered as an option while it recovers. As long as one channel is still up, the model keeps working through it. Only when every channel for a model is rate limited does the model itself disappear from the catalog.
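Two visibility rules are at work here: hidden channels and hidden models. The following is a minimal sketch of both rules in a hypothetical data model. The `Channel` and `Model` classes and the `healthy` flag are illustrative and are not the platform's internals. A channel's flag goes to False when it is rate limited and back to True once it passes a health check.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    name: str
    healthy: bool = True  # False while rate limited, True again after a health check


@dataclass
class Model:
    id: str
    channels: list = field(default_factory=list)


def usable_channels(model):
    """Rate-limited channels are hidden and not offered until they recover."""
    return [ch.name for ch in model.channels if ch.healthy]


def catalog(models):
    """A model stays listed while at least one of its channels is up."""
    return [m.id for m in models if usable_channels(m)]


a = Model("example-a:free", [Channel("a1", healthy=False), Channel("a2")])
b = Model("example-b:free", [Channel("b1", healthy=False)])
print(usable_channels(a))  # ['a2']
print(catalog([a, b]))     # ['example-a:free']
```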

A model vanishing from the list is expected under load, not a typo or an outage. It reappears on its own once a channel passes a health check, usually within minutes. If a saved model name stops resolving, check the Models page: it is either recovering or has been renamed.
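If your own tooling depends on a saved model name, it can wait for the model to come back instead of failing immediately. The following is a minimal sketch. It assumes a hypothetical `fetch_model_ids()` callable that returns the IDs currently listed, however you obtain them. The 10-minute timeout and 30-second interval are arbitrary choices, not platform values. A `False` result means the name has stayed missing, so check the Models page for a rename.

```python
import time


def wait_for_model(fetch_model_ids, model, timeout=600.0, interval=30.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll the catalog until `model` is listed again; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if model in fetch_model_ids():
            return True
        if clock() >= deadline:
            return False  # still missing: it may have been renamed
        sleep(interval)


# Usage with a fake catalog that lists the model on the third poll.
responses = iter([[], [], ["gpt-oss-120b:free"]])
print(wait_for_model(lambda: next(responses), "gpt-oss-120b:free", sleep=lambda s: None))  # True
```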