Cursor AI's model lineup is like a toolbox—each one's got a vibe and a purpose. Here's the rundown:
- Claude 3.5 Sonnet: The thoughtful coder's buddy. Great for nuanced tasks like refactoring or grokking tricky logic. High precision, solid context (128k tokens).
- GPT-4o: The brainiac all-rounder. Smarter than your average bear, it's perfect for complex generation or when you need a bit of everything. Also 128k tokens.
- cursor-small: The speedy lightweight. Unlimited and free, it's your go-to for quick fixes or basic completions. Think 8k tokens—short and sweet.
- MAX Mode Models: Beasts with 200k-token context windows. Use these when you're wrestling with a massive codebase or a novel-length file.
You'll find these in Settings > Models (hit Cmd + , on macOS or Ctrl + , on Windows/Linux). Pop in there, and you'll see a list—some checked (enabled), some not. Enable what you need, but don't overdo it; too many options can clutter your dropdowns.
Pro Tip: Hover over a model in the list. Cursor sometimes drops hints about token limits or strengths—handy for picking the right tool.
Cursor's got three main battlegrounds where models shine: Chat, Composer (Cmd + K), and Tab Autocomplete. Here's how to match them up:
Chat: Your AI Pair Programmer
- Where: Top-right Chat pane.
- How: Click the model dropdown above the input box.
- Tactic:
- Need to debug a hairy Python mess? Go Claude 3.5 Sonnet—it's got a knack for reasoning through code.
- Writing a spec from scratch? GPT-4o's got the creative juice.
- Just want a regex explained? cursor-small won't judge your laziness.
- Tech Bit: Each message is a request. Attach a file with @file (e.g., @main.py), and it'll slurp up context—but big files chew more tokens, so keep it tight.
Composer (Cmd + K): Inline Code Wizardry
- Where: Highlight code, hit Cmd + K, type your ask.
- How: Model dropdown's right there in the Composer box.
- Tactic:
- Quick comment tweak? cursor-small nails it in milliseconds.
- Rewriting a class with TypeScript interfaces? GPT-4o's your architect.
- Huge refactor across files? MAX mode's got the stamina (but watch that request count).
- Tech Bit: Highlighted code sets the context. Smaller selections = fewer tokens. If Composer's guessing wrong, add a prompt like "focus on this function only."
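To see why selection size matters, here's a hypothetical example: highlighting one small function instead of the whole file keeps Composer's context (and token spend) tiny.

```python
# Hypothetical snippet: highlight just this function before hitting Cmd + K,
# so only these few lines (plus your prompt) count toward the token budget.
def parse_price(raw: str) -> float:
    """Convert a price string like '$1,299.99' to a float."""
    return float(raw.replace("$", "").replace(",", ""))

# Prompt: "Add error handling for empty input; focus on this function only."
```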
Tab Autocomplete: The Silent Helper
- Where: As you type, suggestions pop up.
- How: No direct model picker—Cursor uses fast, lightweight models by default (often cursor-small).
- Tactic: Let it ride for mundane stuff (e.g., closing brackets, boilerplate). If it's overzealous, disable it in Settings > Features, and save premium models for bigger lifts.
- Tech Bit: Autocomplete's token usage is tiny, so it rarely dents your limit—perfect for free-tier grinding.
You're on the $20/month Pro plan, right? That's 500 premium requests—Chat messages, Composer runs, Agent mode ops—before Cursor slows you down (unless you flip on usage-based pricing). Let's keep you under that cap with ninja-level efficiency.
Batch Like a Boss
- Why: Every Cmd + K or Chat send is one request.
- How: Lump your asks together.
- Instead of:
- Cmd + K: "Add a loop"
- Cmd + K: "Make it print numbers"
- Do:
- Cmd + K: "Add a loop that prints numbers 1-10"
- In Chat: "Write a REST API in Flask, add error handling, and explain the middleware" = 1 request vs. 3 (see the sketch after this list).
- Tech Bit: Fewer requests mean fewer API calls. Token count per request might nudge up, but one combined request is cheaper than three separate ones.
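To make the payoff concrete, here's a minimal sketch of what that single batched Flask request might come back with, assuming Flask is installed (route and field names are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.errorhandler(404)
def not_found(error):
    # Error handling requested in the same prompt, not a follow-up ask.
    return jsonify({"error": "resource not found"}), 404

@app.route("/items", methods=["POST"])
def create_item():
    data = request.get_json(silent=True)  # None if the body isn't valid JSON
    if not data or "name" not in data:
        return jsonify({"error": "missing 'name' field"}), 400
    return jsonify({"id": 1, "name": data["name"]}), 201

if __name__ == "__main__":
    app.run(debug=True)
```

One prompt, one request, and the error handling arrived with the endpoints instead of costing a second ask.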
Lean on cursor-small
- Why: Unlimited and free, it's your workhorse.
- How: Default to it in Chat or Composer for:
- Syntax fixes ("Fix this JSON").
- One-liners ("Write a list comprehension").
- Quick docs ("What's os.path.join?").
- Tech Bit: At 8k tokens, it's got enough juice for most daily grinds. Switch to premium only when it stumbles (e.g., "it's hallucinating—time for GPT-4o").
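For a sense of scale, these are the kinds of one-liners it handles without breaking a sweat (illustrative Python):

```python
import os

# "Write a list comprehension": squares of 1 through 10.
squares = [n * n for n in range(1, 11)]

# "What's os.path.join?": joins path parts with the OS-correct separator.
config_path = os.path.join("app", "config", "settings.json")

print(squares)
print(config_path)  # app/config/settings.json on macOS/Linux
```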
Reset Chat Context
- Why: Long Chat threads balloon token usage, making premium requests pricier.
- How: Hit "New Chat" for unrelated tasks. Example:
- Fixed a bug? Start a new session before drafting that feature spec.
- Pin key files with @codebase or @file to keep context lean.
- Tech Bit: Token limits (e.g., 128k for Claude) reset per session. Old context lingers otherwise, bloating the payload.
Agent Mode Smarts
- Why: Agent mode (planning, tool calls) can rack up requests fast—each tool op counts.
- How: Use it sparingly:
- Plan a feature with Claude 3.7 Sonnet Thinking (reasoning beast), then code manually or with cursor-small.
- Avoid overkill like "Agent, format my README"—do that yourself.
- Tech Bit: Tool calls (e.g., edit_file) trigger separate API hits. Check Settings > Usage to see the damage.
Want to sidestep the 500-request limit entirely? Hook up your own models. It's a bit of a hack, but oh-so-satisfying.
Third-Party APIs
- How:
- Grab an API key (e.g., OpenAI, Anthropic).
- Settings > Models > "Add Model."
- Fill it:
- Name: "MyGPT"
- API Key: sk-...
- Base URL: https://api.openai.com/v1
- Model: gpt-4-turbo
- Context: 16384 (or whatever your model supports).
- Test it in Chat—boom, your own beast.
- Tech Bit: Requests hit your provider's quota, not Cursor's. Costs vary (e.g., OpenAI's $0.01/1k tokens), so monitor their dashboard.
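Before leaning on the new entry, you can sanity-check the key and base URL outside Cursor. A minimal sketch using the openai Python package (the key and model are placeholders; swap in your provider's values):

```python
from openai import OpenAI

# Same credentials you entered in Cursor; "sk-..." stands in for your real key.
client = OpenAI(api_key="sk-...", base_url="https://api.openai.com/v1")

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Say hello in one word."}],
    max_tokens=5,
)
print(response.choices[0].message.content)
```

If this prints a greeting, Cursor's "Add Model" form will work with the same values.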
Local LLMs
- How:
- Spin up a local server with LM Studio or Ollama (e.g., running LLaMA 13B).
- Expose it via ngrok: ngrok http 8000 → http://abc123.ngrok.io (swap 8000 for whatever port your server listens on).
- In Cursor, override OpenAI API settings:
- Base URL: http://abc123.ngrok.io/v1
- API Key: Dummy value (e.g., local).
- Model: Whatever your local setup calls it (e.g., llama-13b).
- Chat away—free and limitless.
- Tech Bit: Local models need beefy hardware (16GB RAM minimum, GPU preferred). Latency's higher, but no cloud costs.
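To confirm the tunnel actually reaches your local server before pointing Cursor at it, here's a quick check with requests, assuming your server exposes an OpenAI-compatible endpoint (recent Ollama and LM Studio builds both serve /v1/chat/completions; the URL and model name match the example above):

```python
import requests

resp = requests.post(
    "http://abc123.ngrok.io/v1/chat/completions",
    headers={"Authorization": "Bearer local"},  # dummy key, as noted above
    json={
        "model": "llama-13b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,  # local models can be slow to answer the first request
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```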
Let's get fancy with some developer-grade tweaks:
Project Rules
- How: Drop a .cursorrules file in your repo root.
- Example:
You're a TypeScript ninja. Write concise, functional code. Avoid console.logs unless I ask.
- Why: Tailors model output, saving follow-up requests like "make it shorter."
Monitor Usage
- How: Settings > Billing shows your 500-request tally.
- Tactic: Aim for 15/day (450/month) to leave a buffer. At 300 by day 15? Pivot to cursor-small or custom models.
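The pacing math, spelled out with the numbers above:

```python
# Back-of-envelope pacing check using the figures from this tactic.
cap, target_per_day, days = 500, 15, 30
buffer = cap - target_per_day * days   # 500 - 450 = 50 spare requests

used, today = 300, 15                  # the "300 by day 15" scenario
print(used > target_per_day * today)   # True: 300 > 225, time to pivot
```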
Use MAX Mode Judiciously
- Why: 200k tokens are clutch for huge projects but guzzle resources.
- How: Use it only when @codebase can't cut it (e.g., parsing a 10k-line monolith).
You've got the keys to Cursor's model kingdom. Mix cursor-small for the grind, premium models for the glory, and custom setups for total freedom. Batch your asks, reset contexts, and keep an eye on that 500-request gauge—you'll be coding like a rockstar without hitting the slow lane.