Managing AI Models in Cursor Pro

Optimize your Cursor Pro usage and maximize your 500 monthly requests


What to Use: Cursor provides unlimited access to lightweight models like cursor-small, optimized for quick completions and basic tasks.

How to Apply:

  • In Chat, select cursor-small from the model dropdown for simple queries (e.g., "Explain this function" or "Write a short loop").
  • For Tab Autocomplete or small inline edits via Cmd + K, lightweight models are often used by default—stick to these for repetitive, low-complexity work.

Savings: Reserves premium requests for complex tasks like debugging large codebases or generating intricate solutions.

Why It Helps: Each Chat message or Cmd + K action counts as a request. Sending multiple small prompts quickly eats into your limit.

How to Apply:

  • Combine related questions into a single prompt. For example, instead of:
    • "Write a Python function to sort a list."
    • "Now add error handling."
    • "Make it recursive."
    Use: "Write a recursive Python function to sort a list with error handling."
  • In Chat, ask multi-part questions in one go: "Explain X, then show an example, and suggest optimizations."

Savings: Reduces 3–5 requests to 1, potentially saving 10–20% of your limit over a month.

Why It Matters: Cursor maintains context within a Chat session, and long conversations with premium models can inflate token usage, indirectly increasing request costs.

How to Apply:

  • Start a new Chat session (via the "New Chat" button) for unrelated tasks to reset the context window.
  • Example: Finish debugging a function, then start a fresh session for a new feature rather than continuing in the same thread.
  • Avoid attaching massive files or entire codebases unless necessary—use smaller, relevant snippets instead.

Savings: Keeps premium model usage lean, avoiding unnecessary token-heavy requests.

Why It's Costly: Agent mode (e.g., for planning or tool calls) often uses reasoning models like Claude 3.7 Sonnet Thinking, and each tool operation (e.g., file edits, searches) may count as a separate request.

How to Apply:

  • Use Agent mode only for high-value tasks, like planning a complex feature or automating multi-step refactors.
  • Pre-plan manually when possible (e.g., outline your code structure yourself) before invoking the Agent.
  • Switch to cursor-small or manual coding for execution after the planning phase.

Savings: Cuts down on multi-request operations, potentially saving 50–100 requests if you lean on it heavily.

Why It Adds Up: Each Cmd + K (inline code generation) action counts as a request when using premium models.

How to Apply:

  • Use cursor-small for quick edits (e.g., "Add a comment here" or "Fix this syntax").
  • Reserve premium models for significant refactors or when precision is critical (e.g., "Rewrite this class in TypeScript with better error handling").
  • Highlight specific code sections rather than relying on Cursor to infer context from large files, reducing token overhead.

Savings: Shifts 20–30% of inline edits to free models, preserving premium requests.

How to Track:

  • Check your usage in Settings > Billing (or a similar section in Cursor). It shows how many of your 500 requests remain.
  • Estimate daily usage: 500 requests ÷ 30 days ≈ 16–17 requests/day. Adjust habits if you're exceeding this early in the month.

How to Apply:

  • Set a personal daily cap (e.g., 10–15 requests) and stick to free models once you hit it.
  • Disable premium models temporarily in Settings > Models if you're nearing the limit mid-month, forcing reliance on free options.

Savings: Prevents unexpected depletion, giving you control over the pace.

Why It Works: Custom models via your own API keys (e.g., OpenAI, Anthropic, or local LLMs) don't count toward the 500-request limit—they're billed separately by the provider, or cost nothing if you run the model locally.

How to Apply:

  • Set up a local LLM (e.g., via LM Studio or Ollama) and connect it to Cursor through its OpenAI-compatible API endpoint.
  • Add an OpenAI or Anthropic API key in Settings > Models for premium models outside Cursor's quota.
  • Example: Use a local LLaMA model for Chat, saving all 500 requests for Cursor-specific premium features.
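Ollama exposes an OpenAI-compatible endpoint at `http://localhost:11434/v1`, which is the shape of endpoint Cursor's custom base-URL setting expects. A minimal sketch of such a request (the model name `llama3` is an assumption; use whatever you have pulled locally):

```python
import json
import urllib.request

# Sketch: build an OpenAI-style chat request for a local Ollama server.
# Assumes Ollama is running (`ollama serve`) and a model such as llama3
# has been pulled; the model name is an assumption for illustration.
def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Ollama ignores the key locally, but OpenAI-style clients expect one.
            "Authorization": "Bearer ollama",
        },
    )

req = build_chat_request("http://localhost:11434/v1", "llama3",
                         "Say hello in one word.")
# To actually send it (requires a running Ollama server):
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the request format matches OpenAI's Chat Completions API, pointing Cursor (or any OpenAI client) at this base URL is enough; no proxy layer is needed.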

Savings: Offloads an unlimited number of tasks to external setups, potentially leaving your entire 500-request quota untouched.

Why It's Expensive: MAX mode models (e.g., 200k token context) are powerful for large projects but consume more resources, possibly counting as higher-cost requests.

How to Apply:

  • Use MAX mode only when working with massive codebases or long documents that exceed standard context limits (e.g., 128k tokens).
  • Break down tasks into smaller chunks and use regular models when possible.

Savings: Limits premium usage to essential cases, saving 10–20 requests on big projects.
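The chunking advice can be made concrete with a small helper. This is a rough sketch: the 4-characters-per-token ratio is a common rule of thumb, not Cursor's actual tokenizer:

```python
# Illustrative helper for the "smaller chunks" advice: split a large
# source file into pieces that stay under a rough token budget,
# approximating 1 token as ~4 characters (a rule of thumb, not exact).
def chunk_source(text: str, max_tokens: int = 4000) -> list[str]:
    max_chars = max_tokens * 4
    chunks, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        # Flush the current chunk before it would exceed the budget.
        if size + len(line) > max_chars and current:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks

parts = chunk_source("x = 1\n" * 10000, max_tokens=1000)
```

Feeding these chunks to a regular model one at a time, rather than the whole file to MAX mode, keeps each request inside standard context limits.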

A Sample Daily Budget:

  • Daily Goal: 15 requests/day × 30 days = 450 requests, leaving a 50-request buffer.
  • Breakdown:
    • 5 Chat queries (premium for complex tasks, free for simple ones).
    • 5 Cmd + K edits (mix of free and premium).
    • 3 Agent mode uses (reserved for planning or automation).
    • 2 miscellaneous (e.g., MAX mode or experimentation).
  • Adjustment: If you're at 300 requests by day 15, shift to free/custom models for the second half.

Final Tips:

  • Prioritize Quality: Use premium requests where they add the most value (e.g., solving tricky bugs) rather than mundane tasks.
  • Experiment Early: Test usage patterns in the first week to gauge your needs, then refine your approach.
  • Fallback Plan: If you hit 500, enable usage-based pricing ($0.01–$0.05/request) temporarily, or rely on free models until the next cycle.