JetBrains Model Management

Master the art of managing AI models in JetBrains IDEs

Efficiency, Cost Control, and Precision

JetBrains AI Assistant is a multi-model powerhouse, blending in-house gems with third-party titans. As of March 29, 2025, here's who's in the lineup:

  • Mellum: JetBrains' own code-completion ninja—small, fast, and laser-focused on languages like Java, Kotlin, Python, and Go (up to 32k tokens).
  • Claude 3.5 Sonnet: Anthropic's reasoning rockstar—128k-token context, perfect for deep code analysis or tricky refactors.
  • GPT-4o: OpenAI's versatile genius—128k tokens, great for creative coding and chatty explanations.
  • Google Gemini: Low-latency champ—optimized for quick UI tweaks or lightweight tasks (context varies, typically 32k-64k).
  • Local Models: Your custom crew—connect via Ollama or LM Studio for offline glory.

Scope these out in Settings > Tools > AI Assistant (Cmd + , on macOS, Ctrl + Alt + S on Windows/Linux). You'll see a model selector—your command center for picking and tweaking.

Dev Hack: Mellum's cloud completion needs an AI Pro sub ($20/month), but local completion's free with any paid IDE license. Check Settings > Subscription to confirm your firepower.

JetBrains AI shines in three zones: Chat, Inline Completion, and Full-Line Completion. Let's deploy the right model for each gig.

Chat: Your AI Code Confidant

  • Where: Right toolbar > AI Assistant pane (or Alt + A).
  • How: Dropdown above the chat input—pick Mellum, Claude, or GPT-4o.
  • Playbook:
    • Mellum: Quick Q&A—"What's this Kotlin function do?"—low-latency wins.
    • Claude 3.5 Sonnet: Debugging a hairy Spring Boot mess? "Explain this stack trace"—it'll reason through layers like a pro.
    • GPT-4o: Brainstorming a REST API? "Draft an endpoint with auth"—it's got swagger and creativity (a sketch of the kind of output follows this list).
  • Tech Bit: Chat pulls context from open files or selected code—highlight a block, hit "Explain this" (right-click > AI Actions). Each message burns a prompt credit (500/month on AI Pro).
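
To make that GPT-4o Playbook entry concrete, here's the general shape of what "Draft an endpoint with auth" tends to produce. This is a minimal sketch, not the model's literal output: the Flask framework, the /api/items route, and the X-API-Key header are all illustrative assumptions.

```python
# Illustrative only: the kind of endpoint a "Draft an endpoint with auth"
# chat prompt might yield. Framework, route, and header name are assumptions.
import os

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_KEY = os.environ.get("API_KEY", "change-me")  # hypothetical config value

@app.route("/api/items", methods=["GET"])
def list_items():
    # Reject requests that don't carry the expected API key header.
    if request.headers.get("X-API-Key") != API_KEY:
        abort(401)
    return jsonify([{"id": 1, "name": "example"}])

if __name__ == "__main__":
    app.run(debug=True)
```

Highlight any part of it and fire "Explain this" at Claude if the auth flow needs unpacking.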

Inline Completion: Code on the Fly

  • Where: Type, hit Ctrl + Space (or custom key in Settings > Keymap > AI Actions).
  • How: Mellum's the default for cloud, local models for offline—set in Settings > AI Assistant > Completion Model.
  • Playbook:
    • Mellum: "Add a loop here"—instant suggestions, 40% acceptance rate per JetBrains' stats.
    • Gemini: UI tweaks—"Style this div with Tailwind"—fast and snappy.
    • Local (e.g., Llama): Offline grinding—"Finish this Python script"—no cloud, no cost (an illustrative before/after follows this list).
  • Tech Bit: Inline uses your file's AST—context-aware, token-light. Toggle "Enable Cloud Completion" in Settings for Mellum's server-side punch.
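
Here's an illustrative before/after for that "Finish this Python script" case. The function and its suggested body are invented for the sketch; a real suggestion depends entirely on your surrounding code.

```python
# You type the signature and docstring...
def average(values: list[float]) -> float:
    """Return the arithmetic mean of values."""
    # ...and the inline suggestion offers a body like this (accept with Tab):
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```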

Full-Line Completion: Predictive Power

  • Where: Type, accept suggestions with Tab—runs automatically.
  • How: Local by default (free), Mellum boosts it with AI Pro.
  • Playbook:
    • Local Model: Boilerplate—"def fetch_data"—multi-line magic for Java, Python, etc. (see the sketch after this list).
    • Mellum: Smarter blocks—"Create a React component"—cloud-synced, context-rich.
  • Tech Bit: Local runs on-device (no prompt credits), and suggestions are checked for correctness before they're shown—20% keystroke savings per JetBrains' blog. Cloud Mellum needs internet, with latency roughly a third of older models'.
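
To ground that "def fetch_data" bullet, here's the kind of multi-line block full-line completion can propose once you've typed the signature. The file path and JSON format are assumptions for the sketch, not something the feature dictates.

```python
# Illustrative completion for a typed "def fetch_data" prefix.
# The default path and JSON format are assumptions for this sketch.
import json
from pathlib import Path

def fetch_data(path: str = "data.json") -> dict:
    """Load and return JSON data from a local file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```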

AI Pro's 500 prompts/month (chat + cloud completion) sound generous, but they vanish fast if you're sloppy. Here's how to keep 'em flowing:

Lean on Local Completion

  • Why: Free, unlimited—your IDE's built-in gift.
  • How: Settings > AI Assistant > "Use Local Completion Only"—perfect for:
    • Quick fixes: "Add error handling."
    • Boilerplate: "Write a class stub" (see the sketch after this list).
  • Tech Bit: Local models (tuned for Java, Python, etc.) run on your CPU—8GB RAM minimum, GPU optional. No prompt credits burned.
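
For a feel of what local completion handles comfortably, here's the sort of class-stub-plus-error-handling boilerplate it can fill in line by line. The class name, fields, and connection object are placeholders, not anything the feature requires.

```python
# Placeholder boilerplate of the kind local completion fills in well.
# Class name, fields, and the connection's fetch_one method are illustrative.
class UserRepository:
    def __init__(self, connection):
        self.connection = connection

    def get_user(self, user_id: int) -> dict:
        # "Add error handling": wrap the lookup and surface a clear error.
        try:
            return self.connection.fetch_one(
                "SELECT * FROM users WHERE id = ?", user_id
            )
        except Exception as exc:
            raise RuntimeError(f"Failed to load user {user_id}") from exc
```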

Batch Your Chats

  • Why: Each message = 1 prompt—don't nickel-and-dime it.
  • How: Bundle asks:
    • Instead of:
      • Chat: "Write a fetch call."
      • Chat: "Add headers."
    • Do:
      • Chat: "Write a fetch call with auth headers."
  • Tech Bit: One API hit, one credit—the combined prompt uses more tokens than either single ask (say 1k vs. 500), but you're saving 50%+ on prompts.
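
Here's roughly what that single batched prompt should hand back in one pass. Since the rest of this guide leans Python, the sketch uses the requests library; the endpoint URL and Bearer-token scheme are illustrative assumptions.

```python
# Illustrative answer to "Write a fetch call with auth headers."
# The endpoint URL and token handling are assumptions for the sketch.
import os

import requests

def fetch_orders() -> dict:
    """GET a protected resource with an Authorization header."""
    token = os.environ["API_TOKEN"]  # hypothetical environment variable
    response = requests.get(
        "https://api.example.com/orders",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

One prompt, one credit, and both asks answered at once.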

Context Precision

  • Why: Big context = big token burn—keep it tight.
  • How: Use the UI context manager (Chat pane > "Manage Context"):
    • Add: Open file auto-included, drag others in.
    • Remove: Ditch irrelevant files—e.g., skip node_modules.
  • Tech Bit: Context caps at model limits (128k for Claude)—trim to 2-4k tokens for efficiency (a rough way to estimate is sketched below).
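
If you want a quick sanity check before dragging files into context, a common rule of thumb is roughly four characters per token for English prose and code. It's an approximation only; the tokenizers Claude, GPT-4o, and Mellum actually use will differ.

```python
# Rough token estimate: ~4 characters per token is a common heuristic.
# Real tokenizers will disagree somewhat, so treat this as a ballpark.
from pathlib import Path

def estimate_tokens(*paths: str) -> int:
    """Return an approximate token count for the given files."""
    total_chars = sum(
        len(Path(p).read_text(encoding="utf-8", errors="ignore")) for p in paths
    )
    return total_chars // 4

if __name__ == "__main__":
    # Example: check whether two (hypothetical) files fit a 2-4k token budget.
    print(estimate_tokens("app/service.py", "app/models.py"))
```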

JetBrains lets you plug in your own LLMs—privacy freaks and offline warriors, this one's for you.

Hook Up Local Models

  • How:
    1. Install Ollama (ollama.ai) or LM Studio—grab Llama 3.1 8B (8GB RAM recommended).
    2. Start the server: ollama serve or lms server start.
    3. Settings > AI Assistant > Providers > Add:
      • Provider: "Ollama" or "LM Studio."
      • URL: http://localhost:11434/v1 (Ollama default).
      • Model: llama3.1:8b.
    4. Test in Chat—"Write a Go struct"—local magic, no cloud (or smoke-test the endpoint directly; see the sketch after this list).
  • Why: Unlimited prompts, zero credits—your hardware's the limit.
  • Tech Bit: GPU (NVIDIA CUDA) cuts latency—e.g., 300ms vs. 1s on CPU. Syncs with IDE's AST for context.
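
Before pointing the IDE at the server, it's worth hitting the endpoint yourself to confirm a model actually responds. This sketch assumes Ollama's OpenAI-compatible API on the default port and that llama3.1:8b has already been pulled; adjust both if your setup differs.

```python
# Minimal smoke test for a local Ollama server (OpenAI-compatible API).
# Assumes `ollama serve` is running and `ollama pull llama3.1:8b` was done.
import requests

response = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Write a Go struct for a user."}],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If this prints a completion, the provider settings above should work with the same URL and model name.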

Prompt Engineering

  • How: Chat > "Prompt Library" > Add:
    • "You're a TypeScript guru—use functional style, no console.logs."
  • Why: Shapes every response—saves follow-ups.
  • Tech Bit: Prepends to prompts—stored client-side, no extra API calls.

You're in control—now make it sing:

Monitor Usage

  • How: Settings > AI Assistant > Usage—track your 500 prompts.
  • Why: At 300 by day 15? Pivot to local or batch harder.
  • Tech Bit: Logs (Help > Show Log in Explorer) show token counts—debug bloat.

Custom Keymaps

  • How: Settings > Keymap > AI Actions:
    • "Generate Code": Ctrl + G.
    • "Explain Selection": Alt + E.
  • Tech Bit: Muscle memory = fewer clicks—syncs across IDEs via JetBrains Account.

Project-Wide Rules

  • How: Chat > Context > "Add Instruction":
    • "Use PEP 8 for Python, avoid globals."
  • Tech Bit: Applies to all prompts—consistent output, fewer fixes (a before/after sketch follows).
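
As a quick illustration of what that instruction buys you, here's a hedged before/after. The function names and the cache global are invented, and the "after" is simply what PEP 8-style, global-free code tends to look like, not a captured model response.

```python
# Before: the kind of output you'd otherwise fix by hand (module-level global).
cache = {}

def getUser(id):
    if id not in cache:
        cache[id] = {"id": id}
    return cache[id]

# After: what "Use PEP 8 for Python, avoid globals" nudges the assistant toward.
def get_user(user_id: int, cache: dict[int, dict]) -> dict:
    """Return a cached user record, creating it on first access."""
    if user_id not in cache:
        cache[user_id] = {"id": user_id}
    return cache[user_id]
```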

Boom—you've tamed JetBrains' AI beast! Mellum's your speed demon, Claude's your sage, GPT-4o's your muse, and local models are your secret weapon. Batch your prompts, trim your context, and hack in custom LLMs to keep coding like a rockstar.