JetBrains Model Management

Master the art of managing AI models in JetBrains IDEs

Efficiency, Cost Control, and Precision

JetBrains AI Assistant is a multi-model powerhouse, blending in-house gems with third-party titans. As of March 29, 2025, here's who's in the lineup:

  • Mellum: JetBrains' own code-completion ninja—small, fast, and laser-focused on languages like Java, Kotlin, Python, and Go (up to 32k tokens).
  • Claude 3.5 Sonnet: Anthropic's reasoning rockstar—128k-token context, perfect for deep code analysis or tricky refactors.
  • GPT-4o: OpenAI's versatile genius—128k tokens, great for creative coding and chatty explanations.
  • Google Gemini: Low-latency champ—optimized for quick UI tweaks or lightweight tasks (context varies, typically 32k-64k).
  • Local Models: Your custom crew—connect via Ollama or LM Studio for offline glory.

Scope these out in Settings > Tools > AI Assistant (Cmd + , on macOS, Ctrl + Alt + S on Windows/Linux). You'll see a model selector—your command center for picking and tweaking.

Dev Hack: Mellum's cloud completion needs an AI Pro sub ($20/month), but local completion's free with any paid IDE license. Check Settings > Subscription to confirm your firepower.

JetBrains AI shines in three zones: Chat, Inline Completion, and Full-Line Completion. Let's deploy the right model for each gig.

Chat: Your AI Code Confidant

  • Where: Right toolbar > AI Assistant pane (or Alt + A).
  • How: Dropdown above the chat input—pick Mellum, Claude, or GPT-4o.
  • Playbook:
    • Mellum: Quick Q&A—"What's this Kotlin function do?"—low-latency wins.
    • Claude 3.5 Sonnet: Debugging a hairy Spring Boot mess? "Explain this stack trace"—it'll reason through layers like a pro.
    • GPT-4o: Brainstorming a REST API? "Draft an endpoint with auth"—it's got swagger and creativity (a sketch of the kind of output follows this list).
  • Tech Bit: Chat pulls context from open files or selected code—highlight a block, hit "Explain this" (right-click > AI Actions). Each message burns a prompt credit (500/month on AI Pro).
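
To make that GPT-4o Playbook entry concrete, here's the general shape of what "Draft an endpoint with auth" tends to produce. This is a minimal sketch, not the model's literal output: the Flask framework, the /api/items route, and the X-API-Key header are all illustrative assumptions.

```python
# Illustrative only: the kind of endpoint a "Draft an endpoint with auth"
# chat prompt might yield. Framework, route, and header name are assumptions.
import os

from flask import Flask, abort, jsonify, request

app = Flask(__name__)
API_KEY = os.environ.get("API_KEY", "change-me")  # hypothetical config value

@app.route("/api/items", methods=["GET"])
def list_items():
    # Reject requests that don't carry the expected API key header.
    if request.headers.get("X-API-Key") != API_KEY:
        abort(401)
    return jsonify([{"id": 1, "name": "example"}])

if __name__ == "__main__":
    app.run(debug=True)
```

Highlight any part of it and fire "Explain this" at Claude if the auth flow needs unpacking.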

Inline Completion: Code on the Fly

  • Where: Type, hit Ctrl + Space (or custom key in Settings > Keymap > AI Actions).
  • How: Mellum's the default for cloud, local models for offline—set in Settings > AI Assistant > Completion Model.
  • Playbook:
    • Mellum: "Add a loop here"—instant suggestions, 40% acceptance rate per JetBrains' stats.
    • Gemini: UI tweaks—"Style this div with Tailwind"—fast and snappy.
    • Local (e.g., Llama): Offline grinding—"Finish this Python script"—no cloud, no cost (an illustrative before/after follows this list).
  • Tech Bit: Inline uses your file's AST—context-aware, token-light. Toggle "Enable Cloud Completion" in Settings for Mellum's server-side punch.
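
Here's an illustrative before/after for that "Finish this Python script" case. The function and its suggested body are invented for the sketch; a real suggestion depends entirely on your surrounding code.

```python
# You type the signature and docstring...
def average(values: list[float]) -> float:
    """Return the arithmetic mean of values."""
    # ...and the inline suggestion offers a body like this (accept with Tab):
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```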

Full-Line Completion: Predictive Power

  • Where: Type, accept suggestions with Tab—runs automatically.
  • How: Local by default (free), Mellum boosts it with AI Pro.
  • Playbook:
    • Local Model: Boilerplate—"def fetch_data"—multi-line magic for Java, Python, etc. (see the sketch after this list).
    • Mellum: Smarter blocks—"Create a React component"—cloud-synced, context-rich.
  • Tech Bit: Local runs on-device (no prompt credits), and suggestions are checked for correctness before they're shown—20% keystroke savings per JetBrains' blog. Cloud Mellum needs internet, with latency roughly a third of older models'.
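
To ground that "def fetch_data" bullet, here's the kind of multi-line block full-line completion can propose once you've typed the signature. The file path and JSON format are assumptions for the sketch, not something the feature dictates.

```python
# Illustrative completion for a typed "def fetch_data" prefix.
# The default path and JSON format are assumptions for this sketch.
import json
from pathlib import Path

def fetch_data(path: str = "data.json") -> dict:
    """Load and return JSON data from a local file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```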

AI Pro's 500 prompts/month (chat + cloud completion) sound generous, but they vanish fast if you're sloppy. Here's how to keep 'em flowing:

Lean on Local Completion

  • Why: Free, unlimited—your IDE's built-in gift.
  • How: Settings > AI Assistant > "Use Local Completion Only"—perfect for:
    • Quick fixes: "Add error handling."
    • Boilerplate: "Write a class stub" (see the sketch after this list).
  • Tech Bit: Local models (tuned for Java, Python, etc.) run on your CPU—8GB RAM minimum, GPU optional. No prompt credits burned.
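
For a feel of what local completion handles comfortably, here's the sort of class-stub-plus-error-handling boilerplate it can fill in line by line. The class name, fields, and connection object are placeholders, not anything the feature requires.

```python
# Placeholder boilerplate of the kind local completion fills in well.
# Class name, fields, and the connection's fetch_one method are illustrative.
class UserRepository:
    def __init__(self, connection):
        self.connection = connection

    def get_user(self, user_id: int) -> dict:
        # "Add error handling": wrap the lookup and surface a clear error.
        try:
            return self.connection.fetch_one(
                "SELECT * FROM users WHERE id = ?", user_id
            )
        except Exception as exc:
            raise RuntimeError(f"Failed to load user {user_id}") from exc
```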

Batch Your Chats

  • Why: Each message = 1 prompt—don't nickel-and-dime it.
  • How: Bundle asks:
    • Instead of:
      • Chat: "Write a fetch call."
      • Chat: "Add headers."
    • Do:
      • Chat: "Write a fetch call with auth headers."
  • Tech Bit: One API hit, one credit—the combined prompt uses more tokens than either single ask (say 1k vs. 500), but you're saving 50%+ on prompts.
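
Here's roughly what that single batched prompt should hand back in one pass. Since the rest of this guide leans Python, the sketch uses the requests library; the endpoint URL and Bearer-token scheme are illustrative assumptions.

```python
# Illustrative answer to "Write a fetch call with auth headers."
# The endpoint URL and token handling are assumptions for the sketch.
import os

import requests

def fetch_orders() -> dict:
    """GET a protected resource with an Authorization header."""
    token = os.environ["API_TOKEN"]  # hypothetical environment variable
    response = requests.get(
        "https://api.example.com/orders",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```

One prompt, one credit, and both asks answered at once.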

Context Precision

  • Why: Big context = big token burn—keep it tight.
  • How: Use the UI context manager (Chat pane > "Manage Context"):
    • Add: Open file auto-included, drag others in.
    • Remove: Ditch irrelevant files—e.g., skip node_modules.
  • Tech Bit: Context caps at model limits (128k for Claude)—trim to 2-4k tokens for efficiency (a rough way to estimate is sketched below).
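
If you want a quick sanity check before dragging files into context, a common rule of thumb is roughly four characters per token for English prose and code. It's an approximation only; the tokenizers Claude, GPT-4o, and Mellum actually use will differ.

```python
# Rough token estimate: ~4 characters per token is a common heuristic.
# Real tokenizers will disagree somewhat, so treat this as a ballpark.
from pathlib import Path

def estimate_tokens(*paths: str) -> int:
    """Return an approximate token count for the given files."""
    total_chars = sum(
        len(Path(p).read_text(encoding="utf-8", errors="ignore")) for p in paths
    )
    return total_chars // 4

if __name__ == "__main__":
    # Example: check whether two (hypothetical) files fit a 2-4k token budget.
    print(estimate_tokens("app/service.py", "app/models.py"))
```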

JetBrains lets you plug in your own LLMs—privacy freaks and offline warriors, this one's for you.

Hook Up Local Models

  • How:
    1. Install Ollama (ollama.ai) or LM Studio—grab Llama 3.1 8B (8GB RAM recommended).
    2. Start the server: ollama serve or lms server start.
    3. Settings > AI Assistant > Providers > Add:
      • Provider: "Ollama" or "LM Studio."
      • URL: http://localhost:11434/v1 (Ollama default).
      • Model: llama3.1:8b.
    4. Test in Chat—"Write a Go struct"—local magic, no cloud (or smoke-test the endpoint directly; see the sketch after this list).
  • Why: Unlimited prompts, zero credits—your hardware's the limit.
  • Tech Bit: GPU (NVIDIA CUDA) cuts latency—e.g., 300ms vs. 1s on CPU. Syncs with IDE's AST for context.
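
Before pointing the IDE at the server, it's worth hitting the endpoint yourself to confirm a model actually responds. This sketch assumes Ollama's OpenAI-compatible API on the default port and that llama3.1:8b has already been pulled; adjust both if your setup differs.

```python
# Minimal smoke test for a local Ollama server (OpenAI-compatible API).
# Assumes `ollama serve` is running and `ollama pull llama3.1:8b` was done.
import requests

response = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Write a Go struct for a user."}],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If this prints a completion, the provider settings above should work with the same URL and model name.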

Prompt Engineering

  • How: Chat > "Prompt Library" > Add:
    • "You're a TypeScript guru—use functional style, no console.logs."
  • Why: Shapes every response—saves follow-ups.
  • Tech Bit: Prepends to prompts—stored client-side, no extra API calls.

You're in control—now make it sing:

Monitor Usage

  • How: Settings > AI Assistant > Usage—track your 500 prompts.
  • Why: At 300 by day 15? Pivot to local or batch harder.
  • Tech Bit: Logs (Help > Show Log in Explorer) show token counts—debug bloat.

Custom Keymaps

  • How: Settings > Keymap > AI Actions:
    • "Generate Code": Ctrl + G.
    • "Explain Selection": Alt + E.
  • Tech Bit: Muscle memory = fewer clicks—syncs across IDEs via JetBrains Account.

Project-Wide Rules

  • How: Chat > Context > "Add Instruction":
    • "Use PEP 8 for Python, avoid globals."
  • Tech Bit: Applies to all prompts—consistent output, fewer fixes (a before/after sketch follows).
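
As a quick illustration of what that instruction buys you, here's a hedged before/after. The function names and the cache global are invented, and the "after" is simply what PEP 8-style, global-free code tends to look like, not a captured model response.

```python
# Before: the kind of output you'd otherwise fix by hand (module-level global).
cache = {}

def getUser(id):
    if id not in cache:
        cache[id] = {"id": id}
    return cache[id]

# After: what "Use PEP 8 for Python, avoid globals" nudges the assistant toward.
def get_user(user_id: int, cache: dict[int, dict]) -> dict:
    """Return a cached user record, creating it on first access."""
    if user_id not in cache:
        cache[user_id] = {"id": user_id}
    return cache[user_id]
```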

Boom—you've tamed JetBrains' AI beast! Mellum's your speed demon, Claude's your sage, GPT-4o's your muse, and local models are your secret weapon. Batch your prompts, trim your context, and hack in custom LLMs to keep coding like a rockstar.