JetBrains AI Assistant is a multi-model powerhouse, blending in-house gems with third-party titans. As of March 29, 2025, here's who's in the lineup:
- Mellum: JetBrains' own code-completion ninja—small, fast, and laser-focused on languages like Java, Kotlin, Python, and Go (up to 32k tokens).
- Claude 3.5 Sonnet: Anthropic's reasoning rockstar—200k-token context, perfect for deep code analysis or tricky refactors.
- GPT-4o: OpenAI's versatile genius—128k tokens, great for creative coding and chatty explanations.
- Google Gemini: Low-latency champ—optimized for quick UI tweaks or lightweight tasks (context window varies by Gemini variant).
- Local Models: Your custom crew—connect via Ollama or LM Studio for offline glory.
Scope these out in Settings > Tools > AI Assistant (Cmd + , on macOS, Ctrl + Alt + S on Windows/Linux). You'll see a model selector—your command center for picking and tweaking.
Dev Hack: Mellum's cloud completion needs an AI Pro sub ($20/month), but local completion's free with any paid IDE license. Check Settings > Subscription to confirm your firepower.
JetBrains AI shines in three zones: Chat, Inline Completion, and Full-Line Completion. Let's deploy the right model for each gig.
Chat: Your AI Code Confidant
- Where: Right toolbar > AI Assistant pane (or Alt + A).
- How: Dropdown above the chat input—pick Mellum, Claude, or GPT-4o.
- Playbook:
- Mellum: Quick Q&A—"What's this Kotlin function do?"—low-latency wins.
- Claude 3.5 Sonnet: Debugging a hairy Spring Boot mess? "Explain this stack trace"—it'll reason through layers like a pro.
- GPT-4o: Brainstorming a REST API? "Draft an endpoint with auth"—it's got swagger and creativity.
- Tech Bit: Chat pulls context from open files or selected code—highlight a block, hit "Explain this" (right-click > AI Actions). Each message burns a prompt credit (500/month on AI Pro).
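For a concrete feel, here's a toy Kotlin function of the kind you'd highlight and hit "Explain this" on—every name here is made up for illustration:

```kotlin
// A deliberately terse function—highlight it and ask Chat to unpack it.
fun summarize(orders: List<Pair<String, Double>>): Map<String, Double> =
    orders.groupBy({ it.first }, { it.second })       // bucket amounts by customer id
        .mapValues { (_, amounts) -> amounts.sum() }  // total each bucket

fun main() {
    println(summarize(listOf("ada" to 9.99, "ada" to 5.0, "bob" to 3.5)))
    // {ada=14.99, bob=3.5}
}
```

Mellum's fine for the quick gloss; save Claude for when the chain gets hairier.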
Inline Completion: Code on the Fly
- Where: Type, hit Ctrl + Space (or custom key in Settings > Keymap > AI Actions).
- How: Mellum's the default for cloud, local models for offline—set in Settings > AI Assistant > Completion Model.
- Playbook:
- Mellum: "Add a loop here"—instant suggestions, 40% acceptance rate per JetBrains' stats.
- Gemini: UI tweaks—"Style this div with Tailwind"—fast and snappy.
- Local (e.g., Llama): Offline grinding—"Finish this Python script"—no cloud, no cost.
- Tech Bit: Inline uses your file's AST—context-aware, token-light. Toggle "Enable Cloud Completion" in Settings for Mellum's server-side punch.
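To see the flow, sketch a comment and trigger completion. This is illustrative Kotlin—the suggested body will vary by model and context:

```kotlin
fun retryDelaysMs(attempts: Int): List<Long> {
    // Type the comment below and trigger completion; the loop that follows is
    // the kind of suggestion you'd accept—illustrative, not a guaranteed output.
    // add a loop computing exponential backoff delays
    val delays = mutableListOf<Long>()
    for (i in 0 until attempts) {
        delays.add(100L shl i)  // 100ms, 200ms, 400ms, ...
    }
    return delays
}
```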
Full-Line Completion: Predictive Power
- Where: Type, accept suggestions with Tab—runs automatically.
- How: Local by default (free), Mellum boosts it with AI Pro.
- Playbook:
- Local Model: Boilerplate—"def fetch_data"—multi-line magic for Java, Python, etc.
- Mellum: Smarter blocks—"Create a React component"—cloud-synced, context-rich.
- Tech Bit: Local runs on-device (no prompt credits) and suggestions are validated for correctness before they're shown—20% keystroke savings per JetBrains' blog. Cloud Mellum needs an internet connection, but at roughly a third the latency of older cloud models. A sketch of the kind of boilerplate it shines on follows below.
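A Kotlin cousin of the `def fetch_data` boilerplate above—type the signature, and a gray multi-line body shows up for Tab-accepting. The body shown is illustrative:

```kotlin
import java.io.File

// Type the signature; full-line completion proposes the body in gray—Tab to accept.
fun loadConfig(path: String): Map<String, String> =
    File(path).readLines()
        .filter { it.isNotBlank() && !it.startsWith("#") && "=" in it }  // skip noise
        .associate { line ->
            val (key, value) = line.split("=", limit = 2)
            key.trim() to value.trim()  // KEY = value  ->  "KEY" to "value"
        }
```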
AI Pro's 500 prompts/month (chat + cloud completion) sound generous, but they vanish fast if you're sloppy. Here's how to keep 'em flowing:
Lean on Local Completion
- Why: Free, unlimited—your IDE's built-in gift.
- How: Settings > AI Assistant > "Use Local Completion Only"—perfect for:
- Quick fixes: "Add error handling."
- Boilerplate: "Write a class stub."
- Tech Bit: Local models (tuned for Java, Python, etc.) run on your CPU—8GB RAM minimum, GPU optional. No prompt credits burned.
Batch Your Chats
- Why: Each message = 1 prompt—don't nickel-and-dime it.
- How: Bundle asks:
- Instead of:
- Chat: "Write a fetch call."
- Chat: "Add headers."
- Do:
- Chat: "Write a fetch call with auth headers."
- Tech Bit: One API hit, one credit—and since separate messages re-send chat history, the batched ask is often lighter on tokens too (e.g., ~500 combined vs. ~1k split across two prompts). That's 50%+ saved on prompt count; see the sketch below for what the single ask can yield.
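Roughly what that single batched ask—"Write a fetch call with auth headers"—can come back with, for one credit. Kotlin over java.net.http; the endpoint and token handling are placeholders:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// One batched ask, one credit: fetch call and auth headers in a single reply.
fun fetchWithAuth(token: String): String {
    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.example.com/items"))  // hypothetical endpoint
        .header("Authorization", "Bearer $token")
        .header("Accept", "application/json")
        .GET()
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}
```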
Context Precision
- Why: Big context = big token burn—keep it tight.
- How: Use UI context manager (Chat pane > "Manage Context"):
- Add: Open file auto-included, drag others in.
- Remove: Ditch irrelevant files—e.g., skip node_modules.
- Tech Bit: Context caps at model limits (200k for Claude 3.5 Sonnet)—trim to 2-4k tokens for efficiency. A rough sizing sketch follows below.
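If you want a number before dragging a file in, the usual back-of-envelope is ~4 characters per token for English-ish code. A hedged little sizing helper—the heuristic is approximate, and the path is hypothetical:

```kotlin
import java.io.File

// Rough token estimate using the common ~4 chars/token heuristic—approximate only.
fun estimateTokens(path: String): Int = File(path).readText().length / 4

fun main() {
    val tokens = estimateTokens("src/main/kotlin/Service.kt")  // hypothetical path
    println(if (tokens <= 4_000) "fits the 2-4k sweet spot" else "trim it: ~$tokens tokens")
}
```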
JetBrains lets you plug in your own LLMs—privacy freaks and offline warriors, this one's for you.
Hook Up Local Models
- How:
- Install Ollama (ollama.ai) or LM Studio—grab Llama 3.1 8B (plan on ~8GB of RAM).
- Start the server: ollama serve or lms server start.
- Settings > AI Assistant > Providers > Add:
- Provider: "Ollama" or "LM Studio."
- URL: http://localhost:11434/v1 (Ollama default).
- Model: llama3.1:8b.
- Test in Chat—"Write a Go struct"—local magic, no cloud.
- Why: Unlimited prompts, zero credits—your hardware's the limit.
- Tech Bit: A GPU (NVIDIA CUDA) cuts latency—e.g., ~300ms vs. ~1s on CPU—and the IDE still feeds AST-derived context to your local model. Want to sanity-check the server outside the IDE? See the sketch below.
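Ollama speaks an OpenAI-compatible chat API on its default port, so you can smoke-test it directly before blaming the IDE. A minimal Kotlin sketch, assuming you've pulled llama3.1:8b:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hit Ollama's OpenAI-compatible endpoint directly (default port 11434).
fun main() {
    val payload = """
        {"model": "llama3.1:8b",
         "messages": [{"role": "user", "content": "Write a Go struct for a user."}]}
    """.trimIndent()
    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:11434/v1/chat/completions"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build()
    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())  // raw JSON; the reply lives in choices[0].message.content
}
```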
Prompt Engineering
- How: Chat > "Prompt Library" > Add:
- "You're a TypeScript guru—use functional style, no console.logs."
- Why: Shapes every response—saves follow-ups.
- Tech Bit: The instruction is prepended to every prompt—stored client-side, no extra API calls. The sketch below shows the idea.
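Conceptually, "prepends" just means your saved instruction rides along as a leading system message on each chat call—a sketch of the shape, not JetBrains' actual internals:

```kotlin
// Not JetBrains' real plumbing—just the shape of "prepend a stored instruction".
data class Message(val role: String, val content: String)

val libraryPrompt = Message(
    "system", "You're a TypeScript guru—use functional style, no console.logs."
)

fun buildChatPayload(userAsk: String): List<Message> =
    listOf(libraryPrompt, Message("user", userAsk))  // instruction always leads

fun main() {
    println(buildChatPayload("Refactor this reducer."))
}
```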
You're in control—now make it sing:
Monitor Usage
- How: Settings > AI Assistant > Usage—track your 500 prompts.
- Why: At 300 by day 15? Pivot to local or batch harder.
- Tech Bit: Logs (Help > Show Log in Explorer/Finder) show token counts—debug bloat.
Custom Keymaps
- How: Settings > Keymap > AI Actions:
- "Generate Code": Ctrl + G.
- "Explain Selection": Alt + E.
- Tech Bit: Muscle memory = fewer clicks—syncs across IDEs via JetBrains Account.
Project-Wide Rules
- How: Chat > Context > "Add Instruction":
- "Use PEP 8 for Python, avoid globals."
- Tech Bit: Applies to all prompts—consistent output, fewer fixes.
Boom—you've tamed JetBrains' AI beast! Mellum's your speed demon, Claude's your sage, GPT-4o's your muse, and local models are your secret weapon. Batch your prompts, trim your context, and hack in custom LLMs to keep coding like a rockstar.