CyberAdvisors AI Toolkit · Internal Use
CyberAdvisors · Reference

AI Model Guide

Choose the right model for the right job — capabilities, cost, and context windows at a glance.

// Section 01 — Full Decision Matrix
Which Engine, For What · Legend: ■ = best fit · ◆ = capable · ▲ = limited · · = not recommended · (~ survives only partially below; see Section 02 for per-task detail)

| Model (Vendor)           | Status       | Input $/1M | Output $/1M | Context |
| GPT-4.1 (OpenAI)         | Default      | $2.00      | $8.00       | 1M      |
| GPT-5 Chat (OpenAI)      | Stable       | $10.00     | $30.00      | 1M      |
| GPT-5 Auto (OpenAI)      | Preview      | Varies     | Varies      | 1M      |
| GPT-5 Reasoning (OpenAI) | Preview      | $15.00     | $60.00      | 1M      |
| RL FT O4 mini (OpenAI)   | Experimental | TBD        | TBD         | 128K    |
| Sonnet 4.5 (Anthropic)   | Stable       | $3.00      | $15.00      | 200K    |
| Sonnet 4.6 (Anthropic)   | Stable       | $3.00      | $15.00      | 200K    |
| Opus 4.5 (Anthropic)     | Experimental | $15.00     | $75.00      | 200K    |

Best fit (■) per task: Quick Q&A · GPT-4.1 | Coding · Sonnet 4.6 | Long Writing · Sonnet 4.5/4.6 | Deep Reasoning · GPT-5 Reasoning | Data / Math · RL FT O4 mini | Summarise Doc · GPT-5 Chat / Sonnet 4.6
// Section 02 — Task-to-Model Quick Pick
Quick Q&A / Daily Chat
Use This
GPT-4.1 (Default)
Fast, cheap, and accurate for everyday questions. Already set as your default — don't overthink it.
Alt: Claude Sonnet 4.6 if you prefer Anthropic's tone
Code Generation
Use This
Claude Sonnet 4.6
Excellent instruction-following, reliable output structure, and solid at PowerShell, Python, and JavaScript. Less likely to pad with unnecessary code.
Alt: GPT-4.1 · Heavy logic: GPT-5 Reasoning
Long-Form Writing / Docs
Use This
Claude Sonnet 4.5 or 4.6
Best tone awareness, nuance, and structure for technical documentation, client reports, SOPs, and proposals. 200K context handles long drafts.
Premium: Claude Opus 4.5 for high-stakes client deliverables
Deep Reasoning / Analysis
Use This
GPT-5 Reasoning (Preview)
Internal "thinking" before responding. Best for multi-step logic, root cause analysis, complex troubleshooting trees, or policy interpretation.
Alt: Claude Opus 4.5 · Budget: RL FT O4 mini
Data / Math / STEM
Use This
RL FineTuned O4 mini
Reinforcement-learning tuned planner — strongest at structured planning, numerical reasoning, and constraint solving. Experimental but fast.
Alt: GPT-5 Reasoning · Claude Opus 4.5
Summarise a Long Document
Use This
GPT-5 Chat or Claude Sonnet 4.6
GPT-5 Chat's 1M and Sonnet 4.6's 200K context windows both handle long inputs without losing the thread. GPT-5 Chat for the OpenAI ecosystem; Sonnet for Anthropic's cleaner summaries.
Huge files (500K+ tokens): GPT-5 Chat (1M ctx wins)
Unknown / Mixed Task
Use This
GPT-5 Auto (Preview)
Automatically selects between chat and reasoning mode based on query complexity. Best when you don't know which mode your task needs.
Note: cost varies with selected mode
Highest Quality / No Budget Limit
Use This
Claude Opus 4.5
Experimental but top-tier. Best for final client deliverables, executive reports, complex problem solving. Use when quality matters more than cost.
Alt: GPT-5 Reasoning for math-heavy tasks
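For scripts or chat-ops tooling, the quick picks above can be collapsed into a tiny lookup. A minimal sketch: the task labels and model names come from this guide; the dict and function names are ours, not part of any picker API.

```python
# Task -> (primary pick, alternate) mirroring Section 02 of this guide.
QUICK_PICK = {
    "qa":        ("GPT-4.1", "Claude Sonnet 4.6"),
    "coding":    ("Claude Sonnet 4.6", "GPT-4.1"),
    "writing":   ("Claude Sonnet 4.6", "Claude Opus 4.5"),
    "reasoning": ("GPT-5 Reasoning", "Claude Opus 4.5"),
    "data_math": ("RL FT O4 mini", "GPT-5 Reasoning"),
    "summarise": ("GPT-5 Chat", "Claude Sonnet 4.6"),
}

def pick_model(task: str) -> str:
    """Primary pick for a known task; unknown/mixed tasks fall back to GPT-5 Auto."""
    if task in QUICK_PICK:
        return QUICK_PICK[task][0]
    return "GPT-5 Auto"

print(pick_model("coding"))   # Claude Sonnet 4.6
print(pick_model("mystery"))  # GPT-5 Auto
```

The fallback matches the "Unknown / Mixed Task" entry: when complexity is unclear, GPT-5 Auto routes the query itself.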
// Section 03 — Cost Metrics Per Model

Prices per 1 million tokens via API. Subscription/UI usage is absorbed in your plan. Output tokens cost more — keep responses concise to save budget.
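Per-call API cost is straightforward arithmetic on these rates. A sketch using the prices listed in this section; the model keys and function name are illustrative, not official API identifiers.

```python
# (input $/1M, output $/1M) per the cost cards in this section.
PRICES = {
    "gpt-4.1":         (2.00, 8.00),
    "gpt-5-chat":      (10.00, 30.00),
    "gpt-5-reasoning": (15.00, 60.00),
    "sonnet-4.6":      (3.00, 15.00),
    "opus-4.5":        (15.00, 75.00),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimated USD cost of one API call at the listed per-1M-token rates."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# A 10K-token prompt with a 2K-token answer on GPT-4.1:
print(round(estimate_cost("gpt-4.1", 10_000, 2_000), 4))  # 0.036
```

Note that GPT-5 Reasoning bills its internal thinking tokens as output, so `tokens_out` there is larger than the visible answer.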

OpenAI Models
GPT-4.1
Default · Best all-rounder · 1M context
Input /1M: $2.00
Output /1M: $8.00
Context: 1,000,000
Speed: Fast
Status: Default ✓
// Best cost/performance ratio in the picker. Start here for anything routine.
GPT-5 Chat
Stable · General tasks · 1M context
Input /1M: ~$10.00
Output /1M: ~$30.00
Context: 1,000,000
Speed: Moderate
Status: Stable
// ~5× more expensive than GPT-4.1. Use when quality uplift justifies it.
GPT-5 Auto
Preview · Mode-switching · 1M context
Input /1M: Varies*
Output /1M: Varies*
Context: 1,000,000
Speed: Varies
Status: Preview
// *Routes to chat or reasoning mode. Cost unpredictable — avoid for budget-sensitive runs.
GPT-5 Reasoning
Preview · Max depth · 1M context
Input /1M: ~$15.00
Output /1M: ~$60.00
Context: 1,000,000
Speed: Slow
Status: Preview
// Thinking tokens are billed. Reserve for genuinely hard problems only.
RL FT O4 mini
Experimental · Planner · 128K context
Input /1M: TBD
Output /1M: TBD
Context: 128,000
Speed: Fast
Status: Experimental
// Experimental — pricing not yet published. Best for planning & structured tasks.
Anthropic Models
Sonnet 4.5
Stable · General + content · 200K context
Input /1M: $3.00
Output /1M: $15.00
Context: 200,000
Speed: Fast
Status: Stable
// Slightly older than 4.6. Consider upgrading if both are available.
Sonnet 4.6
Stable · Improved · 200K context
Input /1M: $3.00
Output /1M: $15.00
Context: 200,000
Speed: Fast
Status: Stable ★
// ■ Recommended default Anthropic pick. Same cost as 4.5, improved capability.
Opus 4.5
Experimental · Deep reasoning · 200K context
Input /1M: $15.00
Output /1M: $75.00
Context: 200,000
Speed: Slow
Status: Experimental
// Most expensive Anthropic option. Best quality — use for client-facing high-stakes work only.
// Section 04 — Team Usage Tips
01
Default First, Escalate If Needed

Start with GPT-4.1 (your default). Only switch if the task genuinely needs more depth. Most daily tasks don't require GPT-5 Reasoning or Opus.

02
Sonnet 4.6 > Sonnet 4.5

Same price, newer model. If your picker shows both, always choose 4.6. There's no reason to use 4.5 unless 4.6 has an issue.

03
Experimental = Verify Everything

Opus 4.5 and RL FT O4 mini are experimental. Great capability — but outputs may be inconsistent. Always review before sending to a client.

04
GPT-5 Auto Has Hidden Cost Risk

Auto-mode is convenient, but if it routes to Reasoning for a simple question, you pay Reasoning prices. Use it only when you genuinely don't know the task's complexity.

05
Output Tokens Cost 4–5× More

Across all models, output is the expensive part. Tell the model: "Be concise" or "Respond in under 300 words" on simple tasks to cut API spend significantly.
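The savings are easy to quantify with GPT-4.1's rates from the cost table above. The token counts here are illustrative, not measured:

```python
# GPT-4.1: output ($8/1M) costs 4x input ($2/1M), per the cost table above.
IN_PRICE, OUT_PRICE = 2.00, 8.00  # $/1M tokens

def call_cost(tokens_in: int, tokens_out: int) -> float:
    """USD cost of one call at GPT-4.1 rates."""
    return (tokens_in * IN_PRICE + tokens_out * OUT_PRICE) / 1_000_000

# Same 2K-token prompt, padded vs constrained answer (illustrative sizes):
verbose = call_cost(2_000, 1_200)  # unconstrained, padded response
concise = call_cost(2_000, 300)    # "respond in under 300 words"
print(f"verbose: ${verbose:.4f}, concise: ${concise:.4f}")
```

Cutting the response from 1,200 to 300 tokens drops output spend by 75% and total call cost by roughly half, and the effect compounds across thousands of calls.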

06
Preview Models May Change

GPT-5 Auto and GPT-5 Reasoning are in Preview — pricing, behaviour, and availability can change without notice. Don't build client deliverable pipelines on Preview-only models yet.

// Section 05 — Regulatory & Compliance Status

⚠ IMPORTANT: CMMC is a certification of your business practices, not your AI tool. However, any cloud tool used to process CUI must meet FedRAMP Moderate or higher under DFARS 252.204-7012. The table below shows what each vendor has certified — and which deployment tier is required.

(~ = conditional: depends on tier or deployment path)

| Framework | OpenAI / ChatGPT | Claude / Anthropic | M365 Copilot | Grok / xAI |
| SOC 2 Type II | Enterprise/Team/API only; Free & Plus: NOT covered | API & Claude for Work; Free/Pro: NOT covered | GCC / GCC High / Commercial; covered under M365 SOC 2 | ~ Business & Enterprise tier only; Consumer: NOT covered |
| HIPAA / BAA | Enterprise + API (BAA available); must sign BAA explicitly | Claude for Work + API (BAA), or via Bedrock/Vertex BAA | All M365 tiers with BAA; GCC High recommended | No BAA available; NOT suitable for PHI |
| FedRAMP (Moderate+) | ~ Via Azure OpenAI (FedRAMP High); direct OpenAI API: NOT FedRAMP | Claude for Gov (C4G): FedRAMP High; also via Bedrock GovCloud IL4/5 | ~ GCC: FedRAMP High ✓; Commercial: NOT FedRAMP for CUI | No FedRAMP authorization (DoD pilot separate, not certified) |
| NIST 800-171 / CMMC L2 | ~ Azure OpenAI (FedRAMP) meets req.; consumer API: do NOT use for CUI | C4G or Bedrock GovCloud meets FedRAMP Moderate req. for CUI | ~ GCC High: CMMC L2/L3 ✓; Commercial M365: NOT compliant | No CMMC compliance path; do NOT use for CUI/FCI |
| ISO 27001 | 27001 + 27017 + 27018 + 27701 (API & Enterprise tiers) | SOC 2 Type II attested; ISO certs via AWS/GCP infrastructure | 27001 + 27017 + 27018 (all M365 tiers) | No ISO certification; Enterprise tier only has SOC 2 |
| GDPR / CCPA | DPA available; EU data residency options via Azure EU regions | DPA available; GDPR compliant; most privacy-forward by default | Full GDPR / CCPA compliance; EU data boundary available | ~ GDPR/CCPA claimed (Business+); DPC (Ireland) investigation ongoing |
| DoD IL4 / IL5 | ~ Via Azure Government OpenAI, not via direct OpenAI API | Bedrock GovCloud: IL4 + IL5; AWS Secret region: IL6 | ~ GCC High: IL5 ✓; Copilot in GCC High: limited feature set | No IL authorization (DoD GenAI.mil pilot ≠ certification) |
OpenAI / ChatGPT

Enterprise/API tier: SOC 2 Type II, HIPAA BAA, ISO 27001 — solid for most enterprise use. For CUI/CMMC, must deploy via Azure OpenAI Government, not the direct API. Free and Plus tiers have zero compliance coverage — never use for client data.

Claude / Anthropic

Strongest compliance path of the group. SOC 2 Type II, HIPAA BAA, FedRAMP High via Claude for Government (C4G), DoD IL4/5 via Bedrock GovCloud, IL6 via AWS Secret. For CMMC CUI work — use C4G or Bedrock, not claude.ai directly.

M365 Copilot

Tier matters enormously. Commercial M365 = NOT CMMC compliant. GCC = FedRAMP High, CMMC L2 (non-ITAR). GCC High = FedRAMP High, DFARS 7012, CMMC L2/L3, ITAR. M365 Copilot on GCC High inherits those authorizations — but feature set is reduced vs commercial.

Grok / xAI

Enterprise tier only has SOC 2 + GDPR/CCPA. No FedRAMP, no HIPAA BAA, no CMMC compliance path, no IL authorization. The DoD GenAI.mil pilot is a political deployment — not a certified authorization. Do not use Grok for any regulated client data. Active EU DPC investigation ongoing.

⚠ MSP FIELD RULE — COMPLIANCE TIER HIERARCHY

No compliance coverage: Free/consumer tiers of any AI tool. ChatGPT Free/Plus, Claude Free/Pro, Grok consumer.
SOC 2 only: ChatGPT Enterprise, Claude for Work API, Grok Business/Enterprise — adequate for most commercial MSP work, not for government CUI.
FedRAMP Moderate+ / CMMC-eligible: Claude for Government (C4G), Claude via Bedrock GovCloud, Azure OpenAI Government, M365 GCC or GCC High.
DoD IL5 / ITAR: M365 GCC High, Claude via Bedrock GovCloud IL5, Azure OpenAI GovCloud only.
Remember: CMMC certifies your organisation's controls, not the tool. A FedRAMP-authorized tool is required infrastructure — but your SSP, policies, and audit evidence are what get certified.
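The tier hierarchy above can be encoded as a simple intake guardrail. A sketch only: the data-class labels and tool lists come from the field rule above, and the dict and function names are ours.

```python
# Allow-lists per data classification, from the MSP field rule above.
# Free/consumer tiers of any AI tool appear in no list (no compliance coverage).
ALLOWED_TOOLS = {
    "commercial": {"ChatGPT Enterprise", "Claude for Work", "Grok Business/Enterprise"},
    "cui":        {"Claude for Government (C4G)", "Claude via Bedrock GovCloud",
                   "Azure OpenAI Government", "M365 GCC", "M365 GCC High"},
    "itar":       {"M365 GCC High", "Claude via Bedrock GovCloud IL5",
                   "Azure OpenAI GovCloud"},
}

def tool_allowed(data_class: str, tool: str) -> bool:
    """True only if the tool is on the allow-list for that data classification."""
    return tool in ALLOWED_TOOLS.get(data_class, set())

print(tool_allowed("cui", "ChatGPT Enterprise"))  # False
```

A SOC 2-only tool like ChatGPT Enterprise passes the "commercial" check but fails "cui", matching the rule that SOC 2 alone is not adequate for government CUI.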