CyberAdvisors AI Toolkit · Internal Use
CyberAdvisors · Reference

AI Model Guide

Choose the right model for the right job — capabilities, cost, and context windows at a glance.

// Section 01 — Full Decision Matrix
Which Engine, For What · Legend: ■ = best fit · ◆ = capable · ▲ = limited · · = not recommended · (~ survives only partially below; see Section 02 for per-task detail)

| Model (Vendor)           | Status       | Input $/1M | Output $/1M | Context |
| GPT-4.1 (OpenAI)         | Default      | $2.00      | $8.00       | 1M      |
| GPT-5 Chat (OpenAI)      | Stable       | $10.00     | $30.00      | 1M      |
| GPT-5 Auto (OpenAI)      | Preview      | Varies     | Varies      | 1M      |
| GPT-5 Reasoning (OpenAI) | Preview      | $15.00     | $60.00      | 1M      |
| RL FT O4 mini (OpenAI)   | Experimental | TBD        | TBD         | 128K    |
| Sonnet 4.5 (Anthropic)   | Stable       | $3.00      | $15.00      | 200K    |
| Sonnet 4.6 (Anthropic)   | Stable       | $3.00      | $15.00      | 200K    |
| Opus 4.5 (Anthropic)     | Experimental | $15.00     | $75.00      | 200K    |

Best fit (■) per task: Quick Q&A · GPT-4.1 | Coding · Sonnet 4.6 | Long Writing · Sonnet 4.5/4.6 | Deep Reasoning · GPT-5 Reasoning | Data / Math · RL FT O4 mini | Summarise Doc · GPT-5 Chat / Sonnet 4.6
// Section 02 — Task-to-Model Quick Pick
Quick Q&A / Daily Chat
Use This
GPT-4.1 (Default)
Fast, cheap, and accurate for everyday questions. Already set as your default — don't overthink it.
Alt: Claude Sonnet 4.6 if you prefer Anthropic's tone
Code Generation
Use This
Claude Sonnet 4.6
Excellent instruction-following, reliable output structure, and solid at PowerShell, Python, and JavaScript. Less likely to pad with unnecessary code.
Alt: GPT-4.1 · Heavy logic: GPT-5 Reasoning
Long-Form Writing / Docs
Use This
Claude Sonnet 4.5 or 4.6
Best tone awareness, nuance, and structure for technical documentation, client reports, SOPs, and proposals. 200K context handles long drafts.
Premium: Claude Opus 4.5 for high-stakes client deliverables
Deep Reasoning / Analysis
Use This
GPT-5 Reasoning (Preview)
Internal "thinking" before responding. Best for multi-step logic, root cause analysis, complex troubleshooting trees, or policy interpretation.
Alt: Claude Opus 4.5 · Budget: RL FT O4 mini
Data / Math / STEM
Use This
RL FineTuned O4 mini
Reinforcement-learning tuned planner — strongest at structured planning, numerical reasoning, and constraint solving. Experimental but fast.
Alt: GPT-5 Reasoning · Claude Opus 4.5
Summarise a Long Document
Use This
GPT-5 Chat or Claude Sonnet 4.6
GPT-5 Chat's 1M and Sonnet 4.6's 200K context windows both handle long inputs without losing the thread. GPT-5 Chat for the OpenAI ecosystem; Sonnet for Anthropic's cleaner summaries.
Huge files (500K+ tokens): GPT-5 Chat (1M ctx wins)
Unknown / Mixed Task
Use This
GPT-5 Auto (Preview)
Automatically selects between chat and reasoning mode based on query complexity. Best when you don't know which mode your task needs.
Note: cost varies with selected mode
Highest Quality / No Budget Limit
Use This
Claude Opus 4.5
Experimental but top-tier. Best for final client deliverables, executive reports, complex problem solving. Use when quality matters more than cost.
Alt: GPT-5 Reasoning for math-heavy tasks
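For scripts or chat-ops tooling, the quick picks above can be collapsed into a tiny lookup. A minimal sketch: the task labels and model names come from this guide; the dict and function names are ours, not part of any picker API.

```python
# Task -> (primary pick, alternate) mirroring Section 02 of this guide.
QUICK_PICK = {
    "qa":        ("GPT-4.1", "Claude Sonnet 4.6"),
    "coding":    ("Claude Sonnet 4.6", "GPT-4.1"),
    "writing":   ("Claude Sonnet 4.6", "Claude Opus 4.5"),
    "reasoning": ("GPT-5 Reasoning", "Claude Opus 4.5"),
    "data_math": ("RL FT O4 mini", "GPT-5 Reasoning"),
    "summarise": ("GPT-5 Chat", "Claude Sonnet 4.6"),
}

def pick_model(task: str) -> str:
    """Primary pick for a known task; unknown/mixed tasks fall back to GPT-5 Auto."""
    if task in QUICK_PICK:
        return QUICK_PICK[task][0]
    return "GPT-5 Auto"

print(pick_model("coding"))   # Claude Sonnet 4.6
print(pick_model("mystery"))  # GPT-5 Auto
```

The fallback matches the "Unknown / Mixed Task" entry: when complexity is unclear, GPT-5 Auto routes the query itself.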
// Section 03 — Cost Metrics Per Model

Prices per 1 million tokens via API. Subscription/UI usage is absorbed in your plan. Output tokens cost more — keep responses concise to save budget.
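Per-call API cost is straightforward arithmetic on these rates. A sketch using the prices listed in this section; the model keys and function name are illustrative, not official API identifiers.

```python
# (input $/1M, output $/1M) per the cost cards in this section.
PRICES = {
    "gpt-4.1":         (2.00, 8.00),
    "gpt-5-chat":      (10.00, 30.00),
    "gpt-5-reasoning": (15.00, 60.00),
    "sonnet-4.6":      (3.00, 15.00),
    "opus-4.5":        (15.00, 75.00),
}

def estimate_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Estimated USD cost of one API call at the listed per-1M-token rates."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# A 10K-token prompt with a 2K-token answer on GPT-4.1:
print(round(estimate_cost("gpt-4.1", 10_000, 2_000), 4))  # 0.036
```

Note that GPT-5 Reasoning bills its internal thinking tokens as output, so `tokens_out` there is larger than the visible answer.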

OpenAI Models
GPT-4.1
Default · Best all-rounder · 1M context
Input /1M: $2.00
Output /1M: $8.00
Context: 1,000,000
Speed: Fast
Status: Default ✓
// Best cost/performance ratio in the picker. Start here for anything routine.
GPT-5 Chat
Stable · General tasks · 1M context
Input /1M: ~$10.00
Output /1M: ~$30.00
Context: 1,000,000
Speed: Moderate
Status: Stable
// ~5× more expensive than GPT-4.1. Use when quality uplift justifies it.
GPT-5 Auto
Preview · Mode-switching · 1M context
Input /1M: Varies*
Output /1M: Varies*
Context: 1,000,000
Speed: Varies
Status: Preview
// *Routes to chat or reasoning mode. Cost unpredictable — avoid for budget-sensitive runs.
GPT-5 Reasoning
Preview · Max depth · 1M context
Input /1M: ~$15.00
Output /1M: ~$60.00
Context: 1,000,000
Speed: Slow
Status: Preview
// Thinking tokens are billed. Reserve for genuinely hard problems only.
RL FT O4 mini
Experimental · Planner · 128K context
Input /1M: TBD
Output /1M: TBD
Context: 128,000
Speed: Fast
Status: Experimental
// Experimental — pricing not yet published. Best for planning & structured tasks.
Anthropic Models
Sonnet 4.5
Stable · General + content · 200K context
Input /1M: $3.00
Output /1M: $15.00
Context: 200,000
Speed: Fast
Status: Stable
// Slightly older than 4.6. Consider upgrading if both are available.
Sonnet 4.6
Stable · Improved · 200K context
Input /1M: $3.00
Output /1M: $15.00
Context: 200,000
Speed: Fast
Status: Stable ★
// ■ Recommended default Anthropic pick. Same cost as 4.5, improved capability.
Opus 4.5
Experimental · Deep reasoning · 200K context
Input /1M: $15.00
Output /1M: $75.00
Context: 200,000
Speed: Slow
Status: Experimental
// Most expensive Anthropic option. Best quality — use for client-facing high-stakes work only.
// Section 04 — Team Usage Tips
01
Default First, Escalate If Needed

Start with GPT-4.1 (your default). Only switch if the task genuinely needs more depth. Most daily tasks don't require GPT-5 Reasoning or Opus.

02
Sonnet 4.6 > Sonnet 4.5

Same price, newer model. If your picker shows both, always choose 4.6. There's no reason to use 4.5 unless 4.6 has an issue.

03
Experimental = Verify Everything

Opus 4.5 and RL FT O4 mini are experimental. Great capability — but outputs may be inconsistent. Always review before sending to a client.

04
GPT-5 Auto Has Hidden Cost Risk

Auto-mode is convenient, but if it routes to Reasoning for a simple question, you pay Reasoning prices. Use it only when you genuinely don't know the task's complexity.

05
Output Tokens Cost 4–5× More

Across all models, output is the expensive part. Tell the model: "Be concise" or "Respond in under 300 words" on simple tasks to cut API spend significantly.
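The savings are easy to quantify with GPT-4.1's rates from the cost table above. The token counts here are illustrative, not measured:

```python
# GPT-4.1: output ($8/1M) costs 4x input ($2/1M), per the cost table above.
IN_PRICE, OUT_PRICE = 2.00, 8.00  # $/1M tokens

def call_cost(tokens_in: int, tokens_out: int) -> float:
    """USD cost of one call at GPT-4.1 rates."""
    return (tokens_in * IN_PRICE + tokens_out * OUT_PRICE) / 1_000_000

# Same 2K-token prompt, padded vs constrained answer (illustrative sizes):
verbose = call_cost(2_000, 1_200)  # unconstrained, padded response
concise = call_cost(2_000, 300)    # "respond in under 300 words"
print(f"verbose: ${verbose:.4f}, concise: ${concise:.4f}")
```

Cutting the response from 1,200 to 300 tokens drops output spend by 75% and total call cost by roughly half, and the effect compounds across thousands of calls.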

06
Preview Models May Change

GPT-5 Auto and GPT-5 Reasoning are in Preview — pricing, behaviour, and availability can change without notice. Don't build client deliverable pipelines on Preview-only models yet.

// Section 05 — Regulatory & Compliance Status

⚠ IMPORTANT: CMMC is a certification of your business practices, not your AI tool. However, any cloud tool used to process CUI must meet FedRAMP Moderate or higher under DFARS 252.204-7012. The table below shows what each vendor has certified — and which deployment tier is required.

(~ = conditional: depends on tier or deployment path)

| Framework | OpenAI / ChatGPT | Claude / Anthropic | M365 Copilot | Grok / xAI |
| SOC 2 Type II | Enterprise/Team/API only; Free & Plus: NOT covered | API & Claude for Work; Free/Pro: NOT covered | GCC / GCC High / Commercial; covered under M365 SOC 2 | ~ Business & Enterprise tier only; Consumer: NOT covered |
| HIPAA / BAA | Enterprise + API (BAA available); must sign BAA explicitly | Claude for Work + API (BAA), or via Bedrock/Vertex BAA | All M365 tiers with BAA; GCC High recommended | No BAA available; NOT suitable for PHI |
| FedRAMP (Moderate+) | ~ Via Azure OpenAI (FedRAMP High); direct OpenAI API: NOT FedRAMP | Claude for Gov (C4G): FedRAMP High; also via Bedrock GovCloud IL4/5 | ~ GCC: FedRAMP High ✓; Commercial: NOT FedRAMP for CUI | No FedRAMP authorization (DoD pilot separate, not certified) |
| NIST 800-171 / CMMC L2 | ~ Azure OpenAI (FedRAMP) meets req.; consumer API: do NOT use for CUI | C4G or Bedrock GovCloud meets FedRAMP Moderate req. for CUI | ~ GCC High: CMMC L2/L3 ✓; Commercial M365: NOT compliant | No CMMC compliance path; do NOT use for CUI/FCI |
| ISO 27001 | 27001 + 27017 + 27018 + 27701 (API & Enterprise tiers) | SOC 2 Type II attested; ISO certs via AWS/GCP infrastructure | 27001 + 27017 + 27018 (all M365 tiers) | No ISO certification; Enterprise tier only has SOC 2 |
| GDPR / CCPA | DPA available; EU data residency options via Azure EU regions | DPA available; GDPR compliant; most privacy-forward by default | Full GDPR / CCPA compliance; EU data boundary available | ~ GDPR/CCPA claimed (Business+); DPC (Ireland) investigation ongoing |
| DoD IL4 / IL5 | ~ Via Azure Government OpenAI, not via direct OpenAI API | Bedrock GovCloud: IL4 + IL5; AWS Secret region: IL6 | ~ GCC High: IL5 ✓; Copilot in GCC High: limited feature set | No IL authorization (DoD GenAI.mil pilot ≠ certification) |
OpenAI / ChatGPT

Enterprise/API tier: SOC 2 Type II, HIPAA BAA, ISO 27001 — solid for most enterprise use. For CUI/CMMC, must deploy via Azure OpenAI Government, not the direct API. Free and Plus tiers have zero compliance coverage — never use for client data.

Claude / Anthropic

Strongest compliance path of the group. SOC 2 Type II, HIPAA BAA, FedRAMP High via Claude for Government (C4G), DoD IL4/5 via Bedrock GovCloud, IL6 via AWS Secret. For CMMC CUI work — use C4G or Bedrock, not claude.ai directly.

M365 Copilot

Tier matters enormously. Commercial M365 = NOT CMMC compliant. GCC = FedRAMP High, CMMC L2 (non-ITAR). GCC High = FedRAMP High, DFARS 7012, CMMC L2/L3, ITAR. M365 Copilot on GCC High inherits those authorizations — but feature set is reduced vs commercial.

Grok / xAI

Enterprise tier only has SOC 2 + GDPR/CCPA. No FedRAMP, no HIPAA BAA, no CMMC compliance path, no IL authorization. The DoD GenAI.mil pilot is a political deployment — not a certified authorization. Do not use Grok for any regulated client data. Active EU DPC investigation ongoing.

⚠ MSP FIELD RULE — COMPLIANCE TIER HIERARCHY

No compliance coverage: Free/consumer tiers of any AI tool. ChatGPT Free/Plus, Claude Free/Pro, Grok consumer.
SOC 2 only: ChatGPT Enterprise, Claude for Work API, Grok Business/Enterprise — adequate for most commercial MSP work, not for government CUI.
FedRAMP Moderate+ / CMMC-eligible: Claude for Government (C4G), Claude via Bedrock GovCloud, Azure OpenAI Government, M365 GCC or GCC High.
DoD IL5 / ITAR: M365 GCC High, Claude via Bedrock GovCloud IL5, Azure OpenAI GovCloud only.
Remember: CMMC certifies your organisation's controls, not the tool. A FedRAMP-authorized tool is required infrastructure — but your SSP, policies, and audit evidence are what get certified.
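The tier hierarchy above can be encoded as a simple intake guardrail. A sketch only: the data-class labels and tool lists come from the field rule above, and the dict and function names are ours.

```python
# Allow-lists per data classification, from the MSP field rule above.
# Free/consumer tiers of any AI tool appear in no list (no compliance coverage).
ALLOWED_TOOLS = {
    "commercial": {"ChatGPT Enterprise", "Claude for Work", "Grok Business/Enterprise"},
    "cui":        {"Claude for Government (C4G)", "Claude via Bedrock GovCloud",
                   "Azure OpenAI Government", "M365 GCC", "M365 GCC High"},
    "itar":       {"M365 GCC High", "Claude via Bedrock GovCloud IL5",
                   "Azure OpenAI GovCloud"},
}

def tool_allowed(data_class: str, tool: str) -> bool:
    """True only if the tool is on the allow-list for that data classification."""
    return tool in ALLOWED_TOOLS.get(data_class, set())

print(tool_allowed("cui", "ChatGPT Enterprise"))  # False
```

A SOC 2-only tool like ChatGPT Enterprise passes the "commercial" check but fails "cui", matching the rule that SOC 2 alone is not adequate for government CUI.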