Living Resource

Ultimate Guide to Open Source AI Models

A practical, no-nonsense guide for founders, engineers, and AI teams deciding which open source or open-weight models are actually worth testing — by workload, benchmark profile, license fit, and hardware reality.

Last reviewed: May 14, 2026 · Best for: model selection, evaluation, deployment planning · Maintained as: evergreen reference
Illustration of open AI model categories including language, coding, agents, multimodal, image, and video.

Executive summary

If you only need the short version, it is this: most teams evaluating open models in 2026 should begin with a strong 7B–32B text model, an explicit evaluation harness, and a clear hardware budget before they touch giant MoE systems, open video stacks, or "frontier" model marketing.

MoE architectures dominate the frontier

Qwen3.6, Llama 4, DeepSeek-V3.2, Mistral Large 3, Gemma 4, and gpt-oss all use mixture-of-experts, delivering frontier quality with a fraction of the active parameters. Plan for MoE-friendly serving from day one.

Open models match proprietary frontier performance

DeepSeek-V3.2 sits at GPT-5-class reasoning. Qwen3.6 and Mistral Large 3 compete with closed multimodal models. OpenAI's gpt-oss runs on consumer hardware. For many real workloads, the open/closed gap has effectively closed.

Treat every checkpoint as its own legal object

License clarity matters at the checkpoint level, not the family level. Llama 4 is community-licensed with EU multimodal restrictions; Gemma 4 is Apache 2.0 but Gemma 3 used custom terms; FLUX.1-schnell is permissive but other FLUX variants are not.

Omnimodal is the new frontier

Native multimodal is now baseline. The 2026 step-change is omnimodal: Qwen3.5-Omni and MiniCPM-o 4.5 handle text, image, audio, and streaming speech generation in one model. Grounded video understanding (Molmo 2) is also emerging as a distinct capability.

Bottom line

The right model is not the one with the biggest benchmark headline. It is the one that clears your real task evaluations, fits your hardware envelope, survives structured-output tests, and carries a license your company can live with.

Decision framework: if you want X, start with Y

This matrix is designed to help teams choose a sensible starting point instead of trying everything at once.

| Use case | Start here | Why | Hardware reality | Watch-out |
| --- | --- | --- | --- | --- |
| General writing, chat, summaries, RAG | Qwen3.6-35B-A3B or Llama 4 Scout | Qwen3.6 is the current open-weight Qwen default: 35B total / 3B active, 262K native context, Apache 2.0. Llama 4 Scout offers strong multimodal performance on a single H100. | Qwen3.6-35B-A3B fits comfortably on 24–48GB VRAM with MoE serving. Llama 4 Scout fits a single H100 with INT4. | Confirm license fit before standardizing — Llama 4 has an EU multimodal restriction. |
| Coding assistant for real development work | Qwen3-Coder-Next | Purpose-built for coding agents: 80B total / 3B active, 256K context, Apache 2.0, optimized for long-horizon tool use. | Comfortable on 24GB VRAM thanks to MoE; Qwen recommends vLLM or SGLang for serving. | You still need tests, linting, sandboxing, and security review. |
| Tool-using agents and workflow automation | Qwen3.6-35B-A3B or Llama 4 Maverick | Native tool-calling support, strong structured output, broad ecosystem adoption. | 35B-A3B is the practical starting band; Maverick is datacenter-only at 400B total / 17B active. | Agent quality depends as much on orchestration and evals as on the base model. DeepSeek-V3.2-Speciale drops tool calling — use standard V3.2 if you need agentic behavior. |
| Top-end open reasoning and large-scale inference | DeepSeek-V3.2 | GPT-5-class reasoning with integrated thinking and tool use under MIT. Speciale variant exceeds GPT-5 on math/reasoning but drops tool calling. | Datacenter-class serving (685B parameters, DeepSeek Sparse Attention for long-context efficiency). | Too large for most local teams; API or cloud serving is the realistic path. |
| Omnimodal — speech + vision + text in one model | Qwen3.5-Omni | Native omnimodal trained on 100M+ hours of audio-visual data. Strong audio-visual reasoning and streaming speech generation. | Datacenter-class for full quality; smaller real-time alternative is MiniCPM-o 4.5 (9B). | Production omni stacks need careful real-time orchestration; streaming speech latency is the main constraint. |
| Vision-language understanding | Gemma 4 (E4B or 26B MoE) or Qwen3.6-35B-A3B | Gemma 4 is Google's current open multimodal family with up to 256K context under Apache 2.0. Qwen3.6 handles vision natively. | Gemma 4 E2B/E4B run on consumer GPUs; 26B MoE on prosumer hardware. | For true video grounding, pointing, and counting, prefer Molmo 2 — it is purpose-built for grounded visual reasoning. |
| Edge or low-footprint multimodal | MiniCPM-V 4.6 (1.3B) or Gemma 4 E2B | Edge-friendly visual models for phones, IoT, and lightweight servers. Apache 2.0. | Runs on consumer hardware with 4–8GB VRAM, sometimes CPU-only. | Capability ceiling is real — pick for footprint, not absolute accuracy. |
| Image generation | SDXL or FLUX.1-schnell | SDXL remains the mature ecosystem baseline. FLUX.1-schnell is the fast, permissive 12B alternative under Apache 2.0. | SDXL is happiest around 12GB VRAM; FLUX.1-schnell needs more for full quality. | FLUX licensing is checkpoint-specific — schnell is Apache 2.0, but other FLUX variants have stricter terms. |
| Image editing, control, inpainting | Diffusers + ControlNet + SAM 2 + LaMa | Editing is a stack problem, not a single-model problem. | Consumer GPUs handle most workflows. | Workflow quality depends on masks, conditioning, and operator skill. |
| Speech recognition | Whisper large-v3 or Whisper turbo | Large-v3 remains the accuracy benchmark; turbo is the faster official derivative for streaming and low-latency use. | Consumer GPUs handle both; large-v3 needs ~10GB VRAM. | Accuracy varies significantly across the long tail of languages — verify for your domain. |
| Speech synthesis (TTS) | Chatterbox | Resemble AI's open TTS family with Turbo (350M) for low latency and Multilingual (500M) for broad language coverage. Modern voice cloning with expressive controls. | Runs comfortably on consumer GPUs; viable for real-time agent voices. | Voice cloning has ethical and legal implications — confirm consent and disclosure rules. |
| Open video experiments | Wan2.1 family or Open-Sora 2.0 | Wan2.1 is the most actively expanded open video family with FLF2V and VACE extensions. Open-Sora 2.0 is the openly trained 11B alternative under Apache 2.0. | 16–24GB VRAM for comfortable workflows; smaller variants go lower. | Video quality, latency, and consistency remain uneven. Treat as R&D, not a production default. Verify the exact LICENSE before commercial use. |

Rule of thumb

Choose the smallest model that reliably completes your real task with the right output shape. Then move up only if the gains are measurable.
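A minimal sketch of that rule in practice, assuming an OpenAI-compatible endpoint (vLLM, SGLang, and Ollama all expose one). The model IDs, tasks, and pass threshold below are illustrative placeholders, not recommendations:

```python
# Walk candidate models smallest-first and keep the first one that clears
# your real-task evals with the right output shape.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Smallest first, per the rule of thumb above. Replace with your shortlist.
CANDIDATES = ["team/small-7b", "team/mid-14b", "team/large-32b"]

# Real tasks from your workload: a prompt plus a cheap machine-checkable probe.
TASKS = [
    {
        "prompt": "Summarize: 'Q3 revenue rose 12% to $4.1M.' "
                  "Reply as JSON: {\"summary\": str, \"metrics\": [str]}",
        "check": lambda out: "4.1" in out.get("summary", "") or out.get("metrics"),
    },
]

def run_task(model: str, task: dict) -> bool:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": task["prompt"]}],
        temperature=0,
    )
    text = resp.choices[0].message.content
    try:
        out = json.loads(text)          # output-shape test: must be valid JSON
    except (json.JSONDecodeError, TypeError):
        return False
    return bool(task["check"](out))     # task-level correctness probe

for model in CANDIDATES:
    passed = sum(run_task(model, t) for t in TASKS)
    if passed / len(TASKS) >= 0.9:      # your real bar goes here
        print(f"Start with {model}: {passed}/{len(TASKS)} tasks cleared")
        break
```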

What counts as "open" here

This page separates fully open releases from open-weight releases and license-restricted releases because the market still collapses those categories into the same marketing label.

Best for research and auditability

Fully open

Weights, code, and meaningful training information are available. These releases are the closest match to the OSI-style vision of Open Source AI. Ai2's OLMo 3 leads this category with full model-flow traceability.

OLMo 3 · Molmo 2
Best for practical deployment

Open weight

You can download and run the weights, but the full training data and recipe are not completely reproducible. This is where most high-performing "open" models sit today.

Qwen3.6 · Llama 4 · DeepSeek-V3.2 · Gemma 4 · Mistral Large 3 · gpt-oss · MiniCPM-o
Read the license carefully

Source-available or restricted

Some code and weights are available, but revenue thresholds, non-commercial clauses, behavioral restrictions, or geographic limits apply. Llama 4 carries an EU multimodal restriction; AudioCraft weights are CC-BY-NC; some FLUX variants are non-commercial.

Llama 4 (EU multimodal) · Non-schnell FLUX variants · AudioCraft weights
A capability map showing language, coding, agent, multimodal, image, and video model categories.
Open model selection works best when you think in capability families, not only in leaderboard rows.

The practical model landscape

Open AI is no longer one category. The ecosystem now includes general-purpose LLMs, code-specialized models, multimodal models, image generators, image-editing stacks, and increasingly capable video systems.

| Category | Best for | Start here | Move up to | Sweet spot | Watch-outs |
| --- | --- | --- | --- | --- | --- |
| Writing / general LLM | Chat, drafting, summarization, RAG, internal copilots | Qwen3.6-35B-A3B or Llama 4 Scout | Qwen 3.5 397B-A17B, Llama 4 Maverick, Mistral Large 3, DeepSeek-V3.2 | Qwen3.6-35B-A3B or gpt-oss-20b covers most team needs without datacenter overhead. | License terms (Llama 4 EU multimodal limits) and quantization quality matter more than leaderboard hype. |
| Coding | PR assistance, code generation, refactors, test writing, coding agents | Qwen3-Coder-Next | Qwen3-Coder-480B-A35B, Mistral Large 3, or DeepSeek-V3.2 | Qwen3-Coder-Next (80B total / 3B active, 256K context) is the most practical for real developer use. | Do not deploy without tests, sandboxing, and dependency/security review. |
| Agents | Tool use, workflow automation, multi-step task execution | Qwen3.6-35B-A3B or Llama 4 Scout | Llama 4 Maverick, Mistral Large 3, or DeepSeek-V3.2 | Smaller models plus strong orchestration often beat oversized models with weak tooling. | JSON breakage, tool misuse, and cascading failures are the real bottlenecks. Avoid DeepSeek-V3.2-Speciale here — it drops tool calling. |
| Multimodal / VLM | Document understanding, image Q&A, visual agents, OCR-heavy workflows | Gemma 4 (E4B or 26B MoE) or Qwen3.6-35B-A3B | Qwen 3.5 397B-A17B or Llama 4 Maverick | Gemma 4 26B MoE is the most practical local starting point with up to 256K context. | Grounding mistakes and OCR hallucinations still require checks; for true visual grounding, prefer Molmo 2. |
| Omnimodal (text + vision + speech) | Voice-first assistants, real-time audio-visual reasoning, streaming speech generation | MiniCPM-o 4.5 (9B) for local real-time | Qwen3.5-Omni for full audio-visual reasoning at datacenter scale | MiniCPM-o 4.5 gives most teams a real-time omni model that fits on prosumer hardware. | Some serving stacks still need patched support; streaming speech latency is the binding constraint. |
| Video grounding / pointing | Video understanding, object pointing, tracking, multi-image reasoning | Molmo 2 (4B, 8B, O-7B variants) | Custom Molmo 2 fine-tunes on domain data | The only open family in 2026 shipping true open-data video grounding without distillation from proprietary VLMs. | Not a general chat model — pair with a chat-capable LLM for conversational interfaces. |
| Edge multimodal | Phones, IoT, lightweight servers, on-device VLM workloads | MiniCPM-V 4.6 (1.3B) or Gemma 4 E2B | MiniCPM-o 4.5 or Gemma 4 E4B when more capability is required | MiniCPM-V 4.6 punches above its weight for visual tasks on 4–8GB devices. | Capability ceiling is real — do not expect 27B-class reasoning at 1.3B. |
| Image generation | Concept art, marketing assets, ideation, product visuals | SDXL | FLUX.1-schnell (Apache 2.0) or other FLUX variants when licensing permits | SDXL remains the safest default for ecosystem compatibility; FLUX.1-schnell when you need speed and a permissive license. | Typography and exact prompt fidelity still need workflow iteration. FLUX licensing is checkpoint-specific. |
| Image editing | Inpainting, control, masking, pose/depth guidance, product edits | ControlNet + SAM 2 + LaMa + Diffusers | Project-specific editing stacks with custom masks and pipelines | Editing quality comes from stack design, not one magic checkpoint. | Commercial rights differ across base checkpoints and extensions. |
| Speech recognition | Transcription, translation, voice interfaces, audio understanding | Whisper large-v3 | Whisper turbo for faster streaming, or Qwen3.5-Omni for unified audio + text reasoning | Whisper large-v3 covers most transcription and translation needs; turbo is the official low-latency derivative. | AudioCraft code is MIT but its model weights are CC-BY-NC 4.0 — not usable for most commercial audio generation. |
| Speech synthesis (TTS) | Voice agents, narration, dubbing, expressive synthesis | Chatterbox-Turbo (350M) or Chatterbox-Multilingual (500M) | Domain-tuned Chatterbox variants | The cleanest open TTS row in 2026, with modern voice cloning and expressive controls. | Voice cloning has ethical and legal implications. Confirm consent rules before deploying. |
| Video generation | Short exploratory clips, motion concepts, early creative prototyping | Wan2.1 small variants | Wan2.1 family extensions (FLF2V, VACE) or Open-Sora 2.0 (11B, Apache 2.0) | Today, open video is a prototyping tool more than a production default. | Temporal flicker, identity drift, and long render times remain common. Verify the repo LICENSE before commercial use. |
| Video editing | Interpolation, inpainting, retiming, experimental edit pipelines | RIFE, ProPainter, Wan2.1 VACE-style workflows | Custom pipelines for domain-specific video tasks | Use specialized tools rather than expecting one general model to handle everything. | Workflow complexity is high; results are sensitive to clip quality and masking. Some tools (e.g., ProPainter) ship under research-only S-Lab terms — verify license fit. |

The biggest change from 2024 to 2026 is not just raw model quality. It is the breadth of credible open options across text, coding, multimodal, and media generation.

Components and workflow tools

Some of the most useful tools in open AI are not foundation models. They are conditioning architectures, segmentation models, and workflow utilities that sit on top of generators. They do not belong in a head-to-head leaderboard with Qwen, Gemma, or DeepSeek, but they are essential to most real production pipelines.

Image conditioning

ControlNet

Control architecture for Stable Diffusion-class workflows. Adds pose, depth, edge, and segmentation conditioning to diffusion image generation. A workflow component, not a standalone foundation model.

License: Apache 2.0 code; OpenRAIL weight distribution
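As a concrete sketch of what "workflow component" means here: in Diffusers, ControlNet loads as a separate model that plugs into a generation pipeline. The checkpoints below (a Canny-edge ControlNet on a Stable Diffusion 1.5 base) are illustrative, and the SD-class base weights carry OpenRAIL terms:

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Build the conditioning signal: Canny edges of a reference image.
reference = np.array(load_image("product_photo.png"))
edges = cv2.Canny(reference, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# ControlNet loads as its own model, then attaches to the pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # OpenRAIL-licensed base; check terms
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "studio photo of a ceramic mug, soft window light",
    image=control_image,  # generation now follows the edge map
).images[0]
image.save("mug_edge_guided.png")
```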

Vision grounding

SAM 2

Meta's promptable segmentation foundation model for images and video. Used as a building block for masking, editing pipelines, and visual agents.

License: Apache 2.0 with BSD-3-Clause components
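A minimal masking sketch with the official `sam2` package. The checkpoint ID and click coordinates are illustrative; the point is that one foreground click yields candidate masks you can feed into inpainting or editing stacks:

```python
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Hugging Face checkpoint ID; verify names against the sam2 repo you install.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

image = np.array(Image.open("scene.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[480, 320]]),  # one click on the target object
        point_labels=np.array([1]),           # 1 = foreground, 0 = background
    )

best_mask = masks[scores.argmax()]  # boolean mask for the downstream edit
```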

Video workflow

FramePack

Local video workflow tool for next-frame prediction inference. Helps produce long, consistent video clips on constrained consumer hardware.

License: Verify repo LICENSE before commercial use

Image inpainting

LaMa

Resolution-robust inpainting model. Best for clean object removal and background reconstruction inside larger editing stacks.

License: Apache 2.0

Video interpolation

RIFE

Frame interpolation model for smooth slow motion and frame-rate upscaling. Pairs well with video generation pipelines.

License: MIT (code) — verify weight distribution

Video inpainting

ProPainter

Mask-aware video inpainting and object removal. Specialized for video editing; not a general-purpose video tool.

License: S-Lab License (research / non-commercial — verify before production)

When to reach for these

Use these components alongside foundation models, not instead of them. ControlNet adds conditioning to image generation; SAM 2 adds grounding to vision pipelines; FramePack, RIFE, and ProPainter extend video workflows.

Benchmark snapshot: what the top open families report

These numbers are useful as a map, not as a verdict. Benchmark settings vary. Prompt formatting moves scores. Preference benchmarks can overstate real operational reliability. Use this as the first filter, then test on your own workload.

| Model | General | Reasoning | Coding | Notes |
| --- | --- | --- | --- | --- |
| Qwen3.6-35B-A3B | Current Qwen open-weight default | Improved multimodal and agentic behavior vs. Qwen 3.5 | Stronger real-world coding than the Qwen 3.5 baseline | 35B total / 3B active MoE. 262,144 native context. Apache 2.0. Released April 14, 2026. |
| Qwen 3.5 397B-A17B | MMLU-Pro 87.8, SuperGPQA 70.4 | AIME26 91.3, GPQA Diamond 88.4 | LiveCodeBench v6 83.6 | Frontier MoE with only 17B active params. Native multimodal, 201 languages. Numbers from Qwen 3.5 announcement materials. |
| DeepSeek-V3.2 | Comparable to GPT-5 | IMO and IOI gold-medal level | SWE-bench competitive with GPT-5 | 685B parameters under MIT License. DeepSeek Sparse Attention for long-context efficiency. Speciale variant exceeds GPT-5 on reasoning but drops tool calling. |
| Mistral Large 3 | MMLU-Pro ~73–78 | Strong mid-to-high tier | HumanEval ~92 | 675B total / 41B active MoE. Apache 2.0. Multimodal. 256K context. Deployable on a single 8×A100 or 8×H100 node. |
| Llama 4 Maverick | MMLU 85.5, MMLU-Pro 80.5 | GPQA Diamond 69.8 | HumanEval 82.4 | 400B total / 17B active, 128 experts. Natively multimodal, 1M context. Llama 4 Community License with EU multimodal restriction. |
| Llama 4 Scout | MMLU 79.6, MMLU-Pro 74.3 | GPQA Diamond 57.2 | HumanEval 74.1 | 109B total / 17B active, 16 experts. 10M context. Fits on a single H100 with INT4. Llama 4 Community License — not OSI-style open source. |
| gpt-oss-120b | MMLU-Pro 90.0 | AIME 2025 97.9 (with tools) | Near o4-mini on competition coding | 117B total / 5.1B active MoE. Apache 2.0 with usage policy. Fits on a single 80GB GPU. 128K context. |
| gpt-oss-20b | Matches o3-mini on common benchmarks | Strong for its size class | Competitive with o3-mini | 21B total / 3.6B active MoE. Apache 2.0 with usage policy. Runs on 16GB devices. 128K context. |
| Gemma 4 | Current Google flagship open family | Improved over Gemma 3 across reasoning benchmarks | Strong for size class | Apache 2.0. Released April 2, 2026 in E2B, E4B, 26B MoE, and 31B sizes. Up to 256K context. Multimodal input. |
| OLMo 3 32B-Think | Leading fully-open reasoning model | Strongest open-traceable reasoning in late-2025/early-2026 | Competitive among fully-open releases | Released November 2025. Full model-flow traceability and open training recipe. Replaces OLMo 2 as the "truly open" flagship. |

How to use benchmarks correctly

Use one academic snapshot table, one real-work evaluation table, and one reliability table. If a model only looks good in one of those three, it is not production-ready for your team.

Open vs. closed models: where each wins

The real tradeoff is not "open is better" or "closed is better." It is whether you want control, customization, and privacy enough to take on the systems burden yourself.

| Dimension | Open / open-weight | Closed ecosystem |
| --- | --- | --- |
| Control | Self-host, fine-tune, inspect, and route however you want. | Fastest path to strong capability with less systems work. |
| Cost model | Infrastructure, ops, and engineering replace per-token API pricing. | Usage-based pricing is simple but can become expensive at scale. |
| Privacy and data boundary | Best option when prompts, outputs, and logs must stay inside your environment. | Provider policy and retention controls matter more. |
| Customization | Adapters, quantization, routing, and domain tuning are the major advantages. | Prompting is easy; deep model customization is limited. |
| Operational burden | You own serving, evals, security, and reliability. | You inherit better managed infrastructure and usually better SLAs. |
| Best fit | Teams with repeatable workloads, privacy needs, or platform ambitions. | Teams optimizing for speed, simplicity, and managed frontier access. |

Diagram showing consumer, prosumer, and enterprise hardware tiers for open model workloads.
Hardware fit is one of the fastest ways to narrow the field before you benchmark anything.

Hardware tiers: what you actually need

The fastest way to waste time in open AI is to choose models before you define the serving envelope. Pick the hardware tier first, then shortlist models that fit.

Consumer / hobbyist

Single GPU, 12–16GB VRAM, 32–64GB RAM

What fits: Gemma 4 E2B/E4B, MiniCPM-V 4.6, gpt-oss-20b, smaller Qwen3.6 variants, lightweight coding models, SDXL

Best for: Local testing, lightweight RAG, first agents, edge multimodal, image generation

Watch-outs: Do not expect comfortable 35B+ MoE serving or serious open video production.

Prosumer / advanced local

24–48GB VRAM, 64–128GB RAM, fast NVMe, optional multi-GPU

What fits: Qwen3.6-35B-A3B, Qwen3-Coder-Next, Gemma 4 26B MoE / 31B, gpt-oss-120b, MiniCPM-o 4.5 (9B), small video stacks

Best for: Serious private assistants, agentic coding, local omni experiments, MoE serving

Watch-outs: Open video is still slow and multi-step agent stacks need careful tuning.

Enterprise / datacenter

Multi-GPU clusters, high-bandwidth networking, optimized serving

What fits: Qwen 3.5 397B-A17B, Llama 4 Maverick (single H100 DGX), Mistral Large 3 (8×A100/H100), DeepSeek-V3.2, Qwen3.5-Omni

Best for: Internal copilots, agent platforms, omnimodal services, governed deployment

Watch-outs: Reliability, governance, and evaluation matter more than raw model choice at this tier.

Practical serving reality

For most real teams, the 14B–32B band is the easiest place to get strong quality without crossing into difficult multi-GPU operations. Giant MoE systems make sense later, not first.
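A rough way to sanity-check model-versus-tier fit before benchmarking anything: weights-only memory is approximately total parameters times bytes per parameter, and MoE activity ratios do not shrink it. A back-of-envelope sketch (planning figures only; KV cache, activations, and runtime overhead come on top):

```python
# Weights-only memory estimate: total params x bytes per param. This ignores
# KV cache, activations, and framework overhead, which can add several GiB.
def weight_gib(total_params_billion: float, bits_per_param: int) -> float:
    return total_params_billion * 1e9 * (bits_per_param / 8) / 2**30

models = [("35B-A3B MoE", 35.0), ("gpt-oss-120b", 117.0), ("dense 32B", 32.0)]
for name, params_b in models:
    row = ", ".join(
        f"{bits}-bit ~{weight_gib(params_b, bits):.0f} GiB" for bits in (16, 8, 4)
    )
    print(f"{name}: {row}")

# Note: a 35B-A3B model activates ~3B params per token, which cuts compute,
# not resident memory; all 35B worth of expert weights still need a home.
```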

Hallucinations, reliability, and the failure modes that matter

Hallucinations are only one part of the reliability story. Open models also fail through prompt sensitivity, poor tool arguments, visual grounding errors, license misunderstandings, and brittle long-context behavior.

Text and coding models

The most common failures are fabricated facts, false confidence, stale knowledge, malformed JSON, and plausible-but-wrong code. Code models can also generate insecure or license-sensitive output.

Multimodal models

Expect OCR misses, object misidentification, incorrect grounding, and overconfident descriptions of partially visible content.

Image models

The main problems are prompt drift, poor typography, inconsistent identity, and weak fine-grained control unless you add editing and conditioning tools.

Video models

The biggest issues remain temporal flicker, identity drift, motion incoherence, and long runtimes for short clips.

Reliability checklist
  • Treat hallucinations as a systems problem, not only a model problem.
  • Require citations or retrieval for factual workflows.
  • Schema-validate every tool call and structured output (one way to do this is sketched after this list).
  • Use test suites and eval harnesses before swapping models.
  • Separate "good at chat" from "good at operations."
  • Expect prompt sensitivity, especially around formatting and long contexts.
  • Add human review for regulated, financial, legal, medical, or externally visible outputs.
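One way to implement the schema-validation item above, sketched with Pydantic. The tool schema is invented for illustration; the pattern is what matters: parse, validate, and reject before anything reaches a real system.

```python
# Validate model-produced tool arguments before executing anything.
from pydantic import BaseModel, Field, ValidationError

class CreateTicketArgs(BaseModel):
    """Invented example schema for a 'create_ticket' tool."""
    title: str = Field(min_length=5, max_length=120)
    priority: str = Field(pattern="^(low|medium|high)$")
    assignee: str

def dispatch_tool_call(raw_arguments: str) -> dict:
    """Validate the model's JSON arguments; never execute unvalidated input."""
    try:
        args = CreateTicketArgs.model_validate_json(raw_arguments)
    except ValidationError as err:
        # Feed the error back to the model for a retry instead of crashing.
        return {"ok": False, "error": err.errors()}
    return {"ok": True, "args": args.model_dump()}

# A classic failure mode (bad enum, missing field) is caught, not executed:
print(dispatch_tool_call('{"title": "Fix login", "priority": "urgent"}'))
```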

Licensing: the most overlooked part of model selection

License fit is not cleanup work after the benchmark review. It is one of the first filters. Many teams waste time evaluating models they cannot legally or economically ship.

| License pattern | Best for | Examples | Watch-out |
| --- | --- | --- | --- |
| Apache 2.0 / MIT | Commercial deployment and broad integration | OLMo 3, Whisper code, Qwen3.6, Qwen3-Coder-Next, Mistral Large 3, gpt-oss (with usage policy), Gemma 4, FLUX.1-schnell, MiniCPM-V 4.6, MiniCPM-o 4.5, Molmo 2, ControlNet code, SAM 2 | Verify each model card — checkpoint-level terms can differ even when the family is described as permissive. |
| Llama 4 Community License | Commercial use with strong ecosystem momentum | Llama 4 Scout, Llama 4 Maverick | Permissive for many uses, but not OSI-style open source. The policy includes an EU restriction for multimodal use and broader acceptable-use limits. |
| Gemma terms / custom terms (legacy) | Practical use of older Gemma generations | Gemma 3 and earlier | Gemma 4 moved to Apache 2.0 in April 2026, but Gemma 3 and earlier remain under Google's custom terms. The family name alone is not a license signal. |
| OpenRAIL / Responsible AI licenses | Creative or research use where behavioral restrictions are acceptable | SDXL (CreativeML Open RAIL++-M), ControlNet weight distributions, BigCode OpenRAIL-M | Behavioral restrictions and downstream obligations can affect productization. |
| Community / revenue-threshold licenses | Early testing before full commercialization | Some Stability releases | Revenue thresholds and enterprise terms can change the total cost of ownership. |
| Non-commercial weight licenses | Research, experimentation, internal evaluation | Non-schnell FLUX variants, AudioCraft weights (CC-BY-NC 4.0), some video editing tools (e.g., ProPainter under S-Lab research terms) | This is a hard stop for many production uses. Verify the LICENSE file in every video/tooling project before publishing legal language. |

A good licensing rule

Treat every checkpoint as its own legal object. Do not assume the family name tells you the full commercial story.
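That rule is easy to automate as a first-pass filter. Here is a sketch using `huggingface_hub` to read the license tag on each checkpoint's model card; tags are metadata, not legal review, so still read the LICENSE file before shipping:

```python
from huggingface_hub import model_info

# Same family, different legal objects: schnell is Apache 2.0, dev is not.
for repo_id in [
    "black-forest-labs/FLUX.1-schnell",
    "black-forest-labs/FLUX.1-dev",
]:
    info = model_info(repo_id)
    license_tags = [t for t in info.tags if t.startswith("license:")]
    print(f"{repo_id}: {license_tags or ['no license tag; check the repo']}")
```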

Recommended deployment stacks

Choosing a model without choosing a serving and evaluation stack is incomplete. The stack determines latency, batching, observability, and how painful future model swaps will be.

Ollama + llama.cpp

Best for: Fastest path to local testing

Strengths: Great for laptops, desktops, and quick internal prototypes.

Limits: Not the best fit for serious multi-user production serving.
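A minimal local test with the `ollama` Python client, assuming the Ollama server is running and a model has been pulled. The model tag is illustrative:

```python
import ollama  # pip install ollama; assumes the Ollama server is running

response = ollama.chat(
    model="llama3.1:8b",  # illustrative tag; use any model you have pulled
    messages=[
        {"role": "user", "content": "List three risks of self-hosting LLMs."}
    ],
)
print(response["message"]["content"])
```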

vLLM and SGLang

Best for: High-throughput production inference for MoE models

Strengths: Paged attention, strong batching, and a mature serving ecosystem. SGLang is explicitly recommended by Qwen for Qwen3-Coder-Next.

Limits: More ops-heavy than local tools.
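For a quick feel of the vLLM path, here is a minimal offline-inference sketch; a production deployment would instead run `vllm serve <model>` and consume the OpenAI-compatible endpoint. The model ID is illustrative:

```python
from vllm import LLM, SamplingParams

# Offline batch inference; for serving, prefer `vllm serve` + the HTTP API.
llm = LLM(model="Qwen/Qwen2.5-14B-Instruct", tensor_parallel_size=1)
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(
    ["Explain in two sentences why MoE models need MoE-aware serving."],
    params,
)
print(outputs[0].outputs[0].text)
```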

TensorRT-LLM

Best for: NVIDIA-centric optimized serving

Strengths: Best when you want GPU-specific performance tuning at scale.

Limits: More specialized setup and infra assumptions.

Transformers + Diffusers

Best for: Custom workflows and research flexibility

Strengths: Best ecosystem for model experimentation, adapters, and editing pipelines.

Limits: Requires more assembly than end-user desktop tools.
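As a sketch of the Diffusers side, loading the public SDXL base checkpoint takes a few lines; schedulers, LoRA adapters, and memory optimizations layer onto the same pipeline object:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # public SDXL base weights
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = pipe("isometric illustration of a model-serving cluster").images[0]
image.save("cluster.png")
```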

ComfyUI

Best for: Creative image and video workflows

Strengths: Visual pipeline building, strong community extensions, easy iteration.

Limits: Operational governance is weaker than code-first stacks.

LangGraph / LlamaIndex / AutoGen

Best for: Agents, tool use, and workflow orchestration

Strengths: Useful abstractions for state, retrieval, and multi-step execution.

Limits: They do not fix weak evals or poor model choices for you.

Recommended starting stacks by team profile

Use these as default launch points, not as permanent architecture decisions.

Founder or operator testing AI internally

Start with Qwen3.6-35B-A3B or Llama 4 Scout, run it through a small RAG layer, and measure task completion before chasing bigger models.

Developer building a local coding copilot

Start with Qwen3-Coder-Next (80B total / 3B active, Apache 2.0), then step up only if your evals show clear gains on your real repos.

Creative team evaluating image and video

Use SDXL or FLUX.1-schnell for image work first. Treat open video (Wan2.1, Open-Sora 2.0) as an R&D lane, not your default production pipeline.

Team building voice or omnimodal interfaces

Start with MiniCPM-o 4.5 for local real-time omni, Whisper large-v3 or turbo for ASR, and Chatterbox for TTS. Step up to Qwen3.5-Omni when you need datacenter-class audio-visual reasoning.
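For the ASR leg of that stack, the openai-whisper Python API is a few lines; "large-v3" favors accuracy, while "turbo" trades some accuracy for latency. The audio path is illustrative:

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v3")  # or "turbo" for lower latency
result = model.transcribe("support_call.wav")

print(result["text"])
for segment in result["segments"]:  # segment timings for captions or agents
    print(f'{segment["start"]:7.2f}s  {segment["text"]}')
```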

Enterprise team with privacy and governance requirements

Prioritize license clarity (per-checkpoint, not per-family), eval discipline, and serving fit over raw leaderboard rank. vLLM-class serving plus Qwen3.6-35B-A3B or Llama 4 Scout (mind the EU multimodal restriction) is usually the right first step.

Best first experiment

Pick one workflow, one evaluation harness, one hardware target, and three candidate models. Anything broader becomes expensive research theater.

Sources and methodology

This page is built from model cards, technical reports, official repositories, standards bodies, and tooling documentation. The goal is practical decision support, not hype-driven ranking.

  • Open Source Initiative — Open Source AI Definition: https://opensource.org/ai/open-source-ai-definition
  • Qwen on Hugging Face (Qwen3.6, Qwen3.5-Omni, Qwen3-Coder): https://huggingface.co/Qwen
  • Qwen 3.5 announcement: https://qwen.ai/blog?id=qwen3.5
  • Qwen 3.5 GitHub repository: https://github.com/QwenLM/Qwen3.5
  • Qwen3-Coder GitHub repository: https://github.com/QwenLM/Qwen3-Coder
  • Meta Llama 4 announcement: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
  • Llama 4 model page: https://www.llama.com/models/llama-4/
  • Llama 4 Community License (Meta): https://www.llama.com/llama4/license/
  • DeepSeek-V3.2 Technical Report: https://arxiv.org/html/2512.02556v1
  • DeepSeek-V3.2 on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.2
  • DeepSeek-V3.2-Speciale on Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale
  • Gemma developer docs (covers Gemma 4): https://ai.google.dev/gemma/docs/core
  • Gemma — Google DeepMind family page: https://deepmind.google/models/gemma/
  • Gemma 3 Technical Report: https://arxiv.org/html/2503.19786v1
  • OLMo — Ai2: https://allenai.org/olmo
  • AllenAI on Hugging Face (OLMo 3, Molmo 2): https://huggingface.co/allenai
  • Molmo — Ai2 family page: https://molmo.allenai.org/
  • OpenBMB on Hugging Face (MiniCPM-V, MiniCPM-o): https://huggingface.co/openbmb
  • Mistral Large 3 announcement: https://mistral.ai/news/mistral-3
  • Mistral Large 3 on Hugging Face: https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512
  • OpenAI gpt-oss announcement: https://openai.com/index/introducing-gpt-oss/
  • gpt-oss model card: https://openai.com/index/gpt-oss-model-card/
  • SDXL paper: https://arxiv.org/abs/2307.01952
  • FLUX.1-schnell model page: https://huggingface.co/black-forest-labs/FLUX.1-schnell
  • Black Forest Labs (FLUX family): https://huggingface.co/black-forest-labs
  • Open-Sora repository: https://github.com/hpcaitech/Open-Sora
  • Wan2.1 repository: https://github.com/Wan-Video/Wan2.1
  • FramePack repository: https://github.com/lllyasviel/FramePack
  • Whisper repository: https://github.com/openai/whisper
  • Chatterbox — Resemble AI: https://github.com/resemble-ai/chatterbox
  • ControlNet repository: https://github.com/lllyasviel/ControlNet
  • SAM 2 repository: https://github.com/facebookresearch/sam2
  • LaMa inpainting repository: https://github.com/advimman/lama
  • llama.cpp quantization memory reference: https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
  • vLLM — high-throughput LLM serving: https://github.com/vllm-project/vllm
  • SGLang — fast LLM serving: https://github.com/sgl-project/sglang
  • Ollama — local model runner: https://ollama.com/
  • ComfyUI — visual workflow builder: https://github.com/comfyanonymous/ComfyUI