Premium museum expansion · Provider catalog
Arcee AI
Trinity sparse MoE foundation line (Nano / Mini / Large), AFM compact foundation models, a broad post-train catalog (SuperNova, Virtuoso, …), plus MergeKit, Arcee Fusion, and Arcee Cloud. Catalog below mirrors docs.arcee.ai structure (March 2026 snapshot); product names and preview SKUs change faster than this page.
Architecture & stack (overview)
Trinity models use a sparse mixture-of-experts (MoE) design with
efficient attention for lower latency and cost at long context. Public docs cite
AfmoeForCausalLM for Trinity MoE checkpoints; AFM-4.5B uses a dense decoder
(ArceeForCausalLM) with GQA and ReLU² activations.
Arcee Fusion + MergeKit remain the open merge stack for composing models (e.g. SuperNova family). Post-train lines often distill or merge from Llama-class and Qwen-class teachers.
Official: arcee.ai · Docs: docs.arcee.ai · Hugging Face: arcee-ai · Chat: chat.arcee.ai
Model catalog (docs sidebar)
Foundation vs post-train groupings as listed in Arcee’s documentation.
Foundation models
- Trinity Nano 6B Preview – smallest Trinity MoE; edge / local.
- Trinity Mini 26B – mid-size MoE; cloud & on-prem.
- AFM 4.5B – dense Arcee Foundation Model (non-Trinity).
Post-train models
- Arcee-SuperNova Medius, Arcee-SuperNova-Lite, Arcee-SuperNova (70B-class releases on HF)
- Arcee-Spark, Arcee-Miraj-Mini, Arcee-Agent, Arcee-Lite, Arcee-Nova, Arcee-Scribe, Arcee-SEC
- Virtuoso-Lite, Virtuoso-Medium-v2
- Caller (32B) – tool-use / orchestration (separate release track)
Trinity family (MoE)
All Trinity variants share the same capability profile in Arcee messaging; footprint (total vs active parameters) scales for edge vs cloud. Architecture: sparse MoE + efficient attention. Nano/Mini public specs: 128 experts, 8 active, 1 shared; trained on large curated mixes (e.g. 10T tokens for Nano/Mini with math/code emphasis) on clusters such as 512× H200 (per docs).
| Variant | Total / active | Context | Role |
|---|---|---|---|
| Trinity Nano | 6B / 1B active | 128K | Edge, on-device, offline, low-latency voice/UI loops. |
| Trinity Mini | 26B / 3B active | 128K | Cloud & on-prem (AWS, GCP, Azure, vLLM, SGLang, llama.cpp). |
| Trinity Large Preview | 400B / 13B active | 512K (frontier); preview API notes 128K @ 8-bit | Agent harnesses, toolchains, creative workloads; preview API / download flows. |
License (Nano/Mini): Apache 2.0 in public docs. Inference: Transformers (main branch), vLLM, llama.cpp, LM Studio where supported.
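The 128-experts / 8-active / 1-shared routing described above can be illustrated with a toy top-k router. This is a hypothetical sketch of the general MoE pattern, not Arcee's actual implementation; the function names are invented for illustration.

```python
import math

def route_tokens(gate_logits, k=8):
    """Toy top-k MoE router: softmax over expert logits, keep the k
    highest-scoring experts, and renormalize their gate weights
    (the shared expert is handled separately, outside the router)."""
    m = max(gate_logits)
    exps = [math.exp(g - m) for g in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    return [(i, probs[i] / norm) for i in topk]

def moe_layer(x, experts, shared_expert, gate_logits, k=8):
    """Combine the k routed experts plus the always-on shared expert."""
    out = shared_expert(x)
    for idx, weight in route_tokens(gate_logits, k):
        out += weight * experts[idx](x)
    return out
```

The point of the sketch: with 128 experts but only 8 routed (plus 1 shared) per token, active parameters stay far below the total, which is how a 26B-total model runs with ~3B active.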
AFM-4.5B (dense foundation)
Instruction-tuned 4.5B decoder-only model for enterprise and edge: ~8T tokens
pretrain + midtraining (math/code), SFT + RL on preferences; data curation with Datology (per
Arcee docs). Architecture: GQA, ReLU², ArceeForCausalLM. License: Apache 2.0.
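The ReLU² (squared ReLU) activation mentioned above is simple to state; a minimal scalar sketch:

```python
def relu_squared(x: float) -> float:
    """ReLU^2 (squared ReLU): 0 for non-positive inputs, x**2 otherwise.
    Used in some feed-forward blocks as an alternative to GELU/SiLU."""
    return x * x if x > 0 else 0.0
```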
Post-train highlights (selected)
| Model | Notes (from Arcee docs) |
|---|---|
| Arcee-SuperNova Medius | ~14B; Qwen2.5-14B lineage; multi-teacher distillation (e.g. Qwen2.5-72B-Instruct + Llama-3.1-405B-Instruct narratives). |
| Arcee-SuperNova-Lite | 8B; Llama-3.1-8B-class; distilled from Llama-3.1-405B logits; EvolKit-style instruction data. |
| Arcee-SuperNova-v1 (70B) | Open merge / distillation flagship (earlier release wave). |
| Spark, Miraj-Mini, Agent, Lite, Nova, Scribe, SEC | Task-tuned and domain-specialized post-train SKUs – see per-model pages on docs. |
| Virtuoso-Lite, Virtuoso-Medium-v2 | Virtuoso merge line; medium/lite footprints. |
| Caller (32B) | Qwen2.5-32B-class; tooling / API orchestration focus. |
Capabilities (product)
- Agent reliability – function selection, valid parameters, schema-true JSON, recovery when tools fail.
- Coherent multi-turn – long sessions without re-explaining context.
- Structured outputs – JSON schema adherence; native function calling and tool orchestration.
- Same skills across sizes – move workloads between edge and cloud without rebuilding prompts.
- Efficient attention – lower cost at long context vs dense baselines (Arcee messaging).
- Context utilization – use large inputs for grounded answers.
Context & I/O
- 128K token context for Nano/Mini (and related API surfaces).
- Structured outputs with JSON schema adherence.
- Native function calling and tool orchestration.
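Tool definitions and schema-constrained outputs are typically passed in an OpenAI-compatible request body. A hedged sketch of both sides of that loop (the model ID, tool name, and schema below are placeholders for illustration, not confirmed Arcee identifiers):

```python
import json

def build_tool_call_request(model: str, user_msg: str) -> dict:
    """Build an OpenAI-style chat request carrying one tool definition.
    The model is expected to return schema-true JSON arguments."""
    return {
        "model": model,  # placeholder; check Arcee docs for live model IDs
        "messages": [{"role": "user", "content": user_msg}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }

def parse_tool_arguments(raw_arguments: str) -> dict:
    """Check that the model's tool-call arguments are valid JSON with the
    required field; raise if the output is not schema-true."""
    args = json.loads(raw_arguments)
    if "city" not in args or not isinstance(args["city"], str):
        raise ValueError("tool arguments missing required 'city' string")
    return args
```

In practice the validation step is where "schema-true JSON" is enforced on the client side; a failed parse is the trigger for the tool-failure recovery behavior listed above.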
Training philosophy (docs)
- Curated data – quality filtering and classification pipelines.
- Synthetic augmentation – tool calling, schema adherence, error recovery, preferences, voice-friendly styles.
- Evaluation – tool reliability, long-turn coherence, structured-output accuracy.
Merge stack & cloud
- MergeKit – open merge tooling; multi-GPU acceleration.
- Arcee Fusion – selective fusion masks (importance scoring + thresholds).
- Arcee Cloud – train, merge, deploy custom LLMs (SaaS).
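The "importance scoring + thresholds" idea behind selective fusion can be shown with a toy per-parameter version: score each delta between two models and copy the donor value only where the delta clears a threshold. This is an illustrative sketch of the concept, not MergeKit's or Arcee Fusion's actual algorithm.

```python
def fuse_weights(base, donor, threshold=0.1):
    """Toy selective fusion over flat weight lists: score each parameter
    delta by magnitude and take the donor value only where the score
    exceeds the threshold, keeping the base weight everywhere else."""
    fused = []
    for b, d in zip(base, donor):
        importance = abs(d - b)  # simplest possible importance score
        fused.append(d if importance > threshold else b)
    return fused
```

The mask (which parameters crossed the threshold) is the "selective" part: small, likely noisy deltas are discarded instead of averaged in.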
Technical notes
| Topic | Detail |
|---|---|
| MergeKit | Community merges; blog: MergeKit v0.1+ (Arcee Fusion + multi-GPU). |
| Licensing | Many open weights under Apache 2.0 – confirm per checkpoint on Hugging Face. |
| Inference stacks | vLLM, SGLang, llama.cpp, LM Studio, Transformers (per model). |
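A MergeKit run is driven by a YAML config. A minimal sketch for a two-model SLERP merge, with placeholder model IDs and layer counts (adjust to the real checkpoints):

```yaml
# merge.yaml: example MergeKit config (placeholder model IDs)
merge_method: slerp
base_model: base-org/model-a
slices:
  - sources:
      - model: base-org/model-a
        layer_range: [0, 32]
      - model: other-org/model-b
        layer_range: [0, 32]
parameters:
  t: 0.5          # interpolation factor between the two models
dtype: bfloat16
```

Per the MergeKit docs, a config like this is run with `mergekit-yaml merge.yaml ./output-dir`; see the MergeKit README for the full set of merge methods and parameters.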
Selected timeline (2024โ2026)
- 2024: MergeKit + Arcee Fusion; AFM-4.5B; SuperNova / Virtuoso waves.
- 2025: Trinity Nano/Mini; expanded post-train; Caller; Arcee Cloud.
- 2026: Trinity Large preview tier; frontier MoE + agent positioning in market.
References
docs.arcee.ai (per-model pages: Trinity, AFM, SuperNova, …) · Arcee blog · Hugging Face · arcee-ai
Compiled from Arcee public docs and blog; not an official Arcee product sheet. Verify live model IDs, API limits, and licenses before production.