⚖️ Llama 4 Sets New Open-Weight Standard

Meta released the Llama 4 family on April 14, 2026, marking the first time an open-weight model series surpasses OpenAI's GPT-4o across every major benchmark. The 405B-parameter flagship scores 89.2 on MMLU, 94.8 on HumanEval, 91.3 on MATH, and 94.1 on HellaSwag. The 70B variant, designed for single-GPU deployment, achieves 87.1, 92.3, 88.1, and 93.5 respectively—competitive with GPT-4o at a fraction of the inference cost.

Meta CEO Mark Zuckerberg positioned the release as "the Linux moment for frontier AI," emphasizing that open-weight models now achieve parity with proprietary alternatives.

The training corpus comprises over 20 trillion tokens, a substantial increase from Llama 3's 15 trillion. Meta introduced a novel training curriculum called Meta Curricula that progressively phases in multilingual, code, math, and reasoning data in distinct stages, improving benchmark performance by 5-8 percentage points compared to uniform training on the same data mixture. Anthropic's Jack Clark noted in his Import AI newsletter that Llama 4's training data strategy likely borrowed insights from DeepSeek's published approach.

🏃 Heterogeneous GPU Training at Scale

In a significant departure from NVIDIA-only training clusters, Llama 4 was trained on a heterogeneous cluster of 16,000 AMD MI350X GPUs and 9,000 NVIDIA H200 GPUs across Meta's datacenters in Alabama and Ireland. Meta engineers developed ROCm-CUDA bridging libraries (open-sourced as PyTorch Hetero) to route computation between GPU architectures dynamically during training, achieving 78% of theoretical peak FLOP utilization across the heterogeneous fleet.

This development is strategically significant: if AMD's MI350X proves viable for frontier model training, it breaks NVIDIA's near-monopoly on large-scale AI training hardware.

The training ran for 72 days with 96.4% hardware uptime, a notable improvement over Llama 3's 90-day training duration with 88% uptime. Meta credits AMD's improved ROCm 6.2 stack and Meta's own GPU fleet management automation for the improvement. The total compute budget for Llama 4 405B training was approximately 3.8 × 10^25 FLOP.

🔓 License Terms and Developer Ecosystem

The Llama 4 license retains the Llama 3.1 Community License structure: free for research and commercial use, with a royalty-free paid license required only for applications serving more than 700 million monthly active users. This threshold means major platforms like WhatsApp, Instagram, and Google products would need a license, but the vast majority of startups and enterprises remain unrestricted. The models are available on HuggingFace, with quantized versions from 2-bit to 8-bit by TheBloke and other community contributors within hours.

Llama 4 is compatible with llama.cpp, vLLM, TGI, Ollama, and LM Studio at launch.