
Humanity's Last Exam: A Multi-Modal Benchmark at the Frontier of Human Knowledge

Abstract: Benchmarks are essential for tracking rapid LLM progress, but today’s models exceed 90% accuracy on benchmarks such as MMLU, saturating existing exams. We introduce Humanity’s Last Exam (HLE), a multi-modal, closed-ended benchmark of 2,500 questions spanning more than 100 subjects at the frontier of human knowledge.

Omid Taheri and many others
