Omid Taheri
NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models
Abstract Acquiring physically plausible motor skills across diverse and unconventional morphologies, from humanoids to ants, is crucial for robotics and simulation. We introduce No-data Imitation Learning (NIL), which generates a reference video with a pretrained video diffusion model from a single simulation frame and a text prompt.
Mert Albaba, Chenhao Li, Markos Diomataris, Omid Taheri, Andreas Krause, Michael Black
Cite · DOI · PDF · arXiv
CVPR 2026
HaPTIC: Predicting 4D Hand Trajectory from Monocular Videos
Abstract We present HaPTIC, an approach that infers coherent 4D hand trajectories from monocular videos. Current video-based hand pose reconstruction methods primarily focus on improving frame-wise 3D pose using adjacent frames rather than studying consistent 4D hand trajectories in space.
Yufei Ye, Yao Feng, Omid Taheri, Haiwen Feng, Shubham Tulsiani, Michael J. Black
Cite · Project · DOI · arXiv · Code
3DV 2026
NGL-Prompter: Training-Free Sewing Pattern Estimation from a Single Image
Abstract Estimating sewing patterns from images is a practical approach for creating high-quality 3D garments. Due to the lack of real-world pattern-image paired data, prior approaches fine-tune large vision-language models (VLMs) on synthetic garment datasets, limiting their generalization to real-world images.
Anna Badalyan, Pratheba Selvaraju, Giorgio Becherini, Omid Taheri, Victoria Fernandez Abrevaya, Michael Black
arXiv · PDF
CLUTCH: Contextualized Language Model for Unlocking Text-Conditioned Hand Motion Modelling in the Wild
Abstract Hands play a central role in daily life, yet modeling natural hand motions remains underexplored. Existing methods that tackle text-to-hand-motion generation or hand animation captioning rely on studio-captured datasets with limited actions and contexts, making them costly to scale to in-the-wild settings.
Balamurugan Thambiraja, Omid Taheri, Radek Danecek, Giorgio Becherini, Gerard Pons-Moll, Justus Thies
arXiv · PDF · Project · Code
ICLR 2026
FUSION: Full-Body Unified Motion Prior for Body and Hands via Diffusion
Abstract Hands are central to interacting with our surroundings and conveying gestures, making their inclusion essential for full-body motion synthesis. Yet existing methods either ignore hands entirely or operate under highly constrained conditions.
Enes Duran, Nikos Athanasiou, Muhammed Kocabas, Michael J. Black, Omid Taheri
arXiv · PDF
Moving by Looking: Towards Vision-Driven Avatar Motion Generation
Abstract Human-like motion requires human-like perception. We introduce CLOPS, the first human avatar that uses only egocentric vision to perceive its surroundings and navigate 3D scenes, rather than relying on task-specific perception methods or privileged state information.
Markos Diomataris, Berat Mert Albaba, Giorgio Becherini, Partha Ghosh, Omid Taheri, Michael J. Black
arXiv · PDF · Video · Project
Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions
Abstract While current general-purpose 3D human models (e.g., SMPL-X) efficiently represent accurate human shape and pose, they lack the ability to physically interact with the environment due to their kinematic nature.
Li Siyao, Yao Feng, Omid Taheri, Chen Change Loy, Michael J. Black
arXiv · PDF · Project
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Abstract We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate human-object joint reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes.
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas
Cite · PDF · arXiv · Video · Code
CVPR 2025
Humanity's Last Exam: A Multi-Modal Benchmark at the Frontier of Human Knowledge
Abstract Benchmarks are essential for tracking rapid LLM progress—but today’s models exceed 90% on tasks like MMLU, saturating existing exams. We introduce Humanity’s Last Exam (HLE), a multi-modal, closed-ended benchmark spanning 2,500 questions across 100+ subjects at the frontier of human knowledge.
Omid Taheri, & Many Others
Cite · PDF · arXiv · Data · Code · SEAL LLM Leaderboards
CHOIR: A Versatile and Differentiable Hand-Object Interaction Representation
Abstract Synthesizing accurate hand-object interactions (HOI) is critical for AR/VR and vision tasks. Existing dense-correspondence methods improve contact fidelity but lack full differentiability or generality. We propose CHOIR, a versatile, fully differentiable interaction field.
Théo Morales, Omid Taheri, Gerard Lacey
Cite · Paper · arXiv · Code