Omid Taheri
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Abstract: We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate human-object joint reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes.
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas
Cite · PDF · arXiv · Video · Code
CVPR 2025
Humanity's Last Exam: A Multi-Modal Benchmark at the Frontier of Human Knowledge
Abstract: Benchmarks are essential for tracking rapid LLM progress, but today's models exceed 90% on tasks like MMLU, saturating existing exams. We introduce Humanity's Last Exam (HLE), a multi-modal, closed-ended benchmark spanning 2,500 questions across 100+ subjects at the frontier of human knowledge.
Omid Taheri, and many others
Cite · PDF · arXiv · Data · Code · SEAL LLM Leaderboards
NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models
Abstract: Acquiring physically plausible motor skills across diverse and unconventional morphologies, from humanoids to ants, is crucial for robotics and simulation. We introduce No-data Imitation Learning (NIL), which generates a reference video with a pretrained video diffusion model from a single simulation frame and a text prompt.
Mert Albaba, Chenhao Li, Markos Diomataris, Omid Taheri, Andreas Krause, Michael Black
Cite · DOI · PDF · arXiv
HaPTIC: Predicting 4D Hand Trajectory from Monocular Videos
Abstract: We present HaPTIC, an approach that infers coherent 4D hand trajectories from monocular videos. Current video-based hand pose reconstruction methods primarily focus on improving frame-wise 3D pose using adjacent frames rather than studying consistent 4D hand trajectories in space.
Yufei Ye, Yao Feng, Omid Taheri, Haiwen Feng, Shubham Tulsiani, Michael J. Black
Cite · Project · DOI · arXiv · Code
CHOIR: A Versatile and Differentiable Hand-Object Interaction Representation
Abstract: Synthesizing accurate hand-object interactions (HOI) is critical for AR/VR and vision tasks. Existing dense-correspondence methods improve contact fidelity but lack full differentiability or generality. We propose CHOIR, a versatile, fully differentiable interaction field.
Théo Morales, Omid Taheri, Gerard Lacey
Cite · Paper · arXiv · Code
CWGrasp: 3D Whole-Body Grasp Synthesis with Directional Controllability
Abstract: Synthesizing 3D whole bodies that realistically grasp objects is crucial for animation, mixed reality, and robotics. Key challenges include natural coordination between hand, body, and environment, and the scarcity of training data.
Georgios Paschalidis, Romana Wilschut, Dimitrije Antić, Omid Taheri, Dimitrios Tzionas
Cite · arXiv · Video · Code
3DV 2025
HUMOS: Human Motion Model Conditioned on Body Shape
Abstract: Generating realistic human motion is crucial for many computer vision and graphics applications. The rich diversity of human body shapes and sizes significantly influences how people move. However, existing motion models typically overlook these differences, using a normalized, average body instead.
Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael Black, Daniel Holden, Carsten Stoll
Cite · PDF · arXiv · Video · Code
ECCV 2024
WANDR: Intention-guided Human Motion Generation
Abstract: Synthesizing natural human motions that enable a 3D human avatar to walk and reach for arbitrary goals in 3D space remains an unsolved problem with many applications. Existing methods (data-driven or using reinforcement learning) are limited in terms of generalization and motion naturalness.
Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black
Cite · PDF · arXiv · Video · Code · Project
CVPR 2024
GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency
Abstract: Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Consequently, modeling realistic hand-object interactions, including the subtle motion of individual fingers, is critical for applications in computer graphics, computer vision, and mixed reality.
Omid Taheri, Yi Zhou, Dimitrios Tzionas, Yang Zhou, Duygu Ceylan, Soren Pirk, Michael J. Black
Cite · PDF · arXiv · Video · Code · Poster · Project
3DV 2024
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
Abstract: Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines.
Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J. Black, Otmar Hilliges
Cite · PDF · arXiv · Video · Code/Competition · Data · Project
CVPR 2023