Omid Taheri
Publications
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Abstract: We introduce InteractVLM, a novel method to estimate 3D contact points on human bodies and objects from single in-the-wild images, enabling accurate human-object joint reconstruction in 3D. This is challenging due to occlusions, depth ambiguities, and widely varying object shapes.
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas
Cite · PDF · arXiv · Video · Code
CVPR 2025
Humanity's Last Exam: A Multi-Modal Benchmark at the Frontier of Human Knowledge
Abstract: Benchmarks are essential for tracking rapid LLM progress, but today's models exceed 90% on tasks like MMLU, saturating existing exams. We introduce Humanity's Last Exam (HLE), a multi-modal, closed-ended benchmark spanning 2,500 questions across 100+ subjects at the frontier of human knowledge.
Omid Taheri & Many Others
Cite · PDF · arXiv · Data · Code
SEAL LLM Leaderboards
NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models
Abstract: Acquiring physically plausible motor skills across diverse and unconventional morphologies, from humanoids to ants, is crucial for robotics and simulation. We introduce No-data Imitation Learning (NIL), which generates a reference video with a pretrained video diffusion model from a single simulation frame and a text prompt.
Mert Albaba, Chenhao Li, Markos Diomataris, Omid Taheri, Andreas Krause, Michael Black
Cite · DOI · PDF · arXiv
HaPTIC: Predicting 4D Hand Trajectory from Monocular Videos
Abstract: We present HaPTIC, an approach that infers coherent 4D hand trajectories from monocular videos. Current video-based hand pose reconstruction methods primarily focus on improving frame-wise 3D pose using adjacent frames rather than studying consistent 4D hand trajectories in space.
Yufei Ye, Yao Feng, Omid Taheri, Haiwen Feng, Shubham Tulsiani, Michael J. Black
Cite · Project · DOI · arXiv · Code
CHOIR: A Versatile and Differentiable Hand-Object Interaction Representation
Abstract: Synthesizing accurate hand-object interactions (HOI) is critical for AR/VR and vision tasks. Existing dense-correspondence methods improve contact fidelity but lack full differentiability or generality. We propose CHOIR, a versatile, fully differentiable interaction field (see the sketch below).
Théo Morales, Omid Taheri, Gerard Lacey
Cite · Paper · arXiv · Code
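
The paper defines CHOIR's field precisely; as a rough illustration of what a fully differentiable hand-object interaction representation buys you, here is a minimal PyTorch sketch. All names, thresholds, and the soft-min formulation are illustrative assumptions, not CHOIR's actual definition:

```python
import torch

def soft_contact_field(hand_verts, obj_points, temp=0.01, contact_r=0.005):
    """Toy differentiable interaction field: per hand vertex, a smooth
    distance to the object surface and a soft contact weight in [0, 1].
    Illustrative only -- not CHOIR's actual formulation."""
    d = torch.cdist(hand_verts, obj_points)                   # (H, O) pairwise distances
    soft_min = -temp * torch.logsumexp(-d / temp, dim=1)      # smooth min distance per vertex
    contact_w = torch.sigmoid((contact_r - soft_min) / temp)  # ~1 near the surface, ~0 far away
    return soft_min, contact_w

# Because every op above is differentiable, the field can drive optimization:
hand = torch.randn(778, 3, requires_grad=True)   # e.g., a MANO-sized vertex set
obj = torch.rand(2048, 3)                        # points sampled on an object surface
dist, w = soft_contact_field(hand, obj)
(w * dist).sum().backward()                      # gradients flow back to the hand vertices
```

The point of such a field is that contact fidelity becomes a loss term to backpropagate through, rather than a post-hoc test.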
CWGrasp: 3D Whole-Body Grasp Synthesis with Directional Controllability
Abstract: Synthesizing 3D whole bodies that realistically grasp objects is crucial for animation, mixed reality, and robotics. Key challenges include natural coordination between hand, body, and environment, and the scarcity of training data.
Georgios Paschalidis, Romana Wilschut, Dimitrije Antić, Omid Taheri, Dimitrios Tzionas
Cite · arXiv · Video · Code
3DV 2025
HUMOS: Human Motion Model Conditioned on Body Shape
Abstract: Generating realistic human motion is crucial for many computer vision and graphics applications. The rich diversity of human body shapes and sizes significantly influences how people move. However, existing motion models typically overlook these differences, using a normalized, average body instead.
Shashank Tripathi, Omid Taheri, Christoph Lassner, Michael Black, Daniel Holden, Carsten Stoll
Cite · PDF · arXiv · Video · Code
ECCV 2024
WANDR: Intention-guided Human Motion Generation
Abstract: Synthesizing natural human motions that enable a 3D human avatar to walk and reach for arbitrary goals in 3D space remains an unsolved problem with many applications. Existing methods (data-driven or using reinforcement learning) are limited in terms of generalization and motion naturalness.
Markos Diomataris, Nikos Athanasiou, Omid Taheri, Xi Wang, Otmar Hilliges, Michael J. Black
Cite · PDF · arXiv · Video · Code · Project
CVPR 2024
GRIP: Generating Interaction Poses Using Spatial Cues and Latent Consistency
Abstract: Hands are dexterous and highly versatile manipulators that are central to how humans interact with objects and their environment. Consequently, modeling realistic hand-object interactions, including the subtle motion of individual fingers, is critical for applications in computer graphics, computer vision, and mixed reality.
Omid Taheri, Yi Zhou, Dimitrios Tzionas, Yang Zhou, Duygu Ceylan, Soren Pirk, Michael J. Black
Cite · PDF · arXiv · Video · Code · Poster · Project
3DV 2024
InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction
Abstract: Humans constantly interact with objects to accomplish tasks. To understand such interactions, computers need to reconstruct these in 3D from images of whole bodies manipulating objects, e.g., for grasping, moving, and using the latter.
Yinghao Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas
Cite · PDF · Paper · Video · Code · Data · Project
IJCV 2024
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation
Abstract: Humans intuitively understand that inanimate objects do not move by themselves, but that state changes are typically caused by human manipulation (e.g., the opening of a book). This is not yet the case for machines.
Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J. Black, Otmar Hilliges
Cite · PDF · arXiv · Video · Code/Competition · Data · Project
CVPR 2023
IPMAN: 3D Human Pose Estimation via Intuitive Physics
Abstract: The estimation of 3D human body shape and pose from images has advanced rapidly. While the results are often well aligned with image features in the camera view, the 3D pose is often physically implausible; bodies lean, float, or penetrate the floor (a toy stability check is sketched below).
Shashank Tripathi, Lea Müller, Chun-Hao P. Huang, Omid Taheri, Michael Black, Dimitrios Tzionas
Cite · PDF · Video · Code · Data (MoYo) · Poster · Project
CVPR 2023
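
To make "physically implausible" concrete, here is a toy static-stability test in the spirit of intuitive physics. It is an assumption-laden sketch, not IPMAN's actual stability term: a body counts as stable when its center of mass, projected onto the floor, falls inside the support polygon of its floor contacts.

```python
import numpy as np
from scipy.spatial import Delaunay

def is_statically_stable(verts, masses, contact_mask):
    """Stable iff the center of mass projects into the convex hull
    of the contact vertices on the floor plane (toy criterion)."""
    com = (masses[:, None] * verts).sum(0) / masses.sum()  # center of mass
    support = verts[contact_mask][:, :2]                   # xy of floor contacts
    if len(support) < 3:
        return False                                       # no support polygon
    hull = Delaunay(support)                               # triangulated support hull
    return bool(hull.find_simplex(com[None, :2])[0] >= 0)  # CoM inside the hull?

# A body standing over a unit-square footprint is stable:
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0.5, 0.5, 1.0]])
masses = np.ones(len(verts))
contact = verts[:, 2] < 0.01                  # vertices within 1 cm of the floor
print(is_statically_stable(verts, masses, contact))  # True
```

Leaning or floating bodies fail this check, which is the kind of implausibility the paper's intuitive-physics terms penalize during pose estimation.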
InterCap: Joint Markerless 3D Tracking of Humans and Objects in Interaction
Abstract: Humans constantly interact with objects to accomplish tasks. To understand such interactions, computers need to reconstruct these in 3D from images of whole bodies manipulating objects, e.g., for grasping, moving, and using the latter.
Yinghao Huang, Omid Taheri, Michael J. Black, Dimitrios Tzionas
Cite · PDF · arXiv · Video · Code · Data · Poster · Project
GCPR 2022
GOAL: Generating 4D Whole-Body Motion for Hand-Object Grasping
Abstract: Generating digital humans that move realistically has many applications and is widely studied, but existing methods focus on the major limbs of the body, ignoring the hands and head. Hands have been studied separately, but the focus has been on generating realistic static grasps of objects.
Omid Taheri, Vasileios Choutas, Michael J. Black, Dimitrios Tzionas
Cite · PDF · arXiv · Video · Code · Poster · Project
CVPR 2022
GRAB: A Dataset of Whole-Body Human Grasping of Objects
Abstract: Training computers to understand, model, and synthesize human grasping requires a rich dataset containing complex 3D object shapes, detailed contact information, hand pose and shape, and the 3D body motion over time.
Omid Taheri, Nima Ghorbani, Michael J. Black, Dimitrios Tzionas
Cite · arXiv · Video · GRAB · GrabNet · Data · Project
ECCV 2020
Human Leg Motion Tracking by Fusing IMUs and RGB Camera Data Using Extended Kalman Filter
Abstract: Human motion capture is frequently used to study rehabilitation and clinical problems, as well as to provide realistic animation for the entertainment industry. IMU-based systems and marker-based motion tracking systems are among the most popular methods for tracking movement, owing to their low implementation cost and light weight (see the fusion sketch below).
Omid Taheri, Hassan Salarieh, Aria Alasty
Cite · arXiv
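
Since the abstract names the fusion machinery explicitly, here is a minimal extended Kalman filter sketch for a single knee joint. The lengths, noise levels, and camera model are invented for illustration; the paper's state and measurement models are richer:

```python
import numpy as np

L, dt = 0.4, 0.01                        # shank length [m], time step [s] (assumed)
F = np.array([[1.0, dt], [0.0, 1.0]])    # state x = [angle, rate]; constant-rate model
Q = np.diag([1e-6, 1e-4])                # process noise (gyro drift, etc.)
R = np.diag([1e-4, 1e-4])                # camera measurement noise

def h(x):                                # camera observes the 2D ankle position
    return np.array([L * np.sin(x[0]), -L * np.cos(x[0])])

def H_jac(x):                            # Jacobian of h; needed because h is nonlinear
    return np.array([[L * np.cos(x[0]), 0.0],
                     [L * np.sin(x[0]), 0.0]])

def ekf_step(x, P, z_cam):
    x_pred = F @ x                       # predict (an IMU rate would refine this step)
    P_pred = F @ P @ F.T + Q
    H = H_jac(x_pred)                    # update with the camera observation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (z_cam - h(x_pred))
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.array([0.10, 0.0]), np.eye(2) * 0.1
x, P = ekf_step(x, P, h(np.array([0.12, 0.0])))  # fake camera frame at 0.12 rad
print(x)                                 # angle estimate pulled toward 0.12
```

The fusion logic is the whole point: the motion-model prediction keeps the estimate smooth between frames, while each camera measurement corrects accumulated drift.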