Papers

Trending research and the full catalog, with each paper linked to the benchmarks, methods, and models it introduces.

Filtered by domain: Video generation

Cosmos 3: Omnimodal World Models for Physical AI

1 Jun 2026

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture.

Image Understanding, Language Modeling, Omni models, Video generation

11k, 1.4/h

LaWAM: Latent World Action Models for Efficient Dynamics-Aware Robot Policies

14 Jun 2026

Vision-Language-Action models (VLAs) leverage large-scale vision-language pretraining for semantic robot control, but often lack explicit foresight into how robot actions change the scene.

Robotics, Video generation, World Models

410.6/h

Physics Question Scene Graph: Fine-grained Evaluation of Physical Plausibility in Text-to-Video Generation

24 Jun 2026

Video generation models are increasingly capable of producing realistic videos, but they still struggle to generate videos that follow basic physical laws. Compounding this is a lack of reliable granular evaluation methods for localizing and specifying physical law violations in…

Image Understanding, Language Modeling, Video generation