Jan Kautz
- Papers: 49
Authored papers (49)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
arXiv 2026
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
arXiv 2026
Learning to Discover at Test Time
arXiv 2026
World Action Models are Zero-shot Policies
arXiv 2026
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing
arXiv 2026
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
arXiv 2026
C-RADIOv4 (Tech Report)
arXiv 2026
NitroGen: An Open Foundation Model for Generalist Gaming Agents
arXiv 2026
Scaling RL to Long Videos
arXiv 2025
FoundationStereo: Zero-Shot Stereo Matching
CVPR 2025
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
arXiv 2025
DreamGen: Unlocking Generalization in Robot Learning through Video World Models
arXiv 2025
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
arXiv 2025
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
arXiv 2025
One-Minute Video Generation with Test-Time Training
CVPR 2025
FeatSharp: Your Vision Model Features, Sharper
arXiv 2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
arXiv 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
arXiv 2025
RLP: Reinforcement as a Pretraining Objective
arXiv 2025
Scaling Vision Pre-Training to 4K Resolution
CVPR 2025
LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement
arXiv 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
arXiv 2025
An Empirical Study of Mamba-based Language Models
arXiv 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
CVPR 2025
Gated Delta Networks: Improving Mamba2 with Delta Rule
arXiv 2024
NVILA: Efficient Frontier Visual Language Models
CVPR 2025
RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models
arXiv 2024
OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
arXiv 2024
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
arXiv 2024
Hymba: A Hybrid-head Architecture for Small Language Models
arXiv 2024
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
arXiv 2024
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
arXiv 2024
LITA: Language Instructed Temporal-Localization Assistant
arXiv 2024
Compact Language Models via Pruning and Knowledge Distillation
arXiv 2024
VILA: On Pre-training for Visual Language Models
CVPR 2024
COLMAP-Free 3D Gaussian Splatting
CVPR 2024
DiffiT: Diffusion Vision Transformers for Image Generation
arXiv 2023
BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
CVPR 2023
A Variational Perspective on Solving Inverse Problems with Diffusion Models
arXiv 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
arXiv 2023
Global Context Vision Transformers
arXiv 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
CVPR 2022
Two-shot Spatially-varying BRDF and Shape Estimation
Few-Shot Unsupervised Image-to-Image Translation
Joint-task Self-supervised Learning for Temporal Correspondence
A Closed-form Solution to Photorealistic Image Stylization
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
MoCoGAN: Decomposing Motion and Content for Video Generation
Geometry-Aware Learning of Maps for Camera Localization
Frequent co-authors
10 from 49 papers