Jan Kautz

Papers
49

Authored papers

49

ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents

arXiv 2026

2026

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

arXiv 2026

2026

Learning to Discover at Test Time

arXiv 2026

2026

World Action Models are Zero-shot Policies

arXiv 2026

2026

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

arXiv 2026

2026

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

arXiv 2026

2026

C-RADIOv4 (Tech Report)

arXiv 2026

2026

NitroGen: An Open Foundation Model for Generalist Gaming Agents

arXiv 2026

2026

Scaling RL to Long Videos

arXiv 2025

2025

FoundationStereo: Zero-Shot Stereo Matching

CVPR 2025

2025

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

arXiv 2025

2025

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

arXiv 2025

2025

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

arXiv 2025

2025

Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models

arXiv 2025

2025

One-Minute Video Generation with Test-Time Training

CVPR 2025

2025

FeatSharp: Your Vision Model Features, Sharper

arXiv 2025

2025

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

arXiv 2025

2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

arXiv 2025

2025

RLP: Reinforcement as a Pretraining Objective

arXiv 2025

2025

Scaling Vision Pre-Training to 4K Resolution

CVPR 2025

2025

LongMamba: Enhancing Mamba's Long Context Capabilities via Training-Free Receptive Field Enlargement

arXiv 2025

2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

arXiv 2025

2025

An Empirical Study of Mamba-based Language Models

arXiv 2024

2024

MambaVision: A Hybrid Mamba-Transformer Vision Backbone

CVPR 2025

2024

Gated Delta Networks: Improving Mamba2 with Delta Rule

arXiv 2024

2024

NVILA: Efficient Frontier Visual Language Models

CVPR 2025

2024

RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models

arXiv 2024

2024

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning

arXiv 2024

2024

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

arXiv 2024

2024

Hymba: A Hybrid-head Architecture for Small Language Models

arXiv 2024

2024

EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation

arXiv 2024

2024

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

arXiv 2024

2024

LITA: Language Instructed Temporal-Localization Assistant

arXiv 2024

2024

Compact Language Models via Pruning and Knowledge Distillation

arXiv 2024

2024

VILA: On Pre-training for Visual Language Models

CVPR 2024

2023

COLMAP-Free 3D Gaussian Splatting

CVPR 2024

2023

DiffiT: Diffusion Vision Transformers for Image Generation

arXiv 2023

2023

BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

CVPR 2023

2023

A Variational Perspective on Solving Inverse Problems with Diffusion Models

arXiv 2023

2023

FasterViT: Fast Vision Transformers with Hierarchical Attention

arXiv 2023

2023

Global Context Vision Transformers

arXiv 2022

2022

GroupViT: Semantic Segmentation Emerges from Text Supervision

CVPR 2022

2022

Two-shot Spatially-varying BRDF and Shape Estimation

2020

Few-Shot Unsupervised Image-to-Image Translation

2019

Joint-task Self-supervised Learning for Temporal Correspondence

2019

A Closed-form Solution to Photorealistic Image Stylization

2018

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

2017

MoCoGAN: Decomposing Motion and Content for Video Generation

2017

Geometry-Aware Learning of Maps for Camera Localization

2017

Affiliations

No known affiliations.

Frequent co-authors

10

from 49 papers