Philip Torr
- Papers: 55
Authored papers (55)
World Model for Robot Learning: A Comprehensive Survey
arXiv 2026
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
arXiv 2026
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
arXiv 2026
Forecasting Scientific Progress with Artificial Intelligence
arXiv 2026
HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification
arXiv 2026
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
arXiv 2026
StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction
arXiv 2026
ActionParty: Multi-Subject Action Binding in Generative Video Games
arXiv 2026
Code2World: A GUI World Model via Renderable Code Generation
arXiv 2026
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
arXiv 2025
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
arXiv 2025
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
arXiv 2025
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
arXiv 2025
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
arXiv 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
arXiv 2025
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
arXiv 2025
Computer-Use Agents as Judges for Generative User Interface
arXiv 2025
Interleaving Reasoning for Better Text-to-Image Generation
arXiv 2025
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers
arXiv 2025
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards
arXiv 2025
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
arXiv 2025
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
ICCV 2025
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
arXiv 2025
SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines
arXiv 2025
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
ICCV 2025
Permission Manifests for Web Agents
arXiv 2025
From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images
arXiv 2025
OASIS: Open Agent Social Interaction Simulations with One Million Agents
arXiv 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
arXiv 2024
Learning Camera Movement Control from Real-World Drone Videos
arXiv 2024
A Scalable Communication Protocol for Networks of Large Language Models
arXiv 2024
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System
arXiv 2024
Can Large Language Model Agents Simulate Human Trust Behavior?
arXiv 2024
Towards Interpreting Visual Information Processing in Vision-Language Models
arXiv 2024
Can Editing LLMs Inject Harm?
arXiv 2024
Corrective Machine Unlearning
arXiv 2024
Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
arXiv 2024
Efficient Lifelong Model Evaluation in an Era of Rapid Progress
arXiv 2024
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?
arXiv 2024
SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?
arXiv 2024
Video Motion Transfer with Diffusion Transformers
CVPR 2025
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
arXiv 2024
MatchDiffusion: Training-free Generation of Match-cuts
ICCV 2025
Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image
arXiv 2024
Multimodal Pragmatic Jailbreak on Text-to-image Models
arXiv 2024
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders
arXiv 2024
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
arXiv 2024
Online Continual Learning Without the Storage Constraint
arXiv 2023
A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
arXiv 2023
PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction
arXiv 2023
Graph Inductive Biases in Transformers without Message Passing
arXiv 2023
Influencer Backdoor Attack on Semantic Segmentation
arXiv 2023
Interpreting Learned Feedback Patterns in Large Language Models
arXiv 2023
Is synthetic data from generative models ready for image recognition?
arXiv 2022
TransMix: Attend to Mix for Vision Transformers
CVPR 2022
Frequent co-authors
10 from 55 papers