
Philip Torr

Papers
55

Authored papers

55

World Model for Robot Learning: A Comprehensive Survey

arXiv 2026

2026

Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond

arXiv 2026

2026

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

arXiv 2026

2026

Forecasting Scientific Progress with Artificial Intelligence

arXiv 2026

2026

HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification

arXiv 2026

2026

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

arXiv 2026

2026

StraTA: Incentivizing Agentic Reinforcement Learning with Strategic Trajectory Abstraction

arXiv 2026

2026

ActionParty: Multi-Subject Action Binding in Generative Video Games

arXiv 2026

2026

Code2World: A GUI World Model via Renderable Code Generation

arXiv 2026

2026

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

arXiv 2025

2025

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

arXiv 2025

2025

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

arXiv 2025

2025

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

arXiv 2025

2025

EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

arXiv 2025

2025

VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation

arXiv 2025

2025

Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?

arXiv 2025

2025

Computer-Use Agents as Judges for Generative User Interface

arXiv 2025

2025

Interleaving Reasoning for Better Text-to-Image Generation

arXiv 2025

2025

Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

arXiv 2025

2025

CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards

arXiv 2025

2025

Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

arXiv 2025

2025

VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory

ICCV 2025

2025

BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset

arXiv 2025

2025

SciReasoner: Laying the Scientific Reasoning Ground Across Disciplines

arXiv 2025

2025

Stable Virtual Camera: Generative View Synthesis with Diffusion Models

ICCV 2025

2025

Permission Manifests for Web Agents

arXiv 2025

2025

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

arXiv 2025

2025

OASIS: Open Agent Social Interaction Simulations with One Million Agents

arXiv 2024

2024

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

arXiv 2024

2024

Learning Camera Movement Control from Real-World Drone Videos

arXiv 2024

2024

A Scalable Communication Protocol for Networks of Large Language Models

arXiv 2024

2024

Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System

arXiv 2024

2024

Can Large Language Model Agents Simulate Human Trust Behavior?

arXiv 2024

2024

Towards Interpreting Visual Information Processing in Vision-Language Models

arXiv 2024

2024

Can Editing LLMs Inject Harm?

arXiv 2024

2024

Corrective Machine Unlearning

arXiv 2024

2024

Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution

arXiv 2024

2024

Efficient Lifelong Model Evaluation in an Era of Rapid Progress

arXiv 2024

2024

Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

arXiv 2024

2024

SynthCLIP: Are We Ready for a Fully Synthetic CLIP Training?

arXiv 2024

2024

Video Motion Transfer with Diffusion Transformers

CVPR 2025

2024

Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images

arXiv 2024

2024

MatchDiffusion: Training-free Generation of Match-cuts

ICCV 2025

2024

Stop Reasoning! When Multimodal LLM with Chain-of-Thought Reasoning Meets Adversarial Image

arXiv 2024

2024

Multimodal Pragmatic Jailbreak on Text-to-image Models

arXiv 2024

2024

Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

arXiv 2024

2024

Semantic Score Distillation Sampling for Compositional Text-to-3D Generation

arXiv 2024

2024

Online Continual Learning Without the Storage Constraint

arXiv 2023

2023

A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

arXiv 2023

2023

PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

arXiv 2023

2023

Graph Inductive Biases in Transformers without Message Passing

arXiv 2023

2023

Influencer Backdoor Attack on Semantic Segmentation

arXiv 2023

2023

Interpreting Learned Feedback Patterns in Large Language Models

arXiv 2023

2023

Is synthetic data from generative models ready for image recognition?

arXiv 2022

2022

TransMix: Attend to Mix for Vision Transformers

CVPR 2022

2021

Affiliations

No known affiliations.