Tat-Seng Chua
- Papers: 64
Authored papers (64)
AI for Auto-Research: Roadmap & User Guide
arXiv 2026
JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
arXiv 2026
VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
arXiv 2026
AnyEdit: Edit Any Knowledge Encoded in Language Models
arXiv 2025
Reinforcing Video Reasoning with Focused Thinking
arXiv 2025
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
arXiv 2025
Order-agnostic Identifier for Large Language Model-based Generative Recommendation
arXiv 2025
WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation
arXiv 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
arXiv 2025
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation
arXiv 2025
RLPR: Extrapolating RLVR to General Domains without Verifiers
arXiv 2025
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
arXiv 2025
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
arXiv 2025
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
arXiv 2025
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization
arXiv 2025
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
arXiv 2025
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation
arXiv 2025
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
CVPR 2025 1
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
arXiv 2025
On Path to Multimodal Generalist: General-Level and General-Bench
arXiv 2025
FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models
arXiv 2025
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
CVPR 2025 1
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models
arXiv 2024
Learnable Item Tokenization for Generative Recommendation
arXiv 2024
GraphEdit: Large Language Models for Graph Structure Learning
arXiv 2024
Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence
arXiv 2024
Language Representations Can be What Recommenders Need: Findings and Potentials
arXiv 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
arXiv 2024
Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models
arXiv 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
arXiv 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
arXiv 2024
ExpLLM: Towards Chain of Thought for Facial Expression Recognition
arXiv 2024
Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation
arXiv 2024
Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations
arXiv 2024
Hello Again! LLM-powered Personalized Agent for Long-term Dialogue
arXiv 2024
Towards 3D Molecule-Text Interpretation in Language Models
arXiv 2024
Data-efficient Fine-tuning for LLM-based Recommendation
arXiv 2024
ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining
arXiv 2024
Ask-before-Plan: Proactive Language Agents for Real-World Planning
arXiv 2024
A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning
arXiv 2024
On the Multi-turn Instruction Following for Conversational Web Agents
arXiv 2024
NExT-GPT: Any-to-Any Multimodal LLM
arXiv 2023
LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model
arXiv 2023
MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter
arXiv 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
arXiv 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
CVPR 2024 1
VPGTrans: Transfer Visual Prompt Generator across LLMs
NeurIPS 2023 11
Reasoning Implicit Sentiment with Chain-of-Thought Prompting
arXiv 2023
Can I Trust Your Answer? Visually Grounded Video Question Answering
CVPR 2024 1
NExT-Chat: An LMM for Chat, Detection and Segmentation
arXiv 2023
Generative Recommendation: Towards Next-generation Recommender Paradigm
arXiv 2023
Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks
arXiv 2023
Progressive Text-to-3D Generation for Automatic 3D Prototyping
arXiv 2023
Generating Visual Spatial Description via Holistic 3D Scene Understanding
arXiv 2023
Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents
arXiv 2023
Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction
arXiv 2023
Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration
arXiv 2023
Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
arXiv 2023
Discovering Spatio-Temporal Rationales for Video Question Answering
ICCV 2023 1
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
arXiv 2022
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance
ACL 2021 5
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions
arXiv 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
KGAT: Knowledge Graph Attention Network for Recommendation
arXiv 2019
Frequent co-authors
10 from 64 papers