Tat-Seng Chua

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

64

AI for Auto-Research: Roadmap & User Guide

arXiv 2026

2026

JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation

arXiv 2026

2026

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

arXiv 2026

2026

AnyEdit: Edit Any Knowledge Encoded in Language Models

arXiv 2025

2025

Reinforcing Video Reasoning with Focused Thinking

arXiv 2025

2025

NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

arXiv 2025

2025

Order-agnostic Identifier for Large Language Model-based Generative Recommendation

arXiv 2025

2025

WEAVE: Unleashing and Benchmarking the In-context Interleaved Comprehension and Generation

arXiv 2025

2025

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

arXiv 2025

2025

VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation

arXiv 2025

2025

RLPR: Extrapolating RLVR to General Domains without Verifiers

arXiv 2025

2025

RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards

arXiv 2025

2025

An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes

arXiv 2025

2025

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

arXiv 2025

2025

DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization

arXiv 2025

2025

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

arXiv 2025

2025

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

arXiv 2025

2025

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

CVPR 2025 1

2025

Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization

arXiv 2025

2025

On Path to Multimodal Generalist: General-Level and General-Bench

arXiv 2025

2025

FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models

arXiv 2025

2025

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

CVPR 2025 1

2024

AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models

arXiv 2024

2024

Learnable Item Tokenization for Generative Recommendation

arXiv 2024

2024

GraphEdit: Large Language Models for Graph Structure Learning

arXiv 2024

2024

Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence

arXiv 2024

2024

Language Representations Can be What Recommenders Need: Findings and Potentials

arXiv 2024

2024

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

arXiv 2024

2024

Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models

arXiv 2024

2024

Towards Semantic Equivalence of Tokenization in Multimodal LLM

arXiv 2024

2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

arXiv 2024

2024

ExpLLM: Towards Chain of Thought for Facial Expression Recognition

arXiv 2024

2024

Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation

arXiv 2024

2024

Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations

arXiv 2024

2024

Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

arXiv 2024

2024

Towards 3D Molecule-Text Interpretation in Language Models

arXiv 2024

2024

Data-efficient Fine-tuning for LLM-based Recommendation

arXiv 2024

2024

ReactXT: Understanding Molecular "Reaction-ship" via Reaction-Contextualized Molecule-Text Pretraining

arXiv 2024

2024

Ask-before-Plan: Proactive Language Agents for Real-World Planning

arXiv 2024

2024

A Picture Is Worth a Graph: A Blueprint Debate Paradigm for Multimodal Reasoning

arXiv 2024

2024

On the Multi-turn Instruction Following for Conversational Web Agents

arXiv 2024

2024

NExT-GPT: Any-to-Any Multimodal LLM

arXiv 2023

2023

LasUIE: Unifying Information Extraction with Latent Adaptive Structure-aware Generative Language Model

arXiv 2023

2023

MolCA: Molecular Graph-Language Modeling with Cross-Modal Projector and Uni-Modal Adapter

arXiv 2023

2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

arXiv 2023

2023

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

CVPR 2024 1

2023

VPGTrans: Transfer Visual Prompt Generator across LLMs

NeurIPS 2023 11

2023

Reasoning Implicit Sentiment with Chain-of-Thought Prompting

arXiv 2023

2023

Can I Trust Your Answer? Visually Grounded Video Question Answering

CVPR 2024 1

2023

NExT-Chat: An LMM for Chat, Detection and Segmentation

arXiv 2023

2023

Generative Recommendation: Towards Next-generation Recommender Paradigm

arXiv 2023

2023

Search-in-the-Chain: Interactively Enhancing Large Language Models with Search for Knowledge-intensive Tasks

arXiv 2023

2023

Progressive Text-to-3D Generation for Automatic 3D Prototyping

arXiv 2023

2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding

arXiv 2023

2023

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

arXiv 2023

2023

Leveraging Multimodal Features and Item-level User Feedback for Bundle Construction

arXiv 2023

2023

Prompting and Evaluating Large Language Models for Proactive Dialogues: Clarification, Target-guided, and Non-collaboration

arXiv 2023

2023

Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination

arXiv 2023

2023

Discovering Spatio-Temporal Rationales for Video Question Answering

ICCV 2023 1

2023

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization

arXiv 2022

2022

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

ACL 2021 5

2021

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions

arXiv 2021

2021

CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models

2021

KGAT: Knowledge Graph Attention Network for Recommendation

arXiv 2019

2019

Affiliations

No known affiliations.

Frequent co-authors

10

from 64 papers