Liang Wang
- Papers: 47
Authored papers
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
arXiv 2026
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation
arXiv 2026
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing
arXiv 2026
FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Language Models
arXiv 2026
Aligning Multimodal LLM with Human Preference: A Survey
arXiv 2025
Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
arXiv 2025
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
arXiv 2025
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
arXiv 2025
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
arXiv 2025
Diffusion Models for Molecules: A Survey of Methods and Tasks
arXiv 2025
SimScale: Learning to Drive via Real-World Simulation at Scale
arXiv 2025
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
arXiv 2025
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
arXiv 2025
Thyme: Think Beyond Images
arXiv 2025
MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings
arXiv 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
arXiv 2025
Variational Reasoning for Language Models
arXiv 2025
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
arXiv 2025
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
arXiv 2025
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs
arXiv 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
arXiv 2025
OpenGPT-4o-Image: A Comprehensive Dataset for Advanced Image Generation and Editing
arXiv 2025
Reinforcing General Reasoning without Verifiers
arXiv 2025
PoisonArena: Uncovering Competing Poisoning Attacks in Retrieval-Augmented Generation
arXiv 2025
BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra
arXiv 2024
Generative Representational Instruction Tuning
arXiv 2024
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
arXiv 2024
Debiasing Multimodal Large Language Models
arXiv 2024
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
arXiv 2024
PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts
arXiv 2024
Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
arXiv 2024
Little Giants: Synthesizing High-Quality Embedding Data at Scale
arXiv 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
arXiv 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
arXiv 2024
LongEmbed: Extending Embedding Models for Long Context Retrieval
arXiv 2024
Inference with Reference: Lossless Acceleration of Large Language Models
arXiv 2023
GSLB: The Graph Structure Learning Benchmark
AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation
arXiv 2023
EX-FEVER: A Dataset for Multi-hop Explainable Fact Verification
arXiv 2023
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
arXiv 2023
OneNet: Enhancing Time Series Forecasting Models under Concept Drift by Online Ensembling
Exploring Model Dynamics for Accumulative Poisoning Discovery
arXiv 2023
Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval
arXiv 2022
BEVBert: Multimodal Map Pre-training for Language-guided Navigation
ICCV 2023
Relation-aware Heterogeneous Graph for User Profiling
arXiv 2021
Deep Graph Contrastive Representation Learning
arXiv 2020
Bifurcated backbone strategy for RGB-D salient object detection
arXiv 2020