Xin Wang
- Papers
- 50
Authored papers
A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
arXiv 2026
AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
arXiv 2026
AIDABench: AI Data Analytics Benchmark
arXiv 2026
OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
arXiv 2026
ShowUI-Aloha: Human-Taught GUI Agent
arXiv 2026
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
arXiv 2026
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
arXiv 2026
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
arXiv 2025
Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs
arXiv 2025
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning
arXiv 2025
PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
arXiv 2025
Safety at Scale: A Comprehensive Survey of Large Model Safety
arXiv 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
CVPR 2025
RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation
arXiv 2025
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
arXiv 2025
V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models
arXiv 2025
BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models
arXiv 2025
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
arXiv 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
arXiv 2025
Post-training for Deepfake Speech Detection
arXiv 2025
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
arXiv 2025
Robust AI-Generated Face Detection with Imbalanced Data
arXiv 2025
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models
arXiv 2024
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding
arXiv 2024
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
arXiv 2024
When Do We Not Need Larger Vision Models?
arXiv 2024
ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
arXiv 2024
AUITestAgent: Automatic Requirements Oriented GUI Function Testing
arXiv 2024
BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
arXiv 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
arXiv 2024
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
arXiv 2024
UU-Mamba: Uncertainty-aware U-Mamba for Cardiovascular Segmentation
arXiv 2024
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
arXiv 2024
ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
arXiv 2024
Towards Multi-Source Retrieval-Augmented Generation via Synergizing Reasoning and Preference-Driven Retrieval
arXiv 2024
NESTFUL: A Benchmark for Evaluating LLMs on Nested Sequences of API Calls
arXiv 2024
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles
arXiv 2024
Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection
arXiv 2024
MEIT: Multi-Modal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
arXiv 2024
Efficient Large Language Models: A Survey
arXiv 2023
DisenBooth: Identity-Preserving Disentangled Tuning for Subject-Driven Text-to-Image Generation
arXiv 2023
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
arXiv 2022
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
CVPR 2023
PanGu-Coder: Program Synthesis with Function-Level Language Modeling
arXiv 2022
Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances
arXiv 2021
Automated Machine Learning on Graphs: A Survey
arXiv 2021
VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images
arXiv 2020
Self-Supervised Learning for Contextualized Extractive Summarization
BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning
SkipNet: Learning Dynamic Routing in Convolutional Networks
Frequent co-authors
10 from 50 papers