Yan Wang
- Papers: 53
Authored papers: 53
Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
arXiv 2026
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
arXiv 2026
FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
arXiv 2026
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
arXiv 2026
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
arXiv 2026
Making Reconstruction FID Predictive of Diffusion Generation FID
arXiv 2026
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context
arXiv 2026
Training-Free Vector Quantization via Gaussian VAEs
arXiv 2025
Free(): Learning to Forget in Malloc-Only Reasoning Models
arXiv 2026
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection
arXiv 2026
Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories
arXiv 2026
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
arXiv 2026
Ebisu: Benchmarking Large Language Models in Japanese Finance
arXiv 2026
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
arXiv 2025
LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
arXiv 2025
FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information
arXiv 2025
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
arXiv 2025
DeepRFTv2: Kernel-level Learning for Image Deblurring
arXiv 2025
WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
arXiv 2025
The End of Manual Decoding: Towards Truly End-to-End Language Models
arXiv 2025
Can Test-Time Scaling Improve World Foundation Model?
arXiv 2025
MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation
arXiv 2025
FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
arXiv 2025
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance
arXiv 2025
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
CVPR 2025 1
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
arXiv 2024
CogVLM2: Visual Language Models for Image and Video Understanding
arXiv 2024
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
arXiv 2024
CAMixerSR: Only Details Need More "Attention"
CVPR 2024 1
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
arXiv 2024
A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis
arXiv 2024
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models
arXiv 2024
The Oscars of AI Theater: A Survey on Role-Playing with Language Models
arXiv 2024
xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart
arXiv 2024
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
arXiv 2024
Boosting Neural Representations for Videos with a Conditional Decoder
CVPR 2024 1
Extrapolated Urban View Synthesis Benchmark
ICCV 2025
Block-Attention for Efficient RAG
arXiv 2024
Idempotence and Perceptual Image Compression
arXiv 2024
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
arXiv 2023
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
CVPR 2023 1
EasyTPP: Towards Open Benchmarking Temporal Point Processes
arXiv 2023
An Embodied Generalist Agent in 3D World
arXiv 2023
CogAgent: A Visual Language Model for GUI Agents
CVPR 2024 1
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
arXiv 2023
VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection
arXiv 2023
AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception
ICCV 2023 1
A Contrastive Framework for Neural Text Generation
arXiv 2022
Large Language Models Meet Harry Potter: A Bilingual Dataset for Aligning Dialogue Agents with Characters
arXiv 2022
Bit Allocation using Optimization
arXiv 2022
TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
arXiv 2021
OMPQ: Orthogonal Mixed Precision Quantization
arXiv 2021
The NANOGrav Nine-year Data Set: Limits on the Isotropic Stochastic Gravitational Wave Background
arXiv 2015
Frequent co-authors
10 from 53 papers