Yan Wang

Papers
53

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

53

Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration

arXiv 2026

2026

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

arXiv 2026

2026

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment

arXiv 2026

2026

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

arXiv 2026

2026

ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

arXiv 2026

2026

Making Reconstruction FID Predictive of Diffusion Generation FID

arXiv 2026

2026

The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

arXiv 2026

2026

Training-Free Vector Quantization via Gaussian VAEs

arXiv 2025

2026

Free(): Learning to Forget in Malloc-Only Reasoning Models

arXiv 2026

2026

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

arXiv 2026

2026

Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

arXiv 2026

2026

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

arXiv 2026

2026

Ebisu: Benchmarking Large Language Models in Japanese Finance

arXiv 2026

2026

SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation

arXiv 2025

2025

LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs

arXiv 2025

2025

FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information

arXiv 2025

2025

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

arXiv 2025

2025

DeepRFTv2: Kernel-level Learning for Image Deblurring

arXiv 2025

2025

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

arXiv 2025

2025

The End of Manual Decoding: Towards Truly End-to-End Language Models

arXiv 2025

2025

Can Test-Time Scaling Improve World Foundation Model?

arXiv 2025

2025

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

arXiv 2025

2025

FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs

arXiv 2025

2025

Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance

arXiv 2025

2025

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

CVPR 2025 1

2025

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

arXiv 2024

2024

CogVLM2: Visual Language Models for Image and Video Understanding

arXiv 2024

2024

GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting

arXiv 2024

2024

CAMixerSR: Only Details Need More "Attention"

CVPR 2024 1

2024

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models

arXiv 2024

2024

A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis

arXiv 2024

2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models

arXiv 2024

2024

The Oscars of AI Theater: A Survey on Role-Playing with Language Models

arXiv 2024

2024

xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

arXiv 2024

2024

Asynchronous Large Language Model Enhanced Planner for Autonomous Driving

arXiv 2024

2024

Boosting Neural Representations for Videos with a Conditional Decoder

CVPR 2024 1

2024

Extrapolated Urban View Synthesis Benchmark

ICCV 2025

2024

Block-Attention for Efficient RAG

arXiv 2024

2024

Idempotence and Perceptual Image Compression

arXiv 2024

2024

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

arXiv 2023

2023

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation

CVPR 2023 1

2023

EasyTPP: Towards Open Benchmarking Temporal Point Processes

arXiv 2023

2023

An Embodied Generalist Agent in 3D World

arXiv 2023

2023

CogAgent: A Visual Language Model for GUI Agents

CVPR 2024 1

2023

Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

arXiv 2023

2023

VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection

arXiv 2023

2023

AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving Perception

ICCV 2023 1

2023

A Contrastive Framework for Neural Text Generation

arXiv 2022

2022

Large Language Models Meet Harry Potter: A Bilingual Dataset for Aligning Dialogue Agents with Characters

arXiv 2022

2022

Bit Allocation using Optimization

arXiv 2022

2022

TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

arXiv 2021

2021

OMPQ: Orthogonal Mixed Precision Quantization

arXiv 2021

2021

The NANOGrav Nine-year Data Set: Limits on the Isotropic Stochastic Gravitational Wave Background

arXiv 2015

2015

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 53 papers)