Yan Zhang
Authored papers (39)
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
arXiv 2026
UniVBench: Towards Unified Evaluation for Video Foundation Models
arXiv 2026
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
arXiv 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
arXiv 2025
When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding
arXiv 2025
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning
arXiv 2025
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
arXiv 2025
VidText: Towards Comprehensive Evaluation for Video Text Understanding
arXiv 2025
Generating π-Functional Molecules Using STGG+ with Active Learning
arXiv 2025
Baichuan-Omni-1.5 Technical Report
arXiv 2025
UniFit: Towards Universal Virtual Try-on with MLLM-Guided Semantic Alignment
arXiv 2025
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
arXiv 2025
Graph Retrieval-Augmented Generation: A Survey
arXiv 2024
Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
arXiv 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
arXiv 2024
Retrieval Augmented Instruction Tuning for Open NER with Large Language Models
arXiv 2024
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
arXiv 2024
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
arXiv 2024
SysBench: Can Large Language Models Follow System Messages?
arXiv 2024
Ladder: A Model-Agnostic Framework Boosting LLM-based Machine Translation to the Next Level
arXiv 2024
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts
arXiv 2024
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
CVPR 2024
UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment
arXiv 2024
Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues
arXiv 2024
M-MAD: Multidimensional Multi-Agent Debate Framework for Fine-grained Machine Translation Evaluation
arXiv 2024
MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark
arXiv 2024
DCR: Divide-and-Conquer Reasoning for Multi-choice Question Answering with LLMs
arXiv 2024
Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views
ICCV 2023
Is ChatGPT a Good Recommender? A Preliminary Study
arXiv 2023
Object-centric architectures enable efficient causal representation learning
arXiv 2023
MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples
arXiv 2023
Allies: Prompting Large Language Model with Beam Search
arXiv 2023
Improving Large Language Models in Event Relation Logical Prediction
arXiv 2023
Unlocking Slot Attention by Changing Optimal Transport Costs
arXiv 2023
A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems
arXiv 2023
T5-SR: A Unified Seq-to-Seq Decoding Strategy for Semantic Parsing
arXiv 2023
IAM: A Comprehensive and Large-Scale Dataset for Integrated Argument Mining Tasks
ACL 2022
Pay More Attention to History: A Context Modelling Strategy for Conversational Text-to-SQL
arXiv 2021
ENT-DESC: Entity Description Generation by Exploring Knowledge Graph
EMNLP 2020