Yuliang Liu
- Papers: 26
Authored papers
Multimodal OCR: Parse Anything from Documents
arXiv 2026
MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios
arXiv 2026
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
arXiv 2026
TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering
arXiv 2026
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
arXiv 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
arXiv 2025
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
TokBench: Evaluating Your Visual Tokenizer before Visual Generation
arXiv 2025
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
ICCV 2025
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
arXiv 2025
Liquid: Language Models are Scalable Multi-modal Generators
arXiv 2024
R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models
arXiv 2024
PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling
arXiv 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
arXiv 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
arXiv 2024
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
arXiv 2024
Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid
arXiv 2024
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
arXiv 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
arXiv 2024
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
arXiv 2023
Toward Real Text Manipulation Detection: New Dataset and New Solution
arXiv 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
arXiv 2023
ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer
ICCV 2023
MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification
arXiv 2022
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training
arXiv 2021
ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network