Yuliang Liu

Papers
26

Authored papers

26

Multimodal OCR: Parse Anything from Documents

arXiv 2026

2026

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

arXiv 2026

2026

Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously

arXiv 2026

2026

TextPecker: Rewarding Structural Anomaly Quantification for Enhancing Visual Text Rendering

arXiv 2026

2026

MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm

arXiv 2025

2025

OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models

arXiv 2025

2025

SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting

2025

TokBench: Evaluating Your Visual Tokenizer before Visual Generation

arXiv 2025

2025

LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance

ICCV 2025

2025

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

arXiv 2025

2025

Liquid: Language Models are Scalable Multi-modal Generators

arXiv 2024

2024

R-CoT: Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models

arXiv 2024

2024

PDF-WuKong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling

arXiv 2024

2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

arXiv 2024

2024

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

arXiv 2024

2024

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

arXiv 2024

2024

Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid

arXiv 2024

2024

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

arXiv 2024

2024

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

arXiv 2024

2024

ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code

arXiv 2023

2023

Toward Real Text Manipulation Detection: New Dataset and New Solution

arXiv 2023

2023

Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

arXiv 2023

2023

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

ICCV 2023

2023

MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification

arXiv 2022

2022

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

arXiv 2021

2021

ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 26 papers