Qi Liu
- Papers: 48
Authored papers (48)
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
arXiv 2026
Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
arXiv 2026
Agent-R1: A Unified and Modular Framework for Agentic Reinforcement Learning
arXiv 2025
Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting
arXiv 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
arXiv 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
arXiv 2025
CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing
arXiv 2025
A Survey on Knowledge-Oriented Retrieval-Augmented Generation
arXiv 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
arXiv 2025
GraphPrompter: Multi-stage Adaptive Prompt Optimization for Graph In-Context Learning
arXiv 2025
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
arXiv 2025
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
arXiv 2025
E^2Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker
arXiv 2025
Visual Autoregressive Modeling for Instruction-Guided Image Editing
arXiv 2025
Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
arXiv 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
arXiv 2025
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
arXiv 2025
TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving
arXiv 2025
MindBridge: Scalable and Cross-Model Knowledge Editing via Memory-Augmented Modality
arXiv 2025
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective
arXiv 2025
TEMPLE: Temporal Preference Learning of Video LLMs via Difficulty Scheduling and Pre-SFT Alignment
arXiv 2025
FullStack Bench: Evaluating LLMs as Full Stack Coders
arXiv 2024
Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models
arXiv 2024
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
arXiv 2024
ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
arXiv 2024
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling
arXiv 2024
GIRAFFE: Design Choices for Extending the Context Length of Visual Language Models
arXiv 2024
A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment
arXiv 2024
Evaluation of Retrieval-Augmented Generation: A Survey
arXiv 2024
TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
arXiv 2024
CursorCore: Assist Programming through Aligning Anything
arXiv 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
arXiv 2024
TerDiT: Ternary Diffusion Models with Transformers
arXiv 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
arXiv 2024
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models
arXiv 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
arXiv 2024
Jailbreaking as a Reward Misspecification Problem
arXiv 2024
ImgTrojan: Jailbreaking Vision-Language Models with ONE Image
arXiv 2024
Temporal Reasoning Transfer from Text to Video
arXiv 2024
Towards Faithful Explanations: Boosting Rationalization with Shortcuts Discovery
arXiv 2024
A Survey of Reasoning with Foundation Models
arXiv 2023
Large Language Models are not Fair Evaluators
arXiv 2023
A Survey on Large Language Models for Recommendation
arXiv 2023
BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT
arXiv 2023
Learning Subpocket Prototypes for Generalizable Structure-based Drug Design
arXiv 2023
FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models
arXiv 2023
Can Language Models Understand Physical Concepts?
arXiv 2023
TTIDA: Controllable Generative Data Augmentation via Text-to-Text and Text-to-Image Models
arXiv 2023
Frequent co-authors
10 from 48 papers