Ke Li

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

arXiv 2025

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

arXiv 2025

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization

arXiv 2025

Training-Free Group Relative Policy Optimization

arXiv 2025

Radiance Fields in XR: A Survey on How Radiance Fields are Envisioned and Addressed for XR Research

arXiv 2025

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

arXiv 2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

arXiv 2025

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

arXiv 2025

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

arXiv 2025

Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

arXiv 2025

Bridging Sequence-Structure Alignment in RNA Foundation Models

arXiv 2024

Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion

arXiv 2024

Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models

arXiv 2024

CoMT: Chain-of-Medical-Thought Reduces Hallucination in Medical Report Generation

arXiv 2024

Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators

arXiv 2024

Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

arXiv 2024

Sinkhorn Distance Minimization for Knowledge Distillation

arXiv 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

arXiv 2024

InstOptima: Evolutionary Multi-objective Instruction Optimization via Large Language Model-based Instruction Operators

arXiv 2023

Aligning and Prompting Everything All at Once for Universal Visual Perception

arXiv 2023

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

ICCV 2023

Masked Autoencoders are Efficient Class Incremental Learners

ICCV 2023

SketchXAI: A First Look at Explainability for Human Sketches

CVPR 2023

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

arXiv 2023

Woodpecker: Hallucination Correction for Multimodal Large Language Models

arXiv 2023

BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

arXiv 2022

PyABSA: A Modularized Framework for Reproducible Aspect-based Sentiment Analysis

arXiv 2022

C3KG: A Chinese Commonsense Conversation Knowledge Graph

arXiv 2022

speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment

arXiv 2021

LSA: Modeling Aspect Sentiment Coherency via Local Sentiment Aggregation

arXiv 2021

Variational Attention: Propagating Domain-Specific Knowledge for Multi-Domain Learning in Crowd Counting

ICCV 2021

Pose Recognition with Cascade Transformers

CVPR 2021

Gotta Go Fast When Generating Data with Score-Based Models

Hyperspectral Image Super-Resolution with Spectral Mixup and Heterogeneous Datasets

arXiv 2021