Yang Li
- Papers: 64
Authored papers
EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation
arXiv 2026
InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
arXiv 2026
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
HY3D-Bench: Generation of 3D Assets
arXiv 2026
PhysForge: Generating Physics-Grounded 3D Assets for Interactive Virtual World
arXiv 2026
A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping
arXiv 2026
AT^2PO: Agentic Turn-based Policy Optimization via Tree Search
arXiv 2026
BMAM: Brain-inspired Multi-Agent Memory Framework
arXiv 2026
Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement
arXiv 2026
LM-Lexicon: Improving Definition Modeling via Harmonizing Semantic Experts
arXiv 2026
STEP3-VL-10B Technical Report
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
SkyReels-V2: Infinite-length Film Generative Model
arXiv 2025
Kimi-VL Technical Report
arXiv 2025
UXAgent: An LLM Agent-Based Usability Testing Framework for Web Design
arXiv 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
arXiv 2025
Hierarchical Feature Learning for Medical Point Clouds via State Space Model
arXiv 2025
UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
arXiv 2025
In Pursuit of Pixel Supervision for Visual Pre-training
arXiv 2025
HunyuanVideo 1.5 Technical Report
arXiv 2025
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
arXiv 2025
Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
arXiv 2025
JoyAgent-JDGenie: Technical Report on the GAIA
arXiv 2025
X-Part: high fidelity and structure coherent shape decomposition
arXiv 2025
P3-SAM: Native 3D Part Segmentation
arXiv 2025
HunyuanImage 3.0 Technical Report
arXiv 2025
Meta CLIP 2: A Worldwide Scaling Recipe
arXiv 2025
Ovis2.5 Technical Report
arXiv 2025
Ovis-U1 Technical Report
arXiv 2025
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
arXiv 2025
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
arXiv 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
arXiv 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
arXiv 2025
DenseLoRA: Dense Low-Rank Adaptation of Large Language Models
arXiv 2025
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
arXiv 2025
SkyReels-A2: Compose Anything in Video Diffusion Transformers
arXiv 2025
Population Aware Diffusion for Time Series Generation
arXiv 2025
Rethinking Video Tokenization: A Conditioned Diffusion-based Approach
arXiv 2025
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
arXiv 2024
HunyuanVideo: A Systematic Framework For Large Video Generative Models
arXiv 2024
SEED-Story: Multimodal Long Story Generation with Large Language Model
arXiv 2024
Parrot: Multilingual Visual Instruction Tuning
arXiv 2024
CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization
arXiv 2024
FIND: Fine-tuning Initial Noise Distribution with Policy Optimization for Diffusion Models
arXiv 2024
Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts
arXiv 2024
P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders
arXiv 2024
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
arXiv 2024
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
arXiv 2024
SCNet: Sparse Compression Network for Music Source Separation
arXiv 2024
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
arXiv 2024
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
arXiv 2024
Can LLMs be Good Graph Judge for Knowledge Graph Construction?
arXiv 2024
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design
arXiv 2023
OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping
Graph-based Topology Reasoning for Driving Scenes
arXiv 2023
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks
arXiv 2023
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
arXiv 2023
An Efficient Knowledge Transfer Strategy for Spiking Neural Networks from Static to Event Domain
arXiv 2023
Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection
arXiv 2023
FormalGeo: An Extensible Formalized Framework for Olympiad Geometric Problem Solving
arXiv 2023
PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
arXiv 2021
Facilitating Database Tuning with Hyper-Parameter Optimization: A Comprehensive Experimental Evaluation
arXiv 2021
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
EMNLP 2020
Affiliations
Frequent co-authors
10 from 64 papers