Chao Zhang

Papers
68

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

68

OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond

arXiv 2026

2026

HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds

arXiv 2026

2026

LongCat-Flash-Thinking-2601 Technical Report

arXiv 2026

2026

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

arXiv 2026

2026

LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning

arXiv 2026

2026

OpenDecoder: Open Large Language Model Decoding to Incorporate Document Quality in RAG

arXiv 2026

2026

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

arXiv 2026

2026

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

arXiv 2026

2026

VID-AD: A Dataset for Image-Level Logical Anomaly Detection under Vision-Induced Distraction

arXiv 2026

2026

Revisiting the Reliability of Language Models in Instruction-Following

arXiv 2025

2026

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels

arXiv 2025

2025

Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation

arXiv 2025

2025

Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

arXiv 2025

2025

YuE: Scaling Open Foundation Models for Long-Form Music Generation

arXiv 2025

2025

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

arXiv 2025

2025

Adaptation of Agentic AI

arXiv 2025

2025

video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models

arXiv 2025

2025

MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering

arXiv 2025

2025

HunyuanImage 3.0 Technical Report

arXiv 2025

2025

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

arXiv 2025

2025

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models

arXiv 2025

2025

GraphOmni: A Comprehensive and Extendable Benchmark Framework for Large Language Models on Graph-theoretic Tasks

arXiv 2025

2025

video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

arXiv 2025

2025

MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

2025

Language Model Uncertainty Quantification with Attention Chain

arXiv 2025

2025

Tady: A Neural Disassembler without Structural Constraint Violations

arXiv 2025

2025

Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

arXiv 2025

2025

ACVUBench: Audio-Centric Video Understanding Benchmark

arXiv 2025

2025

Streamlining the Collaborative Chain of Models into A Single Forward Pass in Generation-Based Tasks

arXiv 2025

2025

WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

arXiv 2025

2025

Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

arXiv 2025

2025

TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework

arXiv 2025

2025

Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin

ICCV 2025

2025

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

arXiv 2024

2024

Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis

arXiv 2024

2024

NoteLLM-2: Multimodal Large Representation Models for Recommendation

arXiv 2024

2024

APISR: Anime Production Inspired Real-World Anime Super-Resolution

CVPR 2024

2024

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

arXiv 2024

2024

3DMIT: 3D Multi-modal Instruction Tuning for Scene Understanding

arXiv 2024

2024

CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

arXiv 2024

2024

Efficient Evolutionary Search Over Chemical Space with Large Language Models

arXiv 2024

2024

Aligning Large Language Models with Representation Editing: A Control Perspective

arXiv 2024

2024

Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

arXiv 2024

2024

Matryoshka: Learning to Drive Black-Box LLMs with LLMs

arXiv 2024

2024

Semantic Map-based Generation of Navigation Instructions

arXiv 2024

2024

Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs

arXiv 2024

2024

An Engorgio Prompt Makes Large Language Model Babble on

arXiv 2024

2024

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

arXiv 2024

2024

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

arXiv 2024

2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models

arXiv 2023

2023

ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval

arXiv 2023

2023

PolyIE: A Dataset of Information Extraction from Polymer Material Scientific Literature

arXiv 2023

2023

Large Language Models for Generative Information Extraction: A Survey

arXiv 2023

2023

ToolQA: A Dataset for LLM Question Answering with External Tools

2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

arXiv 2023

2023

C3: Zero-shot Text-to-SQL with ChatGPT

arXiv 2023

2023

Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias

2023

AdaPlanner: Adaptive Planning from Feedback with Language Models

2023

Rank-DETR for High Quality Object Detection

2023

One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training

ICCV 2023

2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

arXiv 2023

2023

COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning

arXiv 2022

2022

A general-purpose material property data extraction pipeline from large polymer corpora using Natural Language Processing

arXiv 2022

2022

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning

arXiv 2022

2022

Model-Aware Contrastive Learning: Towards Escaping the Dilemmas

arXiv 2022

2022

DETRs with Hybrid Matching

CVPR 2023

2022

A Survey on Programmatic Weak Supervision

arXiv 2022

2022

Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach

NAACL 2021

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 68 papers