Zheng Zhang
- Papers: 66
Authored papers
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model
arXiv 2026
Large Language Models Explore by Latent Distilling
arXiv 2026
UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass
arXiv 2026
Kimi K2.5: Visual Agentic Intelligence
arXiv 2026
Muon is Scalable for LLM Training
arXiv 2025
MAGI-1: Autoregressive Video Generation at Scale
arXiv 2025
Kimi-VL Technical Report
arXiv 2025
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
arXiv 2025
An Empirical Study on Prompt Compression for Large Language Models
arXiv 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
arXiv 2025
CTR-Driven Advertising Image Generation with Multimodal Large Language Models
arXiv 2025
Investigating Hallucination in Conversations for Low Resource Languages
arXiv 2025
Kimi Linear: An Expressive, Efficient Attention Architecture
arXiv 2025
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling
arXiv 2025
PsyLite Technical Report
arXiv 2025
MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind
arXiv 2025
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
arXiv 2024
Xwin-LM: Strong and Scalable Alignment Practice for LLMs
arXiv 2024
Hallucination of Multimodal Large Language Models: A Survey
arXiv 2024
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
arXiv 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
arXiv 2024
TEG-DB: A Comprehensive Dataset and Benchmark of Textual-Edge Graphs
arXiv 2024
LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models
arXiv 2024
Can Language Models Learn to Skip Steps?
arXiv 2024
AdaZeta: Adaptive Zeroth-Order Tensor-Train Adaption for Memory-Efficient Large Language Models Fine-Tuning
arXiv 2024
Contextual Interaction via Primitive-based Adversarial Training For Compositional Zero-shot Learning
arXiv 2024
Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets
arXiv 2024
OffensiveLang: A Community Based Implicit Offensive Language Dataset
arXiv 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
arXiv 2024
ECon: On the Detection and Resolution of Evidence Conflicts
arXiv 2024
Investigating Annotator Bias in Large Language Models for Hate Speech Detection
arXiv 2024
GRAG: Graph Retrieval-Augmented Generation
arXiv 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
arXiv 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
arXiv 2024
DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training
arXiv 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
arXiv 2023
Coarse-to-Fine Amodal Segmentation with Shape Prior
ICCV 2023 1
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
arXiv 2023
Segment and Caption Anything
CVPR 2024 1
DETR Doesn't Need Multi-Scale or Locality Design
arXiv 2023
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding
arXiv 2023
Rethinking Amodal Video Segmentation from Learning Supervised Signals with Object-centric Representation
ICCV 2023 1
Side Adapter Network for Open-Vocabulary Semantic Segmentation
CVPR 2023 1
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
ICCV 2023 1
Relation-Aware Diffusion Model for Controllable Poster Layout Generation
arXiv 2023
Object-Centric Multiple Object Tracking
ICCV 2023 1
Masked Structural Growth for 2x Faster Language Model Pre-training
arXiv 2023
Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations
arXiv 2023
Unsupervised Open-Vocabulary Object Localization in Videos
ICCV 2023 1
BiPFT: Binary Pre-trained Foundation Transformer with Low-rank Estimation of Binarization Residual Polynomials
arXiv 2023
EVA2.0: Investigating Open-Domain Chinese Dialogue Systems with Large-Scale Pre-Training
arXiv 2022
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
arXiv 2022
Exploring Discrete Diffusion Models for Image Captioning
arXiv 2022
Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension
arXiv 2022
Vega-MT: The JD Explore Academy Translation System for WMT22
arXiv 2022
CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation
arXiv 2022
AugESC: Dialogue Augmentation with Large Language Models for Emotional Support Conversation
arXiv 2022
Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
arXiv 2022
SimMIM: A Simple Framework for Masked Image Modeling
CVPR 2022 1
Video Swin Transformer
CVPR 2022 1
End-to-End Semi-Supervised Object Detection with Soft Teacher
ICCV 2021 10
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Findings (EMNLP) 2021 11
Prototype-supervised Adversarial Network for Targeted Attack of Deep Hashing
CVPR 2021 1
Self-Supervised Learning with Swin Transformers
arXiv 2021
Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning
CVPR 2021 1
CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset