LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Abstract

The prevailing Next-Token Prediction (NTP) paradigm has driven the success of large language models through discrete autoregressive modeling. However, contemporary multimodal systems remain language-centric, often treating non-linguistic modalities as external attachments, which leads to fragmented architectures and suboptimal integration. To overcome this limitation, we introduce Discrete Native Autoregressive (DiNA), a unified framework that represents multimodal information within a shared discrete space, enabling consistent and principled autoregressive modeling across modalities. A key innovation is the Discrete Native Any-resolution Visual Transformer (dNaViT), which performs tokenization and de-tokenization at arbitrary resolutions, transforming continuous visual signals into hierarchical discrete tokens. Building on this foundation, we develop LongCat-Next, a native multimodal model that processes text, vision, and audio under a single autoregressive objective with minimal modality-specific design. As an industrial-strength foundation model, it excels at seeing, painting, and talking within a single framework, achieving strong performance across a wide range of multimodal benchmarks. In particular, LongCat-Next addresses the long-standing performance ceiling of discrete vision modeling on understanding tasks and provides a unified approach that reconciles the conflict between understanding and generation. As a step toward native multimodality, we open-source LongCat-Next and its tokenizers, hoping to foster further research and development in the community. GitHub: https://github.com/meituan-longcat/LongCat-Next
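
The idea of a shared discrete space trained with a single next-token objective can be illustrated with a small sketch. This is not the LongCat-Next implementation: the vocabulary sizes, the offset-based ID layout, and every function name below are hypothetical, chosen only to show how text, vision, and audio tokens might share one vocabulary and one loss. The hierarchical tokens produced by dNaViT are not modeled here.

```python
import math

# Hypothetical per-modality codebook sizes (assumed for illustration only).
VOCAB_SIZES = {"text": 8, "vision": 4, "audio": 4}


def build_offsets(vocab_sizes):
    """Give each modality a contiguous ID range inside one shared vocabulary."""
    offsets, start = {}, 0
    for name, size in vocab_sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


def to_shared_ids(segments, offsets, vocab_sizes):
    """Flatten (modality, local_ids) segments into one shared-ID sequence."""
    sequence = []
    for modality, local_ids in segments:
        for token in local_ids:
            if not 0 <= token < vocab_sizes[modality]:
                raise ValueError(f"{modality} token {token} is out of range")
            sequence.append(offsets[modality] + token)
    return sequence


def next_token_loss(logits, sequence):
    """Mean cross-entropy of predicting sequence[t + 1] from logits[t]."""
    total = 0.0
    for step_logits, target in zip(logits, sequence[1:]):
        peak = max(step_logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in step_logits))
        total += log_norm - step_logits[target]
    return total / (len(sequence) - 1)


OFFSETS, TOTAL_VOCAB = build_offsets(VOCAB_SIZES)

# One interleaved sample: two text tokens, two vision tokens, one audio token.
SEQUENCE = to_shared_ids(
    [("text", [1, 2]), ("vision", [0, 3]), ("audio", [2])],
    OFFSETS,
    VOCAB_SIZES,
)

# Stand-in for model outputs: uniform logits at every position but the last.
UNIFORM_LOGITS = [[0.0] * TOTAL_VOCAB for _ in SEQUENCE[:-1]]
LOSS = next_token_loss(UNIFORM_LOGITS, SEQUENCE)
```

In this sketch the loss function never inspects which modality a token came from, which is the sense in which the objective is shared. A uniform predictor scores the natural log of the shared vocabulary size, a convenient baseline for sanity checks.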

Authors

Chi Zhang Jing Li Haozhe Wang Yulei Qian Yuchen Xie Siyu Ren Jiamu Li Fengjiao Chen Ziwen Wang Xuezhi Cao Xunliang Cai Taofeng Xue Chong Peng Mianqiu Huang Linsen Guo Peng Pei Jiawei Wang Wei Wang Hao Yang Jie Yang Xiaoyang Li Yifan Lu Hang Yu Quan Chen Haozhe Zhao Manyuan Zhang Yan Bai Xiaoyu Li Bin Xiao Xing Hu Xiao Liu Haoze Sun Qi Li Chen Chen Xu Huang Xuanyu Zhu Yitian Chen Xinyang Lin Jiale Hong Yufei Gao Chao Wang Zijian Zhang Hongyu Li Lin Qiu Qian Wang Jiaxing Liu Jun Kuang Xi Chen Hong Liu Ge Yang Kunming Luo Hui Su Dian Zheng Zhihang Yu Yizhen Jiang Yuqi Peng YanJie Li Yan Feng Zhenlong Yuan Meituan LongCat Team Haonan Yan Kefeng Zhang Rumei Li Yaoming Zhu Yerui Sun Chengjiang Li JiaQi Zhang Minhao Jing Tongxin Pan Xiaotong Li Xiaoyu Zhao Yao Qiu Ying Luo Yipeng Mei Yufang Liu Yufei Chen Zhixiong Han Changran Wang Haowei Guo Huicheng Jiang Jialv Zou Jianping Lin Jing Jin Juncheng She Kuofeng Gao Wenlong He Yifei Cao Yimeng Jia Zeyang Hu