Wei Ji
- Papers: 20
Authored papers (20)
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
UniM: A Unified Any-to-Any Interleaved Multimodal Benchmark
arXiv 2026
Interp3D: Correspondence-aware Interpolation for Generative Textured 3D Morphing
arXiv 2026
STEP3-VL-10B Technical Report
arXiv 2026
Step-DeepResearch Technical Report
arXiv 2025
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
arXiv 2025
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model
arXiv 2025
Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model
arXiv 2025
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
arXiv 2025
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
arXiv 2025
Step-Audio 2 Technical Report
arXiv 2025
NExT-GPT: Any-to-Any Multimodal LLM
arXiv 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
NeurIPS 2023 11
Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World
ICCV 2023 1
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation
arXiv 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
arXiv 2023
Generating Visual Spatial Description via Holistic 3D Scene Understanding
arXiv 2023
Towards Robust Multi-Modal Reasoning via Model Selection
arXiv 2023
NExT-Chat: An LMM for Chat, Detection and Segmentation
arXiv 2023
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
arXiv 2022
Frequent co-authors
10 from 20 papers