Jie Zhou
- Papers: 126
Authored papers
MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
arXiv 2026
SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
arXiv 2026
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
arXiv 2026
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
arXiv 2026
UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models
arXiv 2026
Omni-DuplexEval: Evaluating Real-time Duplex Omni-modal Interaction
arXiv 2026
Data Science and Technology Towards AGI Part I: Tiered Data Management
arXiv 2026
POINTS-GUI-G: GUI-Grounding Journey
arXiv 2026
Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
arXiv 2026
UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
arXiv 2026
How Far Ahead Do LLMs Plan? Uncovering the Latent Horizon in Chain-of-Thought Reasoning
arXiv 2026
FGTBT: Frequency-Guided Task-Balancing Transformer for Unified Facial Landmark Detection
arXiv 2026
STEP3-VL-10B Technical Report
arXiv 2026
HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling
arXiv 2025
D^3QE: Learning Discrete Distribution Discrepancy-aware Quantization Error for Autoregressive-Generated Image Detection
arXiv 2025
Continuous Autoregressive Language Models
arXiv 2025
MiniCPM4: Ultra-Efficient LLMs on End Devices
arXiv 2025
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
CVPR 2025 1
Image Recognition with Online Lightweight Vision Transformer: A Survey
arXiv 2025
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning
arXiv 2025
Continuous Visual Autoregressive Generation via Score Maximization
arXiv 2025
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
arXiv 2025
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
arXiv 2025
ExTrans: Multilingual Deep Reasoning Translation via Exemplar-Enhanced Reinforcement Learning
arXiv 2025
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
arXiv 2025
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
arXiv 2025
Skyra: AI-Generated Video Detection via Grounded Artifact Reasoning
arXiv 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
arXiv 2025
SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
arXiv 2025
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
arXiv 2025
UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings
arXiv 2025
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
arXiv 2025
Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction
arXiv 2025
Streaming 4D Visual Geometry Transformer
arXiv 2025
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
arXiv 2025
Latent Diffusion Model without Variational Autoencoder
arXiv 2025
Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
arXiv 2025
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
arXiv 2025
OGGSplat: Open Gaussian Growing for Generalizable Reconstruction with Expanded Field-of-View
arXiv 2025
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
arXiv 2025
Examining the Source of Defects from a Mechanical Perspective for 3D Anomaly Detection
arXiv 2025
The Essence of Contextual Understanding in Theory of Mind: A Study on Question Answering with Story Characters
arXiv 2025
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
arXiv 2025
Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings
arXiv 2025
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers
arXiv 2025
NUDT4MSTAR: A Large Dataset and Benchmark Towards Remote Sensing Object Recognition in the Wild
arXiv 2025
RewardAnything: Generalizable Principle-Following Reward Models
arXiv 2025
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
arXiv 2025
Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents
arXiv 2024
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
CVPR 2024 1
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
arXiv 2024
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
arXiv 2024
Q-VLM: Post-training Quantization for Large Vision-Language Models
arXiv 2024
From Individual to Society: A Survey on Social Simulation Driven by Large Language Model-based Agents
arXiv 2024
POINTS1.5: Building a Vision-Language Model towards Real World Applications
arXiv 2024
LCS: A Language Converter Strategy for Zero-Shot Neural Machine Translation
arXiv 2024
Comments as Natural Logic Pivots: Improve Code Generation via Comment Perspective
arXiv 2024
Path Choice Matters for Clear Attribution in Path Methods
arXiv 2024
PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension
arXiv 2024
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
arXiv 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
arXiv 2024
Preventing Local Pitfalls in Vector Quantization via Optimal Transport
arXiv 2024
GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
CVPR 2025 1
On Prompt-Driven Safeguarding for Large Language Models
arXiv 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
arXiv 2024
FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner
arXiv 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
arXiv 2024
EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
ICCV 2025
A Survey on the Honesty of Large Language Models
arXiv 2024
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents
arXiv 2024
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
arXiv 2024
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models
arXiv 2024
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
CVPR 2024 1
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
arXiv 2024
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity
arXiv 2024
Large Language Model for Verilog Generation with Code-Structure-Guided Reinforcement Learning
arXiv 2024
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment
arXiv 2024
Accelerating Inference in Large Language Models with a Unified Layer Skipping Strategy
arXiv 2024
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
arXiv 2023
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
arXiv 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
NeurIPS 2023 11
Unleashing Text-to-Image Diffusion Models for Visual Perception
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
CVPR 2023 1
SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
ICCV 2023 1
MELO: Enhancing Model Editing with Neuron-Indexed Dynamic LoRA
arXiv 2023
TCOVIS: Temporally Consistent Online Video Instance Segmentation
ICCV 2023 1
Distilling Rule-based Knowledge into Large Language Models
arXiv 2023
Towards Multiple References Era -- Addressing Data Leakage and Limited Reference Diversity in NLG Evaluation
arXiv 2023
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education
arXiv 2023
UniPC: A Unified Predictor-Corrector Framework for Fast Sampling of Diffusion Models
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
arXiv 2023
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Plug-and-Play Knowledge Injection for Pre-trained Language Models
arXiv 2023
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
ICCV 2023 1
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
arXiv 2023
Large Language Models Are Not Robust Multiple Choice Selectors
arXiv 2023
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs
arXiv 2023
Efficient Meshy Neural Fields for Animatable Human Avatars
arXiv 2023
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
arXiv 2023
Improving Translation Faithfulness of Large Language Models via Augmenting Instructions
arXiv 2023
HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
arXiv 2022
Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind
arXiv 2022
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving
arXiv 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
CVPR 2023 1
Diffusion-SDF: Text-to-Shape via Voxelized Diffusion
CVPR 2023 1
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
arXiv 2022
Demystify Transformers & Convolutions in Modern Image Deep Networks
arXiv 2022
Mixture of Attention Heads: Selecting Attention Heads Per Token
arXiv 2022
SemAffiNet: Semantic-Affine Transformation for Point Cloud Segmentation
CVPR 2022 1
ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization
arXiv 2022
OPERA: Omni-Supervised Representation Learning with Hierarchical Supervisions
ICCV 2023 1
Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting
NAACL 2022 7
Token-Label Alignment for Vision Transformers
ICCV 2023 1
AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning
arXiv 2022
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
CVPR 2022 1
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
CVPR 2022 1
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management
arXiv 2021
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances
ACL 2021 5
Fully Hyperbolic Neural Networks
ACL 2022 5
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Findings (ACL) 2022 5
RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models
EMNLP 2021 11
CokeBERT: Contextual Knowledge Selection and Embedding towards Enhanced Pre-Trained Language Models
arXiv 2020
DocRED: A Large-Scale Document-Level Relation Extraction Dataset
FewRel 2.0: Towards More Challenging Few-Shot Relation Classification
An Improved Evaluation Framework for Generative Adversarial Networks
arXiv 2018
Frequent co-authors
10 from 126 papers