Jing Zhang
- Papers: 62
Authored papers
All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models
arXiv 2026
XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression
arXiv 2026
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
arXiv 2025
OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale
arXiv 2025
Cosmos World Foundation Model Platform for Physical AI
arXiv 2025
Quadratic Interest Network for Multimodal Click-Through Rate Prediction
arXiv 2025
Dynamic Scaling of Unit Tests for Code Reward Modeling
arXiv 2025
GP-GS: Gaussian Processes for Enhanced Gaussian Splatting
arXiv 2025
CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis
arXiv 2025
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
CVPR 2025
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
arXiv 2025
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
arXiv 2025
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
arXiv 2025
AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology
arXiv 2025
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
arXiv 2025
QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition
arXiv 2025
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
CVPR 2025
Nemotron-4 340B Technical Report
arXiv 2024
Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset
ICCV 2025
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
arXiv 2024
CodeS: Towards Building Open-source Language Models for Text-to-SQL
arXiv 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
arXiv 2024
VectorPainter: Advanced Stylized Vector Graphics Synthesis Using Stroke-Style Priors
arXiv 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension
arXiv 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
arXiv 2024
SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation
arXiv 2024
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
arXiv 2024
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
arXiv 2024
Streamlining Redundant Layers to Compress Large Language Models
arXiv 2024
A Solution-based LLM API-using Methodology for Academic Information Seeking
arXiv 2024
DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models
arXiv 2024
Deep Learning for Camera Calibration and Beyond: A Survey
arXiv 2023
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
arXiv 2023
HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting
arXiv 2023
SVGDreamer: Text Guided SVG Generation with Diffusion Model
CVPR 2024
DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models
P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds
ICCV 2023
The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT
arXiv 2023
RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation
ICCV 2023
GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation
arXiv 2023
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
ICCV 2023
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond
arXiv 2023
MPMQA: Multimodal Question Answering on Product Manuals
arXiv 2023
Model Calibration in Dense Classification with Adaptive Label Perturbation
ICCV 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models
arXiv 2023
Audio-Visual Segmentation with Semantics
arXiv 2023
RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
arXiv 2023
Vision Transformer with Quadrangle Attention
arXiv 2023
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
ICCV 2023
IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models
arXiv 2023
Unifying Flow, Stereo and Depth Estimation
arXiv 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
arXiv 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
arXiv 2022
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
CVPR 2023
VSA: Learning Varied-Size Window Attention in Vision Transformers
arXiv 2022
CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
CVPR 2023
ReAct: Temporal Action Detection with Relational Queries
arXiv 2022
From heavy rain removal to detail restoration: A faster and better network
arXiv 2022
GMFlow: Learning Optical Flow via Global Matching
CVPR 2022
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
arXiv 2021
One-Shot Object Affordance Detection in the Wild
arXiv 2021
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training
arXiv 2020
Frequent co-authors
10 from 62 papers