Jing Zhang

Papers
62

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

62

All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

arXiv 2026

2026

XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression

arXiv 2026

2026

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

arXiv 2025

2025

OmniSQL: Synthesizing High-quality Text-to-SQL Data at Scale

arXiv 2025

2025

Cosmos World Foundation Model Platform for Physical AI

arXiv 2025

2025

Quadratic Interest Network for Multimodal Click-Through Rate Prediction

arXiv 2025

2025

Dynamic Scaling of Unit Tests for Code Reward Modeling

arXiv 2025

2025

GP-GS: Gaussian Processes for Enhanced Gaussian Splatting

arXiv 2025

2025

CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis

arXiv 2025

2025

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models

CVPR 2025

2025

Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

arXiv 2025

2025

MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

arXiv 2025

2025

More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

arXiv 2025

2025

AnesBench: Multi-Dimensional Evaluation of LLM Reasoning in Anesthesiology

arXiv 2025

2025

Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

arXiv 2025

2025

QUAD: Quantization and Parameter-Efficient Tuning of LLM with Activation Decomposition

arXiv 2025

2025

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

CVPR 2025

2024

Nemotron-4 340B Technical Report

arXiv 2024

2024

Scaling Efficient Masked Image Modeling on Large Remote Sensing Dataset

ICCV 2025

2024

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

arXiv 2024

2024

CodeS: Towards Building Open-source Language Models for Text-to-SQL

arXiv 2024

2024

SAM Decoding: Speculative Decoding via Suffix Automaton

arXiv 2024

2024

VectorPainter: Advanced Stylized Vector Graphics Synthesis Using Stroke-Style Priors

arXiv 2024

2024

Training A Small Emotional Vision Language Model for Visual Art Comprehension

arXiv 2024

2024

TAVGBench: Benchmarking Text to Audible-Video Generation

arXiv 2024

2024

SVGDreamer++: Advancing Editability and Diversity in Text-Guided SVG Generation

arXiv 2024

2024

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

arXiv 2024

2024

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

arXiv 2024

2024

Streamlining Redundant Layers to Compress Large Language Models

arXiv 2024

2024

A Solution-based LLM API-using Methodology for Academic Information Seeking

arXiv 2024

2024

DreamSteerer: Enhancing Source Image Conditioned Editability using Personalized Diffusion Models

arXiv 2024

2024

Deep Learning for Camera Calibration and Beyond: A Survey

arXiv 2023

2023

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

arXiv 2023

2023

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

arXiv 2023

2023

SVGDreamer: Text Guided SVG Generation with Diffusion Model

CVPR 2024

2023

DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models

2023

P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds

ICCV 2023

2023

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

arXiv 2023

2023

RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation

ICCV 2023

2023

GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

arXiv 2023

2023

ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution

ICCV 2023

2023

APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond

arXiv 2023

2023

MPMQA: Multimodal Question Answering on Product Manuals

arXiv 2023

2023

Model Calibration in Dense Classification with Adaptive Label Perturbation

ICCV 2023

2023

AlignBench: Benchmarking Chinese Alignment of Large Language Models

arXiv 2023

2023

Audio-Visual Segmentation with Semantics

arXiv 2023

2023

RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL

arXiv 2023

2023

Vision Transformer with Quadrangle Attention

arXiv 2023

2023

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

ICCV 2023

2023

IMPUS: Image Morphing with Perceptually-Uniform Sampling Using Diffusion Models

arXiv 2023

2023

Unifying Flow, Stereo and Depth Estimation

arXiv 2022

2022

Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

arXiv 2022

2022

ViTPose++: Vision Transformer for Generic Body Pose Estimation

arXiv 2022

2022

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

CVPR 2023

2022

VSA: Learning Varied-Size Window Attention in Vision Transformers

arXiv 2022

2022

CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

CVPR 2023

2022

ReAct: Temporal Action Detection with Relational Queries

arXiv 2022

2022

From heavy rain removal to detail restoration: A faster and better network

arXiv 2022

2022

GMFlow: Learning Optical Flow via Global Matching

CVPR 2022

2021

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

arXiv 2021

2021

One-Shot Object Affordance Detection in the Wild

arXiv 2021

2021

GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training

arXiv 2020

2020

Affiliations

No known affiliations.

Frequent co-authors

10

from 62 papers