Jianfeng Gao
Authored papers (92)
Orchard: An Open-Source Agentic Modeling Framework
arXiv 2026
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery
arXiv 2026
SEMA: Simple yet Effective Learning for Multi-Turn Jailbreak Attacks
arXiv 2026
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
arXiv 2026
Magma: A Foundation Model for Multimodal AI Agents
CVPR 2025 1
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
arXiv 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
arXiv 2025
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
arXiv 2025
Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning
arXiv 2025
Training Language Models to Generate Quality Code with Program Analysis Feedback
arXiv 2025
Text Generation Beyond Discrete Token Sampling
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
EfficientLLM: Efficiency in Large Language Models
arXiv 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
arXiv 2025
Adapting Web Agents with Synthetic Supervision
arXiv 2025
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
arXiv 2025
Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way
arXiv 2024
TrustLLM: Trustworthiness in Large Language Models
arXiv 2024
Vector-ICL: In-context Learning with Continuous Vector Representations
arXiv 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
CVPR 2025 1
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
arXiv 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
arXiv 2024
GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
arXiv 2024
Rethinking Interpretability in the Era of Large Language Models
arXiv 2024
Matryoshka Multimodal Models
arXiv 2024
Agent AI: Surveying the Horizons of Multimodal Interaction
arXiv 2024
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
arXiv 2024
Pix2Gif: Motion-Guided Diffusion for GIF Generation
arXiv 2024
Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities
arXiv 2024
Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
arXiv 2024
SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading
arXiv 2024
Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
arXiv 2024
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
arXiv 2024
Learning a Decision Tree Algorithm with Transformers
arXiv 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models
arXiv 2024
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models
arXiv 2024
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
arXiv 2023
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
arXiv 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
CVPR 2023 1
Explaining black box text modules in natural language with language models
arXiv 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
arXiv 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection
ICCV 2023 1
Instruction Tuning with GPT-4
arXiv 2023
Semantic-SAM: Segment and Recognize Anything at Any Granularity
arXiv 2023
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
arXiv 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
arXiv 2023
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
arXiv 2023
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
arXiv 2023
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
arXiv 2023
Tree Prompting: Efficient Task Adaptation without Fine-Tuning
arXiv 2023
Differentiable Tree Operations Promote Compositional Generalization
arXiv 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations
arXiv 2023
Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models
arXiv 2023
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
arXiv 2023
Visual In-Context Prompting
CVPR 2024 1
Augmenting Language Models with Long-Term Memory
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
arXiv 2023
Guiding Large Language Models via Directional Stimulus Prompting
Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers
arXiv 2023
Is Self-Repair a Silver Bullet for Code Generation?
arXiv 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
arXiv 2022
Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization
arXiv 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
arXiv 2022
Focal Modulation Networks
arXiv 2022
GODEL: Large-Scale Pre-Training for Goal-Directed Dialog
arXiv 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
arXiv 2022
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
arXiv 2022
Fault-Aware Neural Code Rankers
arXiv 2022
Generalized Decoding for Pixel, Image, and Language
CVPR 2023 1
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation
arXiv 2022
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions
arXiv 2022
Language Models as Inductive Reasoners
arXiv 2022
RegionCLIP: Region-based Language-Image Pretraining
CVPR 2022 1
RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling
arXiv 2021
VinVL: Revisiting Visual Representations in Vision-Language Models
CVPR 2021 1
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
arXiv 2021
LiST: Lite Prompted Self-training Makes Parameter-Efficient Few-shot Learners
Florence: A New Foundation Model for Computer Vision
arXiv 2021
Image Scene Graph Generation (SGG) Benchmark
arXiv 2021
Taming Sparsely Activated Transformer with Stochastic Experts
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
ECCV 2020 8
Few-shot Natural Language Generation for Task-Oriented Dialog
Findings of the Association for Computational Linguistics 2020
Generation-Augmented Retrieval for Open-domain Question Answering
ACL 2021 5
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
arXiv 2019
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Unified Vision-Language Pre-Training for Image Captioning and VQA
arXiv 2019
On the Variance of the Adaptive Learning Rate and Beyond
ICLR 2020 1
Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning
Bi-directional Attention with Agreement for Dependency Parsing
MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition
arXiv 2016
Affiliations
Frequent co-authors
10 from 92 papers