Kai-Wei Chang
- Papers: 64
Authored papers (64)
LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues
arXiv 2026
OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
arXiv 2026
Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability
arXiv 2026
OpenThoughts: Data Recipes for Reasoning Models
arXiv 2025
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation
arXiv 2025
LaViDa: A Large Diffusion Language Model for Multimodal Understanding
arXiv 2025
X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
arXiv 2025
Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions
arXiv 2025
DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
arXiv 2025
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
arXiv 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
arXiv 2025
Visualized Text-to-Image Retrieval
arXiv 2025
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing
arXiv 2025
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
arXiv 2025
FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback
arXiv 2025
DialectGen: Benchmarking and Improving Dialect Robustness in Multimodal Generation
arXiv 2025
TrustLLM: Trustworthiness in Large Language Models
arXiv 2024
Matryoshka Query Transformer for Large Vision-Language Models
arXiv 2024
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
arXiv 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
arXiv 2024
On Prompt-Driven Safeguarding for Large Language Models
arXiv 2024
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
arXiv 2024
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
arXiv 2024
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
arXiv 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
arXiv 2024
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models
arXiv 2024
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
arXiv 2024
Verbalized Representation Learning for Interpretable Few-Shot Generalization
ICCV 2025
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
arXiv 2024
LLMs in Biomedicine: A study on clinical Named Entity Recognition
arXiv 2024
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation
arXiv 2024
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention
arXiv 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
arXiv 2024
Enhancing Large Vision Language Models with Self-Training on Image Comprehension
arXiv 2024
TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation
arXiv 2024
Re-ReST: Reflection-Reinforced Self-Training for Language Agents
arXiv 2024
Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension
arXiv 2024
MetaKP: On-Demand Keyphrase Generation
arXiv 2024
What's "up" with vision-language models? Investigating their struggle with spatial reasoning
arXiv 2023
Agent Lumos: Unified and Modular Training for Open-Source Language Agents
arXiv 2023
IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
arXiv 2023
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks
arXiv 2023
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
arXiv 2023
KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
arXiv 2023
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
arXiv 2023
Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation
arXiv 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
CVPR 2024 1
CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
ICCV 2023 1
Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models
arXiv 2023
Red Teaming Language Model Detectors with Language Models
arXiv 2023
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
arXiv 2023
Efficient Shapley Values Estimation by Amortization for Text Classification
arXiv 2023
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems
arXiv 2023
A Survey of Deep Learning for Mathematical Reasoning
arXiv 2022
GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles
arXiv 2022
Representation Learning for Resource-Constrained Keyphrase Generation
arXiv 2022
Controllable Text Generation with Neurally-Decomposed Oracle
arXiv 2022
Multilingual Generative Language Models for Zero-Shot Cross-Lingual Event Argument Extraction
ACL 2022 5
How Much Can CLIP Benefit Vision-and-Language Tasks?
arXiv 2021
DEGREE: A Data-Efficient Generation-Based Event Extraction Model
NAACL 2022 7
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
arXiv 2021
Unified Pre-training for Program Understanding and Generation
NAACL 2021 4
Socially Aware Bias Measurements for Hindi Language Representations
NAACL 2022 7
GPT-GNN: Generative Pre-Training of Graph Neural Networks
arXiv 2020
Frequent co-authors
10 from 64 papers