William Yang Wang

Papers
58

Attribution

Affiliations & profile
Semantic Scholar

Authored papers

Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

arXiv 2026

2026

TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents

arXiv 2026

2026

Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies

arXiv 2025

2026

MLGym: A New Framework and Benchmark for Advancing AI Research Agents

arXiv 2025

2025

MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG

arXiv 2025

2025

MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

arXiv 2025

2025

InductionBench: LLMs Fail in the Simplest Complexity Class

arXiv 2025

2025

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

arXiv 2025

2025

Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement

arXiv 2024

2024

A Survey on Data Selection for Language Models

arXiv 2024

2024

RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

arXiv 2024

2024

MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding

arXiv 2024

2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

arXiv 2024

2024

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

arXiv 2024

2024

Can Editing LLMs Inject Harm?

arXiv 2024

2024

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

arXiv 2024

2024

DebUnc: Improving Large Language Model Agent Communication With Uncertainty Metrics

arXiv 2024

2024

Weak-to-Strong Jailbreaking on Large Language Models

arXiv 2024

2024

Disentangling Memory and Reasoning Ability in Large Language Models

arXiv 2024

2024

Scaling LLM Inference with Optimized Sample Compute Allocation

arXiv 2024

2024

AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

arXiv 2024

2024

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment

arXiv 2024

2024

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

arXiv 2024

2024

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

arXiv 2024

2024

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

2023

INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

arXiv 2023

2023

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

arXiv 2023

2023

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

arXiv 2023

2023

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

arXiv 2023

2023

ReDi: Efficient Learning-Free Diffusion Inference via Trajectory Retrieval

arXiv 2023

2023

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?

arXiv 2023

2023

Guiding Instruction-based Image Editing via Multimodal Large Language Models

arXiv 2023

2023

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

arXiv 2023

2023

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

NeurIPS 2023

2023

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

arXiv 2023

2023

A Survey on Detection of LLMs-Generated Content

arXiv 2023

2023

Multimodal Procedural Planning via Dual Text-Image Prompting

arXiv 2023

2023

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

2023

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

2023

MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models

arXiv 2023

2023

Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

arXiv 2023

2023

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

arXiv 2023

2023

An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling

CVPR 2023

2022

Imagination-Augmented Natural Language Understanding

NAACL 2022

2022

Not All Errors are Equal: Learning Text Generation Metrics using Stratified Error Synthesis

arXiv 2022

2022

Towards Large-Scale Interpretable Knowledge Graph Reasoning for Dialogue Systems

Findings (ACL) 2022

2022

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

arXiv 2022

2022

ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

arXiv 2022

2022

FinQA: A Dataset of Numerical Reasoning over Financial Data

EMNLP 2021

2021

VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling

arXiv 2021

2021

A Dataset for Answering Time-Sensitive Questions

arXiv 2021

2021

Attacking Open-domain Question Answering by Injecting Misinformation

2021

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

ICLR 2021

2020

Logical Natural Language Generation from Open-Domain Tables

2020

r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection

arXiv 2019

2019

Self-Supervised Learning for Contextualized Extractive Summarization

2019

TabFact: A Large-scale Dataset for Table-based Fact Verification

ICLR 2020

2019

Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media

arXiv 2018

2018

Affiliations

No known affiliations.

Frequent co-authors

10

from 58 papers