Yue Huang

Emergent Social Intelligence Risks in Generative Multi-Agent Systems

arXiv 2026

Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models

arXiv 2026

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

arXiv 2026

Generative AI for Autonomous Driving: Frontiers and Opportunities

arXiv 2025

DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

arXiv 2025

Building a Foundational Guardrail for General Agentic Systems via Synthetic Data

arXiv 2025

Preference Leakage: A Contamination Problem in LLM-as-a-judge

arXiv 2025

On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective

arXiv 2025

EfficientLLM: Efficiency in Large Language Models

arXiv 2025

TrustLLM: Trustworthiness in Large Language Models

arXiv 2024

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

arXiv 2024

GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding

arXiv 2024

UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

arXiv 2024

AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?

arXiv 2024

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment

arXiv 2024

HonestLLM: Toward an Honest and Helpful Large Language Model

arXiv 2024

LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?

arXiv 2024

CSPRD: A Financial Policy Retrieval Dataset for Chinese Stock Market

arXiv 2023

AlignBench: Benchmarking Chinese Alignment of Large Language Models

arXiv 2023

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

arXiv 2023

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

arXiv 2023