Zhiwei Liu

Papers: 20

Authored papers

CHI-Bench: Can AI Agents Automate End-to-End, Long-Horizon, Policy-Rich Healthcare Workflows?

arXiv 2026

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

arXiv 2026

ActionStudio: A Lightweight Framework for Data and Training of Large Action Models

arXiv 2025

LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering

arXiv 2025

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

arXiv 2025

MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs

arXiv 2025

LoCoBench-Agent: An Interactive Benchmark for LLM Agents in Long-Context Software Engineering

arXiv 2025

MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment

arXiv 2025

MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models

arXiv 2025

Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels

arXiv 2025

UserRL: Training Interactive User-Centric Agent via Reinforcement Learning

arXiv 2025

UserBench: An Interactive Gym Environment for User-Centric Agents

arXiv 2025

AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

arXiv 2024

EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis

arXiv 2024

AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System

arXiv 2024

TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action

arXiv 2024

ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model

arXiv 2024

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

arXiv 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

arXiv 2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

arXiv 2023

Affiliations

No known affiliations.

Frequent co-authors: 10 (from 20 papers)