Tianyi Zhou
- Papers
- 71
Authored papers (71)
ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
arXiv 2026
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
arXiv 2026
TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models
arXiv 2026
Does Socialization Emerge in AI Agent Society? A Case Study of Moltbook
arXiv 2026
Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
arXiv 2026
What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
arXiv 2026
When AI Navigates the Fog of War
arXiv 2026
TradeTrap: Are LLM-based Trading Agents Truly Reliable and Faithful?
arXiv 2025
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
arXiv 2025
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
arXiv 2025
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
arXiv 2025
CoSTA$^*$: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
arXiv 2025
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
arXiv 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
arXiv 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
arXiv 2025
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
arXiv 2025
V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
arXiv 2025
Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction
arXiv 2025
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
arXiv 2025
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
arXiv 2025
ChartAB: A Benchmark for Chart Grounding & Dense Alignment
arXiv 2025
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
arXiv 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
arXiv 2025
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
arXiv 2025
FaSTA$^*$: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
arXiv 2025
Optimizing Length Compression in Large Reasoning Models
arXiv 2025
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
arXiv 2025
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
arXiv 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
arXiv 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
arXiv 2025
PharmaShip: An Entity-Centric, Reading-Order-Supervised Benchmark for Chinese Pharmaceutical Shipping Documents
arXiv 2025
TrustLLM: Trustworthiness in Large Language Models
arXiv 2024
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
CVPR 2025
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
arXiv 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
arXiv 2024
AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models
arXiv 2024
Mosaic-IT: Free Compositional Data Augmentation Improves Instruction Tuning
arXiv 2024
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
arXiv 2024
Meta-Task Prompting Elicits Embeddings from Large Language Models
arXiv 2024
Corpus-Steered Query Expansion with Large Language Models
arXiv 2024
BenTo: Benchmark Task Reduction with In-Context Transferability
arXiv 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
arXiv 2024
A Survey on Knowledge Distillation of Large Language Models
arXiv 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
arXiv 2024
Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
arXiv 2024
One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts
arXiv 2024
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
arXiv 2024
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
arXiv 2024
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models
arXiv 2024
Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation
arXiv 2024
DynaSaur: Large Language Agents Beyond Predefined Actions
arXiv 2024
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
arXiv 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
arXiv 2023
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
arXiv 2023
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts
arXiv 2023
Federated Recommendation with Additive Personalization
arXiv 2023
Do text-free diffusion models learn discriminative visual representations?
arXiv 2023
Continual Task Allocation in Meta-Policy Network via Sparse Prompting
arXiv 2023
When to Learn What: Model-Adaptive Data Augmentation Curriculum
ICCV 2023
Subclass-balancing Contrastive Learning for Long-tailed Recognition
ICCV 2023
Good Questions Help Zero-Shot Image Reasoning
arXiv 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
arXiv 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
arXiv 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
CVPR 2024
InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
arXiv 2023
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
CVPR 2024
Structured Cooperative Learning with Graphical Model Priors
arXiv 2023
TASA: Deceiving Question Answering Models by Twin Answer Sentences Attack
arXiv 2022
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly
Affiliations
Frequent co-authors
10 from 71 papers