instruction following
- Slug: instruction-following
- Evals: 17
- Tools: 47
- Models: 227
- Papers: 11
Evals testing this capability: 17
Tools lifting evals here: 47
Top models on this capability: 227 (by average parsed score across evals here)
Papers in this area: 11
- introduces: Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
- introduces: APEX: An Expert-Authored Benchmark for Real-World Expert Workflows
- introduces: From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline
- introduces: Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
- introduces: Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- introduces: GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks
- introduces: Holistic Evaluation of Language Models
- introduces: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- introduces: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
- introduces: τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
- introduces: TextArena: Multi-Agent Text-Based Games for LLM Evaluation
