Seungone Kim
Authored papers (25)
On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
arXiv 2026
M-Prometheus: A Suite of Open Multilingual LLM Judges
arXiv 2025
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
arXiv 2025
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
arXiv 2025
Measuring Sycophancy of Language Models in Multi-turn Dialogues
arXiv 2025
RefineBench: Evaluating Refinement Capability of Language Models via Checklists
arXiv 2025
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
arXiv 2025
Reasoning Models Better Express Their Confidence
arXiv 2025
Let's Predict Sentence by Sentence
arXiv 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
arXiv 2025
LangBridge: Multilingual Reasoning Without Multilingual Supervision
arXiv 2024
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
arXiv 2024
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
arXiv 2024
Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education
arXiv 2024
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
arXiv 2024
Aligning to Thousands of Preferences via System Message Generalization
arXiv 2024
Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards
arXiv 2024
Evaluating Language Models as Synthetic Data Generators
arXiv 2024
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
arXiv 2023
The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning
arXiv 2023
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
arXiv 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
arXiv 2023
Exploring the Benefits of Training Expert Language Models over Instruction Tuning
arXiv 2023
CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification
arXiv 2023
Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization
COLING 2022
Frequent co-authors
10 from 25 papers