Seungone Kim

Papers
25

Authored papers

25

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

arXiv 2026

2026

M-Prometheus: A Suite of Open Multilingual LLM Judges

arXiv 2025

2025

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

arXiv 2025

2025

Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

arXiv 2025

2025

Measuring Sycophancy of Language Models in Multi-turn Dialogues

arXiv 2025

2025

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

arXiv 2025

2025

Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

arXiv 2025

2025

Reasoning Models Better Express Their Confidence

arXiv 2025

2025

Let's Predict Sentence by Sentence

arXiv 2025

2025

Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators

arXiv 2025

2025

LangBridge: Multilingual Reasoning Without Multilingual Supervision

arXiv 2024

2024

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

arXiv 2024

2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

arXiv 2024

2024

Can Language Models Evaluate Human Written Text? Case Study on Korean Student Writing for Education

arXiv 2024

2024

Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation

arXiv 2024

2024

Aligning to Thousands of Preferences via System Message Generalization

arXiv 2024

2024

Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards

arXiv 2024

2024

Evaluating Language Models as Synthetic Data Generators

arXiv 2024

2024

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models

arXiv 2023

2023

The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

arXiv 2023

2023

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

arXiv 2023

2023

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

arXiv 2023

2023

Exploring the Benefits of Training Expert Language Models over Instruction Tuning

arXiv 2023

2023

CoTEVer: Chain of Thought Prompting Annotation Toolkit for Explanation Verification

arXiv 2023

2023

Mind the Gap! Injecting Commonsense Knowledge for Abstractive Dialogue Summarization

COLING 2022 10

2022

Affiliations

No known affiliations.

Frequent co-authors

10

from 25 papers