Attribution
L0-Reasoning Bench: Evaluating Procedural Correctness in Language Models via Simple Program Execution
arXiv 2025
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
arXiv 2024
from 2 papers
Boris Ginsburg
Cheng-Ping Hsieh
Faisal Ladhak
Isabelle Augenstein
Santiago Akle Serano
Simeng Sun
Zhaoqi Liu