Attribution
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment
arXiv 2025
Quantifying Generalization Complexity for Large Language Models
arXiv 2024
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
from 3 papers
Xiangjun Fan
Zhuokai Zhao
Chaoqi Wang
Chen Zhu
Hao Ma
Himabindu Lakkaraju
Hongyin Luo
James Glass
Jiayi Liu
Kiho Park