Language models are often adapted in stages: a public skill phase, a private memory phase, and a later safety phase that learns to refuse outputs tied to the remembered entities. Revoking the memory after the safety phase is not the same problem as subtracting the memory update: the later safety optimizer has transported the memory direction. We introduce process sidecars, a two-coefficient edit family $\hat{\theta}(\lambda,\gamma)=\theta_{AMS}-\lambda\Delta_{M}-\gamma\hat{R}_{S\leftarrow M}$, with $\hat{R}_{S\leftarrow M}=\hat{J}_{S,\varepsilon}(\Delta_{M})-\Delta_{M}$, where $\hat{J}_{S,\varepsilon}$ is a centered secant through the realized future AdamW safety-training process. The implementation uses $\varepsilon=1$ at the natural memory-edit scale; it reuses $\theta_{AMS}$ as the positive endpoint and computes one additional safety trace at $\theta_{A}-\Delta_{M}$. We prove two things. First, the exact sidecar, using the true transported direction $R_{S\leftarrow M}$ rather than the secant estimate, at $(\lambda,\gamma)=(1,1)$ recovers the counterfactual safety-only oracle $\theta_{AS}$ up to second order; the proof treats AdamW as an augmented-state map over parameters, first moments, and second moments. Second, this process information is necessary: whenever future safety training bends the memory direction, every scalar task-arithmetic edit leaves first-order counterfactual error, while the process-sidecar edit is second-order accurate. Across three models, the validation-selected 2D edit improves held-out refusal closure over naive task arithmetic in all trials, and over the $\gamma=\lambda$ process-JVP subfamily, the diagonal slice of the cached 2D grid, in all paired trials.
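The edit family described in the abstract can be illustrated with a minimal NumPy sketch. It is a toy under stated assumptions, not the paper's implementation: the safety phase is replaced by a few plain gradient-descent steps on a small smooth loss (not AdamW), and the names `safety_phase`, `sidecar_edit`, `theta_neg`, and all numeric values are hypothetical. It forms the centered secant at $\varepsilon=1$ by reusing $\theta_{AMS}$ and one extra trace at $\theta_{A}-\Delta_{M}$, then compares the edit at $(\lambda,\gamma)=(1,0)$ with the edit at $(1,1)$ against the safety-only oracle.

```python
import numpy as np

# Hypothetical toy stand-in for the safety phase: a few plain gradient-descent
# steps on a smooth non-quadratic loss. This is NOT AdamW and not the paper's setup.
A = np.array([[2.0, 0.5], [0.5, 1.0]])

def safety_phase(theta, steps=5, lr=0.1):
    """Return the parameters reached by running the toy safety phase from theta."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        grad = A @ theta + theta**3  # gradient of 0.5*theta^T A theta + 0.25*sum(theta^4)
        theta = theta - lr * grad
    return theta

theta_A = np.array([1.0, -0.5])    # parameters after the public skill phase (assumed values)
delta_M = np.array([0.05, 0.10])   # private memory update Delta_M (assumed values)

theta_AMS = safety_phase(theta_A + delta_M)  # realized skill -> memory -> safety model
theta_AS = safety_phase(theta_A)             # counterfactual safety-only oracle

# Centered secant at epsilon = 1: theta_AMS is the positive endpoint,
# and one additional safety trace is run from theta_A - Delta_M.
theta_neg = safety_phase(theta_A - delta_M)
J_hat_delta = 0.5 * (theta_AMS - theta_neg)  # estimate of the transported direction
R_hat = J_hat_delta - delta_M                # sidecar residual R_hat

def sidecar_edit(lam, gamma):
    """Two-coefficient edit: theta_AMS - lam * Delta_M - gamma * R_hat."""
    return theta_AMS - lam * delta_M - gamma * R_hat

naive_err = np.linalg.norm(sidecar_edit(1.0, 0.0) - theta_AS)    # scalar task arithmetic
sidecar_err = np.linalg.norm(sidecar_edit(1.0, 1.0) - theta_AS)  # process sidecar
print(f"naive error: {naive_err:.4f}, sidecar error: {sidecar_err:.4f}")
```

In this toy, the $(1,1)$ edit reduces algebraically to the midpoint of the two safety traces, so its error against the oracle is a second difference of the safety map, while the $(1,0)$ edit keeps the first-order term $(J_{S}-I)\Delta_{M}$.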
Revocable Learned State via Process Sidecars
- Year
- 2026
- Hosting
- Full text hosted, CC-BY-4.0
Attribution
- Abstract & full text
- arxiv.org/abs/2606.30788 (CC-BY-4.0)