arxiv:2605.16339
Shunchang Liu
Shunchang
AI & ML interests
AI
Recent Activity
authored a paper 13 days ago
Preference Instability in Reward Models: Detection and Mitigation via Sparse Autoencoders
updated a model 18 days ago
Shunchang/sae-rm-checkpoints
updated a dataset about 2 months ago
Shunchang/sae-rm-perturbation-data
Organizations
None yet