@jemoka / Jemoka Knowledge Base / raw/course/cs120/kbhsu_cs120_oct012024.md
---
title: "SU-CS120 OCT012024"
source: https://www.jemoka.com/posts/kbhsu_cs120_oct012024/
date: 2024-10-01
---

## specification gaming

specification gaming, or reward hacking, is the phenomenon where a system behaves suboptimally because it exploits an underspecified part of its reward.

## challenges

- sparse rewards
- partial observability
- dynamic rewards (and reward shifting)
- sim-to-real transfer is hard
- computational costs
- specification gaming

## AI alignment

AI alignment aims to ensure that AI systems are aligned with human values and interests.

there is a spectrum of unexpected solutions:

- undesirable novel solutions
- desirable novel solutions

## Problems with RLHF

- RLHF degrades model quality
- Goodharting

Overfitting!! is an example of goodharting.
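
To make the overfitting-as-goodharting point concrete, here is a minimal sketch under stated assumptions. It is a toy illustration, not something from the lecture: the data, the `memorizer` and threshold models, and every name in the code are hypothetical. Training accuracy plays the role of the proxy metric being optimized, and accuracy on held-out points stands in for the real goal.

```python
# Toy illustration (all data and names are hypothetical, not from the lecture).
# Proxy metric: accuracy on the training set. Real goal: accuracy on new points.

TRAIN_X = list(range(10))
# The true rule is label = 1 iff x >= 5; the labels at x=2 and x=7 are flipped noise.
TRAIN_Y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

# Held-out points, labeled by the true rule with no noise.
TEST_X = [x + 0.4 for x in range(10)]
TEST_Y = [1 if x >= 5 else 0 for x in TEST_X]


def memorizer(x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    nearest = min(range(len(TRAIN_X)), key=lambda i: abs(TRAIN_X[i] - x))
    return TRAIN_Y[nearest]


def make_threshold_model(t):
    """Simple rule: predict 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0


def accuracy(model, xs, ys):
    """Fraction of points where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


# Fit the simple model by picking the threshold with the best training accuracy.
best_t = max(
    range(11),
    key=lambda t: accuracy(make_threshold_model(t), TRAIN_X, TRAIN_Y),
)
threshold_model = make_threshold_model(best_t)

for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(
        name,
        "train:", accuracy(model, TRAIN_X, TRAIN_Y),
        "test:", accuracy(model, TEST_X, TEST_Y),
    )
```

In this toy setup the memorizer scores perfectly on the proxy (training accuracy) by reproducing the two noisy labels, and that is exactly what costs it on the held-out points. The simple threshold rule scores lower on the proxy but higher on the held-out points.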