---
title: "ICLR2025 HAIC"
source: https://www.jemoka.com/posts/kbhiclr2025_haic/
---

## ICLR2025 Koyejo

Proposal: Focus AI measurements on the validity of specific terms.

Five pillars of claim-making:

- content validity: does your evaluation cover all valuable cases?
- criterion validity: does your evaluation correlate with a known validated standard?
- construct validity: does your evaluation measure the intended construct?
- external validity: does your evaluation generalize across different environments or settings?
- consequential validity: does your evaluation consider the real-world impact of test interpretation and use?

Open problem: validity of measurement for claims of HAIC.

## ICLR2025 Evans: AI Diversity NOT Alignment for Sustained Innovation in Human-AI Evolution

When AI systems align with user values, users rank them as more helpful. Good.

For unpredictable systems, the best approach is to build in checks and balances plus diverse systems.

“finding ways to honor and value big-bad failures, to build objectives”

## ICLR2025 Laidlaw: Scalable Assistance Games

- fix a human model learned from data
- learn a model: AssistanceZero

### AssistanceZero

Multi-agent environment to solve factored POMDPs while a human agent is doing something.

## ICLR2025 Musaffar: Learning to Lie: Adversarial Attacks Driven by Reinforcement Learning damage Human-AI Teams and LLMs

- RL-driven attacks are effective at tricking humans
- Chain-of-thought models are more sensitive to attacks
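
As a small illustration of the criterion validity pillar from the Koyejo notes above, one plausible check is to correlate a new evaluation's scores with those from an already validated standard. The sketch below is only that: the score lists are made-up placeholder data, the `pearson` helper is a hypothetical name, and Pearson correlation is one assumed choice of statistic, not something the talk prescribes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-deviation of the two lists, and each list's own squared deviation.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-model scores (placeholder data, not real results).
new_benchmark = [0.42, 0.55, 0.61, 0.70, 0.88]
validated_standard = [0.40, 0.50, 0.65, 0.72, 0.90]

r = pearson(new_benchmark, validated_standard)
print(f"criterion validity (Pearson r): {r:.3f}")
```

A high correlation is evidence for criterion validity only; it says nothing about the other four pillars, which need their own checks.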