---
title: "ICLR2025 HAIC"
source: https://www.jemoka.com/posts/kbhiclr2025_haic/
---

ICLR2025 Koyejo

Proposal: focus AI measurements on the validity of specific terms.

Five pillars of claim making:

- content validity: does your evaluation cover all valuable cases?
- criterion validity: does your evaluation correlate with a known validated standard?
- construct validity: does your evaluation measure the intended construct?
- external validity: does your evaluation generalize across different environments or settings?
- consequential validity: does your evaluation consider the real-world impact of test interpretation and use?

Open problem: validity of measurement for claims of HAIC.
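
As a rough illustration of the criterion validity question above, one simple check is to correlate scores from a new evaluation against scores from an already validated standard on the same systems. The sketch below is only an assumption about how such a check might look: the function name `pearson` and both score lists are made up for illustration and do not come from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five systems (illustrative numbers only).
new_eval_scores = [0.62, 0.71, 0.55, 0.80, 0.67]   # the evaluation being checked
validated_scores = [0.60, 0.75, 0.50, 0.85, 0.65]  # a known validated standard

r = pearson(new_eval_scores, validated_scores)
print(f"criterion correlation r = {r:.3f}")
```

A high correlation is evidence for criterion validity only; it says nothing about the other four pillars.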

ICLR2025 Evans: AI Diversity NOT Alignment for Sustained Innovation in Human-AI Evolution

When AI systems align with user values, users rank them as more helpful.

Good

For unpredictable systems, the best approach is to build in checks and balances plus diverse systems.

“finding ways [to] honor and value big-bad failures, to build objectives”

ICLR2025 Laidlaw: Scalable Assistance Games

- fix a human model learned from data
- learn a model: AssistanceZero

AssistanceZero

Multi-agent environment to solve factored POMDPs while a human agent is doing something.
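
To make the "fixed human model" idea concrete, here is a minimal sketch of an assistant keeping a belief over what the human is trying to do, updated by Bayes' rule as it watches the human act. This is not AssistanceZero itself: the goals, actions, probabilities, and the names `HUMAN_MODEL` and `update_belief` are all hypothetical, and the setting is assumed to be one where the human's goal is the hidden part of the state.

```python
def update_belief(belief, action, human_model):
    """Bayesian update of the assistant's belief over the human's goal.

    belief:      dict mapping goal -> prior probability
    human_model: dict mapping goal -> {action: probability}, held fixed
    """
    unnormalized = {
        goal: prior * human_model[goal].get(action, 0.0)
        for goal, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        # The action is impossible under every goal; keep the prior unchanged.
        return dict(belief)
    return {goal: weight / total for goal, weight in unnormalized.items()}


# Hypothetical fixed human model: how likely each action is under each goal.
HUMAN_MODEL = {
    "build_house": {"place_block": 0.7, "dig": 0.2, "idle": 0.1},
    "dig_tunnel": {"place_block": 0.1, "dig": 0.8, "idle": 0.1},
}

# Start uncertain, then observe the human acting.
belief = {"build_house": 0.5, "dig_tunnel": 0.5}
for observed_action in ["dig", "dig", "place_block"]:
    belief = update_belief(belief, observed_action, HUMAN_MODEL)

print(belief)
```

In this toy run the assistant ends up favoring `dig_tunnel`, since two of the three observed actions are far more likely under that goal.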

ICLR2025 Musaffar: Learning to Lie: Adversarial Attacks Driven by Reinforcement Learning damage Human-AI Teams and LLMs

- RL-driven attacks are effective at tricking humans.
- Chain-of-thought models are more sensitive to attacks.