---
title: "Reliable RL"
type: concept
source: https://www.jemoka.com/posts/kbhreliable_rl/
confidence: high
status: active
---

Thinking about advances in the capabilities of RL: Knowledge Discovery -> Reasoning (programming assistance) -> (ongoing) -> Robotics
Insight: as time goes on, the "risk-criticality" of our applications increases; yet the more risk-critical a scenario is, the harder it is to get data for it.
Reliable Feedback Loop
The general desirable structure is a loop:
Verify (claims and requirements) => Safeguard (safe continuous deployment) => Generalize (via compositional generalization, incrementally adding behavior without losing existing behavior) => Verify => …
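To make the loop concrete, here is a toy sketch under loudly stated assumptions: a "policy" is a hypothetical lookup table from situation to action, each requirement is a hypothetical predicate on that table, and `FeedbackLoop`, `verify`, `safeguard`, and `generalize` are illustrative names, not an existing API. New behavior is accepted only if it satisfies its own requirement and every previously verified requirement, which is one simple reading of "adding behavior without losing behavior".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical toy model: a policy is a lookup table from situation to action.
Policy = Dict[str, str]
# Hypothetical: a requirement is a checkable claim about a policy.
Requirement = Callable[[Policy], bool]


@dataclass
class FeedbackLoop:
    requirements: List[Requirement] = field(default_factory=list)
    deployed: Policy = field(default_factory=dict)

    def verify(self, policy: Policy) -> bool:
        """Verify: every requirement accepted so far must hold."""
        return all(req(policy) for req in self.requirements)

    def safeguard(self, candidate: Policy) -> bool:
        """Safeguard: deploy only verified candidates; otherwise keep the last verified policy."""
        if self.verify(candidate):
            self.deployed = candidate
            return True
        return False

    def generalize(self, new_behavior: Policy, new_requirement: Requirement) -> bool:
        """Generalize: layer new behavior on the deployed policy without losing old behavior."""
        candidate = {**self.deployed, **new_behavior}
        if new_requirement(candidate) and self.safeguard(candidate):
            # The new requirement joins the set every future candidate must pass.
            self.requirements.append(new_requirement)
            return True
        return False


loop = FeedbackLoop()
loop.generalize({"obstacle": "stop"}, lambda p: p.get("obstacle") == "stop")
loop.generalize({"clear": "go"}, lambda p: p.get("clear") == "go")
# This update would overwrite the verified "obstacle" behavior, so it is rejected.
accepted = loop.generalize({"obstacle": "go", "fast": "go"}, lambda p: p.get("fast") == "go")
print(accepted, loop.deployed)  # prints: False {'obstacle': 'stop', 'clear': 'go'}
```

A real system would replace these exact table checks with testing or formal verification of a learned policy; the sketch only shows the gating structure, where a failed verification leaves the last verified policy deployed.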
Deal with Stochasticity
An RL algorithm is replicable if, with high probability (WHP), running it on the same MDP with the same fixed internal randomness (but independently collected samples) produces the same outcome, e.g. the same output policy.
=> There are \(\epsilon\)-optimal replicable algorithms for tabular and linear settings, with sample complexity polynomial in the problem parameters.
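One standard way to make replicability precise, taken from the replicability literature and stated here as an assumption about the intended definition: write \(A(S; r)\) for the policy an algorithm outputs from samples \(S\) collected in an MDP \(M\) using internal randomness \(r\). Then \(A\) is \(\rho\)-replicable if two runs with independently collected samples \(S_1, S_2\) but the same \(r\) agree with probability at least \(1 - \rho\):

\[
\Pr_{S_1, S_2, r}\left[ A(S_1; r) = A(S_2; r) \right] \ge 1 - \rho,
\]

and it is additionally \(\epsilon\)-optimal if its output \(\hat{\pi} = A(S; r)\) satisfies \(V^{\hat{\pi}} \ge V^{\star} - \epsilon\) with high probability, where \(V^{\star}\) is the optimal value function of \(M\).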
Quantization for Tie Break
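A common trick behind quantized tie-breaking is randomized rounding: snap estimated values to a grid whose offset is drawn from the shared internal randomness, so two runs whose estimates differ only slightly usually land on the same grid point and then break ties identically. This is a minimal sketch under assumptions, not the specific algorithm from the source: the true Q-values `[1.0, 1.0, 0.5]`, the Gaussian model of per-run estimation error, the grid `width`, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def quantize(x: float, width: float, offset: float) -> float:
    """Snap x to the nearest point of the grid {offset + k * width : k an integer}."""
    return offset + width * round((x - offset) / width)


def greedy_action(q_values: Sequence[float]) -> int:
    """Argmax with a fixed tie-break rule: the lowest index among maximal values."""
    best = max(q_values)
    return list(q_values).index(best)


def replicable_greedy_action(q_estimates: Sequence[float], width: float, shared_seed: int) -> int:
    """Quantize on a grid whose offset comes from shared randomness, then break ties by index."""
    offset = random.Random(shared_seed).uniform(0.0, width)
    snapped = [quantize(q, width, offset) for q in q_estimates]
    return greedy_action(snapped)


def noisy_estimates(true_q: Sequence[float], noise: float, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for Q-values estimated from one run's own samples."""
    return [q + rng.gauss(0.0, noise) for q in true_q]


def agreement_rates(true_q, noise, width, trials=2000) -> Tuple[float, float]:
    """Fraction of paired runs choosing the same action, without and with quantization."""
    plain = quantized = 0
    for t in range(trials):
        # Each run sees independent estimation noise...
        run_a = noisy_estimates(true_q, noise, random.Random(2 * t))
        run_b = noisy_estimates(true_q, noise, random.Random(2 * t + 1))
        plain += greedy_action(run_a) == greedy_action(run_b)
        # ...but both runs share the internal randomness (seed t) behind the grid offset.
        quantized += (replicable_greedy_action(run_a, width, shared_seed=t)
                      == replicable_greedy_action(run_b, width, shared_seed=t))
    return plain / trials, quantized / trials


plain_rate, quantized_rate = agreement_rates([1.0, 1.0, 0.5], noise=0.001, width=0.1)
print(f"plain argmax agreement: {plain_rate:.2f}, quantized agreement: {quantized_rate:.2f}")
```

With two truly tied actions, plain argmax agreement should hover near 0.5 because each run's noise decides the winner, while quantized agreement should be close to 1. Runs disagree roughly only when a grid boundary falls between their estimates, so a wider grid buys replicability at the cost of coarser value estimates.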
Compositional Generalization
We can decompose relevant problems into subparts, which lets us compose those parts to solve new tasks.
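As a toy illustration of the decomposition idea (an assumption for illustration, not the method from the source), hand-written grid-navigation skills stand in for learned sub-policies; `go_to`, `compose`, and the grid coordinates are hypothetical. A new multi-stop task is solved by chaining existing skills rather than by learning a new policy from scratch.

```python
from typing import Callable, List, Tuple

Pos = Tuple[int, int]
# Hypothetical: a skill maps a start cell to the cells it visits (excluding the start).
Skill = Callable[[Pos], List[Pos]]


def go_to(target: Pos) -> Skill:
    """Hypothetical primitive skill: walk to `target`, moving along x first, then y."""
    def skill(start: Pos) -> List[Pos]:
        x, y = start
        path: List[Pos] = []
        while (x, y) != target:
            if x != target[0]:
                x += 1 if target[0] > x else -1
            else:
                y += 1 if target[1] > y else -1
            path.append((x, y))
        return path
    return skill


def compose(*skills: Skill) -> Skill:
    """Chain skills: each one starts where the previous one ended."""
    def composed(start: Pos) -> List[Pos]:
        trajectory: List[Pos] = []
        pos = start
        for skill in skills:
            segment = skill(pos)
            trajectory.extend(segment)
            if segment:
                pos = segment[-1]
        return trajectory
    return composed


# A "new task" (visit a pickup cell, then a drop-off cell, then return home)
# solved purely by composing existing skills.
pickup_and_return = compose(go_to((3, 0)), go_to((3, 2)), go_to((0, 0)))
trajectory = pickup_and_return((0, 0))
print(len(trajectory), trajectory[-1])  # prints: 10 (0, 0)
```

The design choice here is that every skill shares one interface (start state in, trajectory out), which is what makes arbitrary chaining possible; learned sub-policies would need a comparable shared interface to compose cleanly.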