---
title: "OpenAI Research Overview"
visibility: public
---

# OpenAI Research Overview

Category: [[technical|Technical]]

[Read the original document](https://docs.google.com/document/d/1BbHW2B6bygh8dYxrvYhGuH6x-eFgdDpdez9MZkrQntU/edit?usp=sharing&sa=D&ust=1596495076376000&usg=AOvVaw0tE_VOgmBbIzO92RfAwBvB)

---

By Jeremy Nixon [jnixon2@gmail.com]. Nov 2017.

Categories: the domain in which each paper’s innovation is novel.

1. Reinforcement Learning
   1. Multi-Agent
   2. Exploration
   3. Imitation Learning
2. Deep Learning
3. Memory
4. Program Learning
5. Representation Learning
6. Variational Inference
7. Generative Models
8. Evolution
9. Applications
   1. Security / Safety
   2. Robotics
10. Environments

1. Reinforcement Learning
   1. Multi-Agent
      1. Learning with Opponent-Learning Awareness
         1. https://arxiv.org/abs/1709.04326
      2. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
         1. https://arxiv.org/abs/1706.02275
      3. Emergence of Grounded Compositional Language in Multi-Agent Populations
         1. https://arxiv.org/abs/1703.04908
   2. Exploration
      1. Parameter Space Noise for Exploration
         1. https://arxiv.org/abs/1706.01905
      2. UCB and InfoGain Exploration via Q-Ensembles
         1. https://arxiv.org/abs/1706.01502
      3. #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
         1. https://arxiv.org/abs/1611.04717
      4. VIME: Variational Information Maximizing Exploration
         1. https://arxiv.org/abs/1605.09674
   3. Imitation Learning
      1. Third-Person Imitation Learning
         1. https://arxiv.org/abs/1703.01703
      2. One-Shot Imitation Learning
         1. https://arxiv.org/abs/1703.07326
   4. RL²: Fast Reinforcement Learning via Slow Reinforcement Learning
      1. https://arxiv.org/abs/1611.02779
   5. Teacher-Student Curriculum Learning
      1. https://arxiv.org/abs/1707.00183
   6. Equivalence Between Policy Gradients and Soft Q-Learning
      1. https://arxiv.org/abs/1704.06440
   7. Prediction and Control with Temporal Segment Models
      1. https://arxiv.org/abs/1703.04070
2. Deep Learning
   1. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
      1. https://arxiv.org/abs/1602.07868
3. Memory
   1. Hindsight Experience Replay [Also, Reinforcement Learning]
      1. https://arxiv.org/pdf/1707.01495.pdf
4. Program Learning
   1. Extensions and Limitations of the Neural GPU
      1. https://arxiv.org/abs/1611.00736
5. Representation Learning
   1. Variational Lossy Autoencoder
      1. https://arxiv.org/abs/1611.02731
6. Variational Inference
   1. Improving Variational Inference with Inverse Autoregressive Flow
      1. https://arxiv.org/abs/1606.04934
7. Generative Models
   1. Generative Adversarial Networks
      1. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Also, Representation Learning]
         1. https://arxiv.org/abs/1606.03657
      2. Improved Techniques for Training GANs
         1. https://arxiv.org/abs/1606.03498
   2. On the Quantitative Analysis of Decoder-Based Generative Models
      1. https://arxiv.org/abs/1611.04273
   3. A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models [Also, Reinforcement Learning]
      1. https://arxiv.org/pdf/1611.03852.pdf
   4. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications
      1. https://arxiv.org/abs/1701.05517
   5. Learning to Generate Reviews and Discovering Sentiment
      1. https://arxiv.org/abs/1704.01444
8. Evolution
   1. Evolution Strategies as a Scalable Alternative to Reinforcement Learning
      1. https://arxiv.org/abs/1703.03864
9. Applications
   1. Security / Safety
      1. Deep Reinforcement Learning from Human Preferences
         1. https://arxiv.org/abs/1706.03741
      2. Concrete Problems in AI Safety
         1. https://arxiv.org/abs/1606.06565
      3. Adversarial Attacks on Neural Network Policies
         1. https://arxiv.org/abs/1702.02284
      4. Adversarial Training Methods for Semi-Supervised Text Classification
         1. https://arxiv.org/abs/1605.07725
      5. Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data
         1. https://arxiv.org/abs/1610.05755
      6. AI Safety via Debate
         1. https://arxiv.org/pdf/1805.00899.pdf
   2. Robotics
      1. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
         1. https://arxiv.org/abs/1703.06907
      2. Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
         1. https://arxiv.org/abs/1610.03518
10. Environments
   1. Infrastructure for Deep Learning
      1. https://blog.openai.com/infrastructure-for-deep-learning/
   2. Universe
      1. https://blog.openai.com/universe/
   3. OpenAI Gym
      1. https://arxiv.org/abs/1606.01540

OpenAI Researchers

1. Paul Christiano
2. Ryan Lowe
3. Jean Harb
4. Pieter Abbeel
5. Igor Mordatch x
6. Matthias Plappert 
7. Rein Houthooft x
8. Prafulla Dhariwal
9. Szymon Sidor
10. Richard Y. Chen
11. Xi Chen
12. Marcin Andrychowicz x
13. John Schulman
14. Alec Radford
15. Rafal Jozefowicz
16. Yan Duan
17. Bradly C. Stadie
18. Jonathan Ho
19. Jonas Schneider
20. Ilya Sutskever
21. Wojciech Zaremba
22. Rachel Fong
23. Josh Tobin
24. Alex Ray
25. Nikhil Mishra
26. Ian Goodfellow
27. Tim Salimans
28. Diederik P. Kingma
29. Andrej Karpathy
30. Yuri Burda
31. Zain Shah
32. Trevor Blackwell
33. Vicki Cheung

Salaries of top employees [Pg. 28]
Hours & Salaries of top employees [Pg. 7]
OpenAI spent 11 million in 2016, of which 7 million went to salaries. For comparison, DeepMind spent 138 million in 2016.

---

*Source: [Original Google Doc](https://docs.google.com/document/d/1BbHW2B6bygh8dYxrvYhGuH6x-eFgdDpdez9MZkrQntU/edit?usp=sharing&sa=D&ust=1596495076376000&usg=AOvVaw0tE_VOgmBbIzO92RfAwBvB)*