Reinforcement Learning Basics IV: A Roundup of State-of-the-Art Classic RL Algorithms - Zhihu

06/07 15:02
The cover image is the taxonomy OpenAI gives in Spinning Up, but it is no longer enough to cover today's SOTA algorithms; once again I marvel at how fast papers come out in AI. (The field does not seem to have advanced actual intelligence by much, but a journey of a thousand miles begins with a single step.)

To give everyone an intuitive picture of the SOTA algorithms in RL, I have reorganized this directory of SOTA algorithms: some I have already self-implemented, and for some I have written paper-reading notes.

Model-free

Value-based

  1. Q-learning
    [ Paper | Code | Blog | 1992 ]
  2. Sarsa, Sarsa(λ)
    [ Paper | Code | Blog | 1994 ]
  3. Deep Q Network (DQN)
    [ Paper | Code | Blog | 2015 ]
  4. Double Deep Q Network
    [ Paper | Code | Blog | 2015 ]
  5. Dueling Deep Q Network
    [ Paper | Code | Blog | 2015 ]
  6. Double Dueling Deep Q Network (D3QN)
    [ No Paper | Code | Blog | 2015 ]
  7. Rainbow
    [ Paper | Code | Blog | 2017 ]
  8. Hindsight Experience Replay (HER) (also applicable to DDPG)
    [ Paper | Code | Blog | 2017 ]
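As a quick refresher on where this family starts, here is a minimal tabular Q-learning sketch. The `env` interface (`reset`, `step`, `n_actions`) is an assumed toy API for illustration, not any particular library:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: off-policy TD control with an epsilon-greedy behavior policy."""
    Q = defaultdict(float)  # Q[(state, action)] -> value, default 0

    def greedy(s):
        return max(range(env.n_actions), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy exploration
            a = random.randrange(env.n_actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            # bootstrapped target uses max over next actions (off-policy)
            target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in range(env.n_actions)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])  # TD update
            s = s2
    return Q
```

The deep variants below (DQN onward) replace the table with a neural network and add a replay buffer and target network, but the update target is the same.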

Policy-based

  1. Vanilla Policy Gradient / REINFORCE
    [ Paper | Code | Blog | 2000 ]
  2. Trust Region Policy Optimization (TRPO)
    [ Paper | Code | Blog | 2015 ]
  3. Proximal Policy Optimization (PPO)
    [ Paper | Code | Blog | 2017 ]
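The common root of this family is the REINFORCE estimator, which ascends ∇log π(a|s) · G. A minimal sketch on a one-step bandit with a softmax policy (the `reward_fn` callback is an assumption for illustration):

```python
import numpy as np

def reinforce_bandit(reward_fn, n_actions=3, episodes=2000, lr=0.1, seed=0):
    """Vanilla policy gradient (REINFORCE) on a one-step bandit.
    theta parameterizes a softmax policy; the update is lr * G * grad log pi(a)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_actions)
    for _ in range(episodes):
        probs = np.exp(theta - theta.max())
        probs /= probs.sum()                  # softmax policy
        a = rng.choice(n_actions, p=probs)
        G = reward_fn(a)                      # one-step return
        grad_log_pi = -probs
        grad_log_pi[a] += 1.0                 # gradient of log softmax w.r.t. theta
        theta += lr * G * grad_log_pi         # policy gradient ascent
    return theta
```

TRPO and PPO keep this gradient direction but constrain (trust region) or clip (surrogate objective) how far each update can move the policy.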

Actor-Critic

  1. Actor-Critic
    [ Paper | Code pytorch | Blog | 2000 ]
  2. Advantage Actor-Critic (A2C)
    [ No Paper | Code | Blog | unknown ]
  3. Deep Deterministic Policy Gradient (DDPG)
    [ Paper | Code1 OpenAI | Code2 | Blog | 2015 ]
  4. Twin Delayed DDPG (TD3)
    [ Paper | Code | Blog | 2018 ]
  5. Soft Actor-Critic (SAC)
    [ Paper | Code tf | Blog | 2018 ]
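One mechanism shared by DDPG, TD3 and SAC is the Polyak-averaged target network that keeps the bootstrapped critic target slowly moving. A minimal sketch of that soft update, on plain parameter lists rather than any specific framework:

```python
def soft_update(target, online, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target.
    With small tau the target network trails the online network, which
    stabilizes the critic's bootstrapped TD target."""
    return [tau * w + (1 - tau) * wt for wt, w in zip(target, online)]
```

TD3 additionally delays actor updates and takes the minimum of two critics; SAC adds an entropy bonus to the objective.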

Model-based

  1. Dyna
    [ Paper | Code | Blog | 1991 ]
  2. PILCO
    [ Paper | Code | Blog | 2011 ]
  3. Value Prediction Network (VPN)
    [ Paper | Code | Blog | 2018 ]
  4. Guided Policy Search (GPS)
    [ Paper | Code | Blog | 2017 ]
  5. Model-Based Value Expansion (MVE)
    [ Paper | Code | Blog | 2018 ]
  6. Stochastic Ensemble Value Expansion (STEVE)
    [ Paper | Code | Blog | 2018 ]
  7. Model-Based Policy Optimization (MBPO)
    [ Paper | Code | Blog | 2019 ]
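Dyna, the oldest entry here, captures the core model-based idea: interleave updates on real experience with planning updates replayed from a learned model. A minimal tabular sketch of the planning step, assuming a deterministic `model` dict mapping (s, a) to (r, s') built from observed transitions:

```python
import random
from collections import defaultdict

def dyna_q_planning(Q, model, n_actions, n_steps=10, alpha=0.1, gamma=0.99):
    """Dyna planning step: sample previously seen state-action pairs from a
    learned tabular model and apply the same Q-learning update as on real data."""
    for _ in range(n_steps):
        s, a = random.choice(list(model.keys()))  # previously visited (s, a)
        r, s2 = model[(s, a)]                     # simulated transition
        target = r + gamma * max(Q[(s2, b)] for b in range(n_actions))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
```

The later entries (MVE, STEVE, MBPO) refine the same recipe with learned neural dynamics models and mechanisms to limit model error.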


Hierarchical RL

  1. Hierarchical DQN (h-DQN)
    [ Paper | Code Keras | Code pytorch | Blog | 2016 ]
  2. Hierarchical DDPG (h-DDPG)
    [ Paper | Code | Blog | 2017 ]
  3. Hierarchical-Actor-Critic (HAC)
    [ Paper | Code pytorch | Code TF | Blog_CN | Blog_EG | 2019 ]


Distributed Architecture

  1. Asynchronous Advantage Actor-Critic (A3C)
    [ Paper | Code pytorch | Blog | 2016 ]
  2. Distributed PPO (DPPO)
    [ Paper | Code pytorch | Blog | 2017 ]
  3. IMPALA
    [ Paper | Code | Blog | 2018 ]
  4. APE-X
    [ Paper | Code | Blog | 2018 ]
  5. Divergence-augmented Policy Optimization (DAPO)
    [ Paper | Code | Blog | 2019 ]

Multi-Agent

  1. Value-Decomposition Networks (VDN)
    [ Paper | Code | Blog | 2017 ]
  2. MADDPG
    [ Paper | Code OpenAI | Blog | 2017 ]
  3. Mean Field Multi-Agent RL
    [ Paper | Code | Blog | 2018 ]
  4. QMIX
    [ Paper | Code | Blog | 2018 ]
  5. Actor-Attention-Critic for Multi-Agent (MAAC)
    [ Paper | Code | Blog | 2018 ]

If you find a broken link, please let me know; much appreciated.

More algorithm implementations can be found in this column's associated GitHub repo:

Machine-Learning-is-ALL-You-Need (github.com)

Watch & Star are welcome!
