Koray Kavukcuoglu, David Silver, Tim Harley, Timothy P. Lillicrap, Alex Graves, Mehdi Mirza, Adrià Puigdomènech Badia, Volodymyr Mnih - 2016
Publications: arXiv
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
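The core idea above — multiple actor-learners applying gradient updates to shared parameters asynchronously, without locks — can be illustrated with a minimal toy sketch. This is an assumption-laden simplification, not the paper's method: a plain linear-regression objective stands in for the deep network, and Python threads stand in for the parallel actor-learners.

```python
import threading
import random

# Toy sketch (not the paper's setup): several "actor-learner" threads
# apply gradient updates to SHARED parameters with no locking, mirroring
# the asynchronous optimization described in the abstract. Each thread
# fits the shared weights to a known linear target.

TRUE_W = [2.0, -3.0]       # hypothetical target the threads should recover
shared_w = [0.0, 0.0]      # parameters shared by every thread
LR = 0.01

def actor_learner(seed, steps=2000):
    rng = random.Random(seed)
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = TRUE_W[0] * x[0] + TRUE_W[1] * x[1]
        pred = shared_w[0] * x[0] + shared_w[1] * x[1]
        err = pred - y
        # Asynchronous update: other threads may interleave here,
        # so individual gradients can be slightly stale.
        for i in range(2):
            shared_w[i] -= LR * err * x[i]

threads = [threading.Thread(target=actor_learner, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_w)  # should approach TRUE_W = [2.0, -3.0]
```

Despite the occasional stale gradient, the shared parameters converge — a small-scale analogue of the stabilizing effect of parallel learners the abstract describes.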
A Neural Net Training Interface on TensorFlow, with a focus on speed and flexibility
Using a paper from Google DeepMind, I've developed a new version of the DQN that uses threaded exploration instead of experience replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I used the one-step Q-learning pseudocode, and we can now train an agent on Pong in under 20 hours without any GPU or distributed setup.
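A hedged sketch of the one-step Q-learning variant mentioned above: several exploration threads share one value table (a tabular stand-in for the deep network) and apply one-step TD updates asynchronously. The 5-state chain MDP and the uniformly random behavior policy are made-up illustrations, not the Pong setup from the post; Q-learning is off-policy, so a random behavior policy is valid here.

```python
import threading
import random

# Assumed toy MDP: a 5-state chain where action 1 moves right, action 0
# moves left, and reward 1 is given on reaching the right end.
N_STATES, GAMMA, ALPHA = 5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # shared Q[state][action]

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

def worker(seed, episodes=500):
    """One exploration thread: random behavior, one-step Q-learning updates."""
    rng = random.Random(seed)
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            a = rng.randrange(2)                   # random exploration policy
            s2, r = step(s, a)
            target = r + GAMMA * max(Q[s2])        # one-step TD target
            Q[s][a] += ALPHA * (target - Q[s][a])  # async update to shared Q
            if s2 == N_STATES - 1:
                break
            s = s2

threads = [threading.Thread(target=worker, args=(k,)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Greedy policy extracted from the shared table: 1 = move right.
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```

After training, the greedy policy moves right from every non-terminal state, showing that concurrent unsynchronized updates still produce a coherent value function.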
Implementation of an Attentive Multi-Task Deep Reinforcement Learning architecture in TensorFlow
Asynchronous Advantage Actor-Critic (A3C) Algorithms implemented in TensorFlow 1.3
A PyTorch LSTM RNN for reinforcement learning that plays Atari games from OpenAI Universe, using Google DeepMind's Asynchronous Advantage Actor-Critic (A3C) algorithm. A3C is far more efficient than DQN, which it effectively obsoletes, and the agent can play many games.
This repo demonstrates an actor-critic setup via the Deep Deterministic Policy Gradient (DDPG) algorithm. The environment to be solved is the Unity Reacher environment provided in the Udacity Deep Reinforcement Learning Nanodegree.