Policy Gradient Methods
This code shows how to do reinforcement learning with policy gradient methods. View the code for this example.
For an overview of Ray’s reinforcement learning library, see RLlib.
To run this example, you will need to install TensorFlow with GPU support (at
least version 1.0.0) and a few other dependencies.
pip install gym[atari]
pip install tensorflow
Then you can run the example as follows.
rllib train --env=Pong-ram-v4 --run=PPO
This will train an agent on the Pong-ram-v4 Atari environment. You can also try passing in the Pong-v0 environment.
If you wish to use a different environment, you will need to change a few lines.
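Current and historical training progress can be monitored by pointing TensorBoard to the log output directory as follows (this assumes results are written to the default ~/ray_results directory; adjust the path if yours differs).
tensorboard --logdir ~/ray_results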
Many of the TensorBoard metrics are also printed to the console, but you might find it easier to visualize and compare between runs using the TensorBoard UI.
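For intuition about what a policy gradient update does, the following is a minimal sketch of the basic REINFORCE rule on a toy problem. It is not the PPO implementation that RLlib runs and does not use Ray, TensorFlow, or Gym. The two-armed bandit, its payoff probabilities, and the names softmax and train_bandit are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(steps=2000, lr=0.1, seed=0):
    """Run REINFORCE on a toy two-armed bandit and return the final policy."""
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]  # hypothetical payoffs: arm 1 wins more often
    logits = [0.0, 0.0]    # policy parameters, one logit per arm
    baseline = 0.0         # running average reward, used to reduce variance

    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = 0 if rng.random() < probs[0] else 1
        # Sample a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        advantage = reward - baseline

        # For a softmax policy, the gradient of log pi(action) with
        # respect to logit k is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob

        # Move the baseline toward the observed reward.
        baseline += 0.05 * (reward - baseline)

    return softmax(logits)


final_probs = train_bandit()
print(final_probs)
```

With these assumed payoffs, the probability assigned to the second arm should move toward 1 as training proceeds. Methods such as PPO build on the same idea, but add neural network policies, multi-step returns, and a constrained update.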