Policy Gradient MethodsΒΆ

This code shows how to do reinforcement learning with policy gradient methods. View the code for this example.

To run this example, you will need to install TensorFlow with GPU support (at least version 1.0.0) and a few other dependencies.

pip install gym[atari]
pip install tensorflow

Then you can run the example as follows.

python/ray/rllib/train.py --env=Pong-ram-v4 --alg=PPO

This will train an agent on the Pong-ram-v4 Atari environment. You can also try passing in the Pong-v0 environment or the CartPole-v0 environment. If you wish to use a different environment, you will need to change a few lines in example.py.

Current and historical training progress can be monitored by pointing TensorBoard to the log output directory as follows.

tensorboard --logdir=/tmp/ray

Many of the TensorBoard metrics are also printed to the console, but you might find it easier to visualize and compare between runs using the TensorBoard UI.