RLlib Development

Development Install

You can develop RLlib locally without needing to compile Ray by using the setup-dev.py script. This script creates links between the rllib dir in your local git clone and the one bundled with the installed ray package. When using this script, make sure that your git branch is in sync with the installed Ray binaries (i.e., you are up to date on master and have the latest wheel installed).
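
For reference, the workflow looks roughly like this (a sketch assuming a local clone of the Ray repository; the exact location of setup-dev.py may differ between Ray versions):

pip install -U ray                # install the latest Ray wheel
python python/ray/setup-dev.py    # link the installed rllib dir to this repo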

API Stability

Objects and methods annotated with @PublicAPI or @DeveloperAPI have the following API compatibility guarantees:

ray.rllib.utils.annotations.PublicAPI(obj)

Annotation for documenting public APIs.

Public APIs are classes and methods exposed to end users of RLlib. You can expect these APIs to remain stable across RLlib releases.

Subclasses that inherit from a @PublicAPI base class can be assumed part of the RLlib public API as well (e.g., all trainer classes are in public API because Trainer is @PublicAPI).

In addition, you can assume all trainer configurations are part of their public API as well.

ray.rllib.utils.annotations.DeveloperAPI(obj)

Annotation for documenting developer APIs.

Developer APIs are classes and methods explicitly exposed to developers for the purposes of building custom algorithms or advanced training strategies on top of RLlib internals. You can generally expect these APIs to remain stable, barring minor changes (though they are less stable than public APIs).

Subclasses that inherit from a @DeveloperAPI base class can be assumed part of the RLlib developer API as well (e.g., all policy optimizers are developer API because PolicyOptimizer is @DeveloperAPI).
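
For illustration, the annotations are applied as plain decorators (a minimal sketch; MyRolloutUtility and my_user_facing_helper are hypothetical names used only for this example):

from ray.rllib.utils.annotations import DeveloperAPI, PublicAPI


@DeveloperAPI
class MyRolloutUtility:
    """Hypothetical helper intended for algorithm developers."""


@PublicAPI
def my_user_facing_helper():
    """Hypothetical function exposed to end users of RLlib."""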

Features

Feature development and upcoming priorities are tracked on the RLlib project board (note that this may not include all development efforts). For discussion of issues and new features, we use the Ray dev list and GitHub issues page.

Benchmarks

A number of training run results are available in the rl-experiments repo, and there is also a list of working hyperparameter configurations in tuned_examples. Benchmark results are extremely valuable to the community, so if you happen to have results that may be of interest, consider making a pull request to either repo.

Contributing Algorithms

These are the guidelines for merging new algorithms into RLlib:

  • Contributed algorithms (rllib/contrib):
    • must subclass Trainer and implement the _train() method
    • must include a lightweight test (example) to ensure the algorithm runs
    • should include tuned hyperparameter examples and documentation
    • should offer functionality not present in existing algorithms
  • Fully integrated algorithms (rllib/agents) have the following additional requirements:
    • must fully implement the Trainer API
    • must offer substantial new functionality not possible to add to other algorithms
    • should support custom models and preprocessors
    • should use RLlib abstractions and support distributed execution

Both integrated and contributed algorithms ship with the ray PyPI package and are tested as part of Ray’s automated tests. The main difference between contributed and fully integrated algorithms is that the latter are maintained by the Ray team to a much greater extent with respect to bug fixes and integration with RLlib features.

How to add an algorithm to contrib

It takes just two changes to add an algorithm to contrib. A minimal example can be found in rllib/contrib/random_agent/random_agent.py. First, subclass Trainer and implement the _init and _train methods:

import numpy as np

from ray.rllib.agents.trainer import Trainer, with_common_config
from ray.rllib.utils.annotations import override


class RandomAgent(Trainer):
    """Policy that takes random actions and never learns."""

    _name = "RandomAgent"
    _default_config = with_common_config({
        "rollouts_per_iteration": 10,
    })

    @override(Trainer)
    def _init(self, config, env_creator):
        self.env = env_creator(config["env_config"])

    @override(Trainer)
    def _train(self):
        rewards = []
        steps = 0
        for _ in range(self.config["rollouts_per_iteration"]):
            obs = self.env.reset()
            done = False
            reward = 0.0
            while not done:
                action = self.env.action_space.sample()
                obs, r, done, info = self.env.step(action)
                reward += r
                steps += 1
            rewards.append(reward)
        return {
            "episode_reward_mean": np.mean(rewards),
            "timesteps_this_iter": steps,
        }

Second, register the trainer with a name in contrib/registry.py.

def _import_random_agent():
    from ray.rllib.contrib.random_agent.random_agent import RandomAgent
    return RandomAgent

def _import_random_agent_2():
    from ray.rllib.contrib.random_agent_2.random_agent_2 import RandomAgent2
    return RandomAgent2

CONTRIBUTED_ALGORITHMS = {
    "contrib/RandomAgent": _import_random_trainer,
    "contrib/RandomAgent2": _import_random_trainer_2,
    # ...
}

After registration, you can run and visualize training progress using rllib train:

rllib train --run=contrib/RandomAgent --env=CartPole-v0
tensorboard --logdir=~/ray_results
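
Equivalently, training can be launched from Python via Tune (a minimal sketch; the config and stopping criterion are illustrative):

import ray
from ray import tune

ray.init()

# The string name resolves through CONTRIBUTED_ALGORITHMS once the
# trainer is registered in contrib/registry.py.
tune.run(
    "contrib/RandomAgent",
    config={"env": "CartPole-v0"},
    stop={"training_iteration": 10},
)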