Ray Tune: Hyperparameter Optimization Framework

This document describes Ray Tune, a hyperparameter tuning framework for long-running tasks such as reinforcement learning (RL) and deep learning training.

The code for Ray Tune is available on GitHub.

Getting Started

from ray.tune import register_trainable, grid_search, run_experiments

def my_func(config, reporter):
    import time, numpy as np
    i = 0
    while True:
        reporter(timesteps_total=i, mean_accuracy=i ** config["alpha"])
        i += config["beta"]

register_trainable("my_func", my_func)

run_experiments({
    "my_experiment": {
        "run": "my_func",
        "resources": { "cpu": 1, "gpu": 0 },
        "stop": { "mean_accuracy": 100 },
        "config": {
            "alpha": grid_search([0.2, 0.4, 0.6]),
            "beta": grid_search([1, 2]),
        },
    }
})

This script runs a small grid search over my_func using Ray Tune, reporting status on the command line until the stopping condition mean_accuracy >= 100 is reached (for metrics such as loss that decrease over time, specify neg_mean_loss as the condition instead):

== Status ==
Using FIFO scheduling algorithm.
Resources used: 4/8 CPUs, 0/0 GPUs
Result logdir: ~/ray_results/my_experiment
 - my_func_0_alpha=0.2,beta=1:      RUNNING [pid=6778], 209 s, 20604 ts, 7.29 acc
 - my_func_1_alpha=0.4,beta=1:      RUNNING [pid=6780], 208 s, 20522 ts, 53.1 acc
 - my_func_2_alpha=0.6,beta=1:      TERMINATED [pid=6789], 21 s, 2190 ts, 101 acc
 - my_func_3_alpha=0.2,beta=2:      RUNNING [pid=6791], 208 s, 41004 ts, 8.37 acc
 - my_func_4_alpha=0.4,beta=2:      RUNNING [pid=6800], 209 s, 41204 ts, 70.1 acc
 - my_func_5_alpha=0.6,beta=2:      TERMINATED [pid=6809], 10 s, 2164 ts, 100 acc
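
The six trials in the status output correspond to the Cartesian product of the two grid_search lists. The following is a minimal sketch of that expansion in plain Python, not Ray Tune's own variant generator; the iteration order (alpha varying fastest) and the name format are assumptions inferred from the status output above.

```python
import itertools

alphas = [0.2, 0.4, 0.6]
betas = [1, 2]

# Assumed ordering: beta is the outer loop and alpha the inner loop,
# which reproduces the trial numbering shown in the status output.
trials = [
    "my_func_{}_alpha={},beta={}".format(i, alpha, beta)
    for i, (beta, alpha) in enumerate(itertools.product(betas, alphas))
]

for name in trials:
    print(name)
```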

In order to report incremental progress, my_func periodically calls the reporter function passed in by Ray Tune to return the current timestep and other metrics as defined in ray.tune.result.TrainingResult.
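
For a metric that decreases over time, the same pattern applies with neg_mean_loss. The sketch below is illustrative only: my_loss_func, its decay parameter, the max_steps argument, and recording_reporter are hypothetical names, and the reporter is a stand-in that records keyword arguments so the function can be exercised without a running Ray cluster.

```python
def my_loss_func(config, reporter, max_steps=1000):
    # Hypothetical trainable: the loss shrinks geometrically each step.
    loss = 1.0
    for step in range(max_steps):
        loss *= config["decay"]
        # Report the negated loss so that larger values are better.
        reporter(timesteps_total=step, neg_mean_loss=-loss)


# Stand-in for the reporter that Ray Tune would pass in.
results = []

def recording_reporter(**metrics):
    results.append(metrics)


my_loss_func({"decay": 0.9}, recording_reporter, max_steps=5)
print(results[-1])
```

With a trainable like this, a stopping condition such as "stop": { "neg_mean_loss": -0.01 } would end the trial once the loss falls to 0.01 or below.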

Visualizing Results

Ray Tune logs trial results to a unique directory per experiment, e.g. ~/ray_results/my_experiment in the above example. The log records are compatible with a number of visualization tools:

To visualize learning in TensorBoard, run:

$ pip install tensorboard
$ tensorboard --logdir=~/ray_results/my_experiment

To use rllab’s VisKit (you may have to install some dependencies), run:

$ git clone https://github.com/rll/rllab.git
$ python rllab/rllab/viskit/frontend.py ~/ray_results/my_experiment

Finally, to view the results with a parallel coordinates visualization, open ParallelCoordinatesVisualization.ipynb as follows and run its cells:

$ cd $RAY_HOME/python/ray/tune
$ jupyter-notebook ParallelCoordinatesVisualization.ipynb

Trial Variant Generation

In the above example, we specified a grid search over two parameters using the grid_search helper function. Ray Tune also supports sampling parameters from user-specified lambda functions, which can be used in combination with grid search.

The following shows grid search over two nested parameters combined with random sampling from two lambda functions. Note that the value of beta depends on the value of alpha, which is represented by referencing spec.config.alpha in the lambda function. This lets you specify conditional parameter distributions.

"config": {
    "alpha": lambda spec: np.random.uniform(100),
    "beta": lambda spec: spec.config.alpha * np.random.normal(),
    "nn_layers": [
        grid_search([16, 64, 256]),
        grid_search([16, 64, 256]),
    ],
},
"repeat": 10,

By default, each random variable and grid search point is sampled once. To take multiple random samples or repeat grid search runs, add "repeat": N to the experiment config. For example, in the config above, "repeat": 10 repeats the 3x3 grid search 10 times, for a total of 90 trials, each with randomly sampled values of alpha and beta.
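
The trial count and the conditional sampling can be checked with a small standalone sketch. This is not the code in variant_generator.py: generate_variants, the layer_1 and layer_2 names, the sampling ranges, and the use of the standard library random module instead of NumPy are all assumptions made to keep the example self-contained.

```python
import itertools
import random
from types import SimpleNamespace


def generate_variants(grid_axes, sampled, repeat=1, seed=0):
    """Expand grid axes, then resolve sampled parameters in declaration order."""
    rng = random.Random(seed)
    names = list(grid_axes)
    variants = []
    for _ in range(repeat):
        for point in itertools.product(*(grid_axes[n] for n in names)):
            config = SimpleNamespace(**dict(zip(names, point)))
            spec = SimpleNamespace(config=config)
            # Later samplers may read earlier values through spec.config.
            for name, sampler in sampled.items():
                setattr(config, name, sampler(spec, rng))
            variants.append(dict(vars(config)))
    return variants


variants = generate_variants(
    grid_axes={"layer_1": [16, 64, 256], "layer_2": [16, 64, 256]},
    sampled={
        "alpha": lambda spec, rng: rng.uniform(0, 100),
        "beta": lambda spec, rng: spec.config.alpha * rng.gauss(0, 1),
    },
    repeat=10,
)
print(len(variants))  # 3 * 3 grid points * 10 repeats = 90
```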

For more information on variant generation, see variant_generator.py.

Early Stopping

To reduce costs, long-running trials can often be stopped early if their initial performance is not promising. Ray Tune allows early stopping algorithms to be plugged in on top of existing grid or random searches. This is enabled by setting the scheduler parameter of run_experiments, e.g.

run_experiments({...}, scheduler=MedianStoppingRule())

Currently we support the following early stopping algorithms, or you can write your own that implements the TrialScheduler interface:

class ray.tune.median_stopping_rule.MedianStoppingRule(time_attr='time_total_s', reward_attr='episode_reward_mean', grace_period=60.0, min_samples_required=3, hard_stop=True)

Implements the median stopping rule as described in the Vizier paper.

Parameters:
  • time_attr (str) – The TrainingResult attr to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress; the only requirement is that the attribute increases monotonically.
  • reward_attr (str) – The TrainingResult objective value attribute. As with time_attr, this may refer to any objective value that is supposed to increase with time.
  • grace_period (float) – Only stop trials at least this old in time. The units are the same as the attribute named by time_attr.
  • min_samples_required (int) – Min samples to compute median over.
  • hard_stop (bool) – If False, pauses trials instead of stopping them. When all other trials are complete, paused trials will be resumed and allowed to run FIFO.
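
The decision made by a median stopping rule can be sketched in a few lines. This is a simplified illustration of the rule as described in the Vizier paper, not the MedianStoppingRule implementation: the function names are hypothetical, time is measured as an index into each trial's list of reported rewards, and the grace_period and min_samples_required arguments only mimic the parameters listed above.

```python
from statistics import median


def running_average(rewards, t):
    """Mean of the rewards reported at steps 0..t (inclusive)."""
    window = rewards[: t + 1]
    return sum(window) / len(window)


def should_stop(trial_rewards, other_trials_rewards, t,
                grace_period=0, min_samples_required=3):
    """Return True if the trial's best reward so far is below the median
    of the other trials' running averages at step t."""
    if t < grace_period:
        return False
    # Only compare against trials that have reported a result at step t.
    averages = [running_average(r, t)
                for r in other_trials_rewards if len(r) > t]
    if len(averages) < min_samples_required:
        return False
    best_so_far = max(trial_rewards[: t + 1])
    return best_so_far < median(averages)


others = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(should_stop([0.5, 1.0, 1.5], others, t=2))
print(should_stop([1.0, 5.0, 2.0], others, t=2))
```

The first call compares a best reward of 1.5 against a median running average of 3 and returns True; the second trial has already reached 5, so it is kept.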

class ray.tune.hyperband.HyperBandScheduler(time_attr='training_iteration', reward_attr='episode_reward_mean', max_t=81)

Implements the HyperBand early stopping algorithm.

HyperBandScheduler early stops trials using the HyperBand optimization algorithm. It divides trials into brackets of varying sizes, and periodically early stops low-performing trials within each bracket.

To use this implementation of HyperBand with Ray Tune, you only need to specify the maximum length of time a trial can run (max_t), the time units (time_attr), and the name of the reported objective value (reward_attr). Reasonable values for the other HyperBand parameters are determined automatically from these.

For example, to limit trials to 10 minutes and early stop based on the episode_reward_mean attr, construct:

HyperBandScheduler('time_total_s', 'episode_reward_mean', 600)

See also: https://people.eecs.berkeley.edu/~kjamieson/hyperband.html

Parameters:
  • time_attr (str) – The TrainingResult attr to use for comparing time. Note that you can pass in something non-temporal such as training_iteration as a measure of progress; the only requirement is that the attribute increases monotonically.
  • reward_attr (str) – The TrainingResult objective value attribute. As with time_attr, this may refer to any objective value. Stopping procedures will use this attribute.
  • max_t (int) – max time units per trial. Trials will be stopped after max_t time units (determined by time_attr) have passed. The HyperBand scheduler automatically tries to determine a reasonable number of brackets based on this.
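
To give a sense of how brackets can be derived from max_t, the sketch below computes the bracket schedule of the standard HyperBand algorithm. It is an assumption-laden illustration, not HyperBandScheduler's code: the reduction factor eta=3 and the function name hyperband_brackets are hypothetical, and the scheduler may choose bracket sizes differently.

```python
def hyperband_brackets(max_t, eta=3):
    """Return (s, n, r) per bracket: n initial trials, each given r time units."""
    # Largest s with eta**s <= max_t, using integers to avoid float log error.
    s_max = 0
    while eta ** (s_max + 1) <= max_t:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # n = ceil((s_max + 1) / (s + 1) * eta**s), via integer ceiling division.
        numerator = (s_max + 1) * eta ** s
        n = -(-numerator // (s + 1))
        # Initial budget per trial; integer division is used for simplicity.
        r = max_t // eta ** s
        brackets.append((s, n, r))
    return brackets


for s, n, r in hyperband_brackets(81):
    print("bracket s={}: {} trials starting with {} time units".format(s, n, r))
```

With max_t=81 this yields five brackets, ranging from 81 trials that start with 1 time unit each to 5 trials that run for the full 81 units.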

Checkpointing support

To enable checkpoint / resume, the full Trainable API must be implemented (though, as shown in the examples above, a train(config, reporter) function is enough if you don’t need checkpointing). Implementing this interface is also required to support resource multiplexing in schedulers such as HyperBand. For example, all RLlib agents implement the Trainable API.

class ray.tune.trainable.Trainable

Interface for trainable models, functions, etc.

Implementing this interface is required to use Ray Tune’s full functionality, though you can also get away with supplying just a my_train(config, reporter) function and calling:

register_trainable("my_func", my_train)

to register it for use with tune. The function will be automatically converted to this interface (sans checkpoint functionality).


restore(checkpoint_path)

Restores training state from a given model checkpoint.

These checkpoints are returned from calls to save().

save()

Saves the current model state to a checkpoint.

Returns: Checkpoint path that may be passed to restore().

stop()

Releases all resources used by this class.

train()

Runs one logical iteration of training.

Returns: A TrainingResult that describes training progress.
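
The checkpoint round trip described by this interface can be sketched with a plain class that mirrors the four methods. CounterTrainable is hypothetical and deliberately does not subclass ray.tune.trainable.Trainable, so that it runs without Ray; it returns a plain dict where a real implementation would return a TrainingResult, and the JSON checkpoint format is an assumption.

```python
import json
import os
import tempfile


class CounterTrainable:
    """Hypothetical stand-in mirroring train / save / restore / stop."""

    def __init__(self, config):
        self.config = config
        self.iteration = 0
        self.checkpoint_dir = tempfile.mkdtemp()

    def train(self):
        # One logical iteration of training.
        self.iteration += 1
        return {
            "training_iteration": self.iteration,
            "mean_accuracy": self.iteration ** self.config["alpha"],
        }

    def save(self):
        # Persist the state and return a path usable by restore().
        path = os.path.join(
            self.checkpoint_dir, "checkpoint-{}.json".format(self.iteration))
        with open(path, "w") as f:
            json.dump({"iteration": self.iteration}, f)
        return path

    def restore(self, checkpoint_path):
        with open(checkpoint_path) as f:
            self.iteration = json.load(f)["iteration"]

    def stop(self):
        # Nothing to release in this toy example.
        pass


first = CounterTrainable({"alpha": 0.5})
first.train()
first.train()
checkpoint = first.save()
first.stop()

# A fresh instance resumes from the saved iteration count.
second = CounterTrainable({"alpha": 0.5})
second.restore(checkpoint)
print(second.train()["training_iteration"])
```

This pause-and-resume pattern is what lets a scheduler suspend one trial, run another on the same resources, and continue the first one later.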

Resource Allocation

Ray Tune runs each trial as a Ray actor, allocating the specified GPU and CPU resources to each actor (defaulting to 1 CPU per trial). A trial will not be scheduled unless at least that amount of resources is available in the cluster, preventing the cluster from being overloaded.
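
The admission check described above can be illustrated with a short standalone sketch. It is a simplified model rather than Ray Tune's scheduling code: has_resources and start_trials are hypothetical names, and the cluster is represented as a plain dict of free resource counts.

```python
def has_resources(free, requested):
    """True if every requested resource amount is currently free."""
    return all(free.get(name, 0) >= amount
               for name, amount in requested.items())


def start_trials(pending, cluster):
    """Start each pending trial whose full reservation fits; others wait."""
    free = dict(cluster)
    started, waiting = [], []
    for trial_name, requested in pending:
        if has_resources(free, requested):
            for name, amount in requested.items():
                free[name] = free.get(name, 0) - amount
            started.append(trial_name)
        else:
            waiting.append(trial_name)
    return started, waiting, free


cluster = {"cpu": 4, "gpu": 0}
pending = [("trial_{}".format(i), {"cpu": 1, "gpu": 0}) for i in range(6)]
started, waiting, free = start_trials(pending, cluster)
print(started)  # the first four trials fit on 4 CPUs
print(waiting)  # the remaining two wait for resources to free up
```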

If your trainable function / class creates further Ray actors or tasks that also consume CPU / GPU resources, you will also want to set driver_cpu_limit or driver_gpu_limit to tell Ray not to assign the entire resource reservation to your top-level trainable function, as described in trial.py.

Command-line JSON/YAML API

The JSON config passed to run_experiments can also be put in a JSON or YAML file, and the experiments run using the tune.py script. This supports the same functionality as the Python API, e.g.:

cd ray/python/ray/tune
./tune.py -f examples/tune_mnist_ray.yaml --scheduler=MedianStoppingRule

For more examples of experiments described by YAML files, see RLlib tuned examples.
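
As an illustration of the file-based form, the earlier my_experiment spec could be written as JSON and loaded in Python as sketched below. The {"grid_search": [...]} encoding of a grid search is an assumption about how the grid_search helper is represented in a serialized spec, and the trainable named in "run" must already be known to Ray Tune.

```python
import json

SPEC_JSON = """
{
    "my_experiment": {
        "run": "my_func",
        "resources": {"cpu": 1, "gpu": 0},
        "stop": {"mean_accuracy": 100},
        "config": {
            "alpha": {"grid_search": [0.2, 0.4, 0.6]},
            "beta": {"grid_search": [1, 2]}
        }
    }
}
"""

experiments = json.loads(SPEC_JSON)
print(sorted(experiments["my_experiment"]))

# With Ray installed and "my_func" registered, the parsed dict would be
# passed on unchanged:
# run_experiments(experiments)
```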

Running in a large cluster

run_experiments also accepts any arguments that ray.init() does. This can be used to pass in the Redis address of a multi-node Ray cluster. For more details, see the tune.py script.