Ray Projects (Experimental)

Ray projects make it easy to package a Ray application so that it can be rerun later in the same environment. This makes existing code easier to share and to reuse reliably.

Quick start (CLI)

# Creates a project in the current directory. It will create a
# project.yaml defining the code and environment and a cluster.yaml
# describing the cluster configuration. Both will be created in the
# ray-project subdirectory of the current directory.
$ ray project create <project-name>

# Create a new session from the given project. This launches a cluster and
# runs the command, which must be defined in the project.yaml file. If no
# command is specified, the "default" command in ray-project/project.yaml
# is used. Alternatively, use --shell to run a raw shell command.
$ ray session start <command-name> [arguments] [--shell]

# Open a console for the given session.
$ ray session attach

# Stop the given session and terminate all of its worker nodes.
$ ray session stop

Examples

See the readme for instructions on how to run these examples:

  • Open Tacotron: A TensorFlow implementation of Google’s Tacotron speech synthesis with pre-trained model (unofficial)
  • PyTorch Transformers: A library of state-of-the-art pretrained models for Natural Language Processing (NLP)

Tutorial

We will walk through how to use projects by executing the streaming MapReduce example. Commands always apply to the project in the current directory. Let us switch into the project directory with

cd ray/doc/examples/streaming

A session represents a running instance of a project. Let’s start one with

ray session start

The ray session start command will bring up a new cluster and initialize the environment of the cluster according to the environment section of the project.yaml, installing all dependencies of the project.

Now we can execute a command in the session. To see a list of all available commands of the project, run

ray session commands

which produces the following output:

As you can see, this project has a single command, run, which takes the arguments --num-mappers and --num-reducers. We can execute the streaming wordcount with the default parameters by running

ray session execute run

You can interrupt the command with <Control>-c and attach to the running session by executing

ray session attach --tmux

Inside the session you can, for example, edit the streaming application with

cd ray-example-streaming
emacs streaming.py

For example, try adding the following lines after the for count in counts: loop:

if "million" in wordcounts:
  print("Found the word!")

and re-run the application from outside the session with

ray session execute run

The session can be terminated from outside the session with

ray session stop
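
The session lifecycle from this tutorial can also be scripted. The following is a minimal sketch, not part of Ray: the helper name run_tutorial and its dry_run flag are hypothetical, and it assumes the ray CLI is on the PATH and that project_dir contains a ray-project directory. It issues only the three commands shown above and always stops the session, even if the run command fails.

```python
import subprocess

# The three commands used in the tutorial above, as argument lists.
START = ["ray", "session", "start"]
EXECUTE = ["ray", "session", "execute", "run"]
STOP = ["ray", "session", "stop"]


def run_tutorial(project_dir, dry_run=True):
    """Hypothetical helper: start a session, run the project, stop the session.

    With dry_run=True nothing is executed; the commands are only collected,
    which is useful for checking the sequence before launching a cluster.
    """
    issued = []

    def step(argv):
        issued.append(" ".join(argv))
        if not dry_run:
            # Commands apply to the project in the current directory,
            # so run each one with cwd set to the project directory.
            subprocess.run(argv, cwd=project_dir, check=True)

    step(START)
    try:
        step(EXECUTE)
    finally:
        # Always tear the session down so the cluster does not keep running.
        step(STOP)
    return issued


if __name__ == "__main__":
    for line in run_tutorial("ray/doc/examples/streaming"):
        print(line)
```

Call run_tutorial(project_dir, dry_run=False) to actually launch the cluster; the default only prints the command sequence.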

Project file format (project.yaml)

A project file contains everything required to run a project. This includes a cluster configuration, the environment and dependencies for the application, and the specific inputs used to run the project.

Here is an example of a minimal project file:

name: test-project
description: "This is a simple test project"
repo: https://github.com/ray-project/ray

# Cluster to be instantiated by default when starting the project.
cluster:
  config: ray-project/cluster.yaml

# Commands/information to build the environment once the cluster is
# instantiated. This can include the versions of Python libraries, etc.
# It can be specified as a Python requirements.txt, a conda environment,
# a Dockerfile, or a shell script to run to set up the libraries.
environment:
  requirements: requirements.txt

# List of commands that can be executed once the cluster is instantiated
# and the environment is set up.
# A command can also specify a cluster that overrides the default cluster.
commands:
  - name: default
    command: python default.py
    help: "The command that will be executed if no command name is specified"
  - name: test
    command: python test.py --param1={{param1}} --param2={{param2}}
    help: "A test command"
    params:
      - name: "param1"
        help: "The first parameter"
        # The following line indicates possible values this parameter can take.
        choices: ["1", "2"]
      - name: "param2"
        help: "The second parameter"
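
To make the {{param1}} placeholders in the test command concrete, here is a minimal sketch of how such a template could be filled in. It is an illustration only and not Ray's implementation: the function render_command is hypothetical, and it assumes placeholders are replaced by plain string substitution and that choices are compared by their string form.

```python
import re

# Casts for the three parameter types a project file may declare.
_TYPES = {"int": int, "float": float, "str": str}


def render_command(command, params, values):
    """Hypothetical helper: fill {{name}} placeholders in a command string.

    `params` is the list under a command's "params" key in project.yaml;
    `values` maps parameter names to the values supplied by the user.
    """
    resolved = {}
    for spec in params:
        name = spec["name"]
        if name in values:
            value = values[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            raise ValueError(f"missing value for parameter {name!r}")
        if "type" in spec:
            # Cast to the declared type; fails loudly on bad input.
            value = _TYPES[spec["type"]](value)
        if "choices" in spec:
            allowed = [str(choice) for choice in spec["choices"]]
            if str(value) not in allowed:
                raise ValueError(f"{name!r} must be one of {allowed}")
        resolved[name] = value

    def substitute(match):
        name = match.group(1)
        if name not in resolved:
            raise ValueError(f"command uses undeclared parameter {name!r}")
        return str(resolved[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, command)


if __name__ == "__main__":
    params = [
        {"name": "param1", "choices": ["1", "2"]},
        {"name": "param2"},
    ]
    template = "python test.py --param1={{param1}} --param2={{param2}}"
    print(render_command(template, params, {"param1": "1", "param2": "hello"}))
    # prints: python test.py --param1=1 --param2=hello
```

Rejecting values outside choices before substitution keeps an invalid parameter from ever reaching the shell command.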

Project files have to adhere to the following schema:

type object
properties
  • name
    The name of the project
    type string
  • description
    A short description of the project
    type string
  • repo
    The URL of the repo this project is part of
    type string
  • documentation
    Link to the documentation of this project
    type string
  • tags
    Relevant tags for this project
    type array
    items
      type string
  • cluster
    type object
    properties
      • config
        Path to a .yaml cluster configuration file (relative to the project root)
        type string
      • params
        type array
        items
          type object
          properties
            • name
              type string
            • help
              type string
            • choices
              type array
            • default
            • type
              type string
              enum int, float, str
          additionalProperties False
    additionalProperties False
  • environment
    The environment that needs to be set up to run the project
    type object
    properties
      • dockerimage
        URL to a docker image that can be pulled to run the project in
        type string
      • dockerfile
        Path to a Dockerfile to set up an image the project can run in (relative to the project root)
        type string
      • requirements
        Path to a Python requirements.txt file to set up project dependencies (relative to the project root)
        type string
      • shell
        A sequence of shell commands to run to set up the project environment
        type array
        items
          type string
    additionalProperties False
  • commands
    type array
    items
      Possible commands to run to start a session
      type object
      properties
        • name
          Name of the command
          type string
        • help
          Help string for the command
          type string
        • command
          Shell command to run on the cluster
          type string
        • params
          type array
          items
            Possible parameters in the command
            type object
            properties
              • name
                Name of the parameter
                type string
              • help
                Help string for the parameter
                type string
              • choices
                Possible values the parameter can take
                type array
              • default
              • type
                Required type for the parameter
                type string
                enum int, float, str
            additionalProperties False
        • config
          type object
      additionalProperties False
  • output_files
additionalProperties False
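
Because the schema sets additionalProperties to False, a misspelled or misplaced key makes a project file invalid. The sketch below shows one way to catch such keys before starting a session. It is not Ray's validator: find_schema_errors is a hypothetical helper, the key sets are transcribed by hand from the schema above (assuming config belongs to each command and output_files to the top level), and it expects the project file already parsed into a Python dict. It checks only key names and the parameter type enum.

```python
# Allowed keys at each level, transcribed from the schema above.
TOP_LEVEL_KEYS = {
    "name", "description", "repo", "documentation", "tags",
    "cluster", "environment", "commands", "output_files",
}
CLUSTER_KEYS = {"config", "params"}
ENVIRONMENT_KEYS = {"dockerimage", "dockerfile", "requirements", "shell"}
COMMAND_KEYS = {"name", "help", "command", "params", "config"}
PARAM_KEYS = {"name", "help", "choices", "default", "type"}
PARAM_TYPES = {"int", "float", "str"}


def find_schema_errors(project):
    """Hypothetical helper: list keys and param types the schema forbids."""
    errors = []

    def check_keys(obj, allowed, where):
        for key in sorted(set(obj) - allowed):
            errors.append(f"{where}: unexpected key {key!r}")

    def check_params(params, where):
        for i, param in enumerate(params):
            here = f"{where}.params[{i}]"
            check_keys(param, PARAM_KEYS, here)
            if "type" in param and param["type"] not in PARAM_TYPES:
                errors.append(f"{here}: type must be one of int, float, str")

    check_keys(project, TOP_LEVEL_KEYS, "project")

    cluster = project.get("cluster", {})
    check_keys(cluster, CLUSTER_KEYS, "cluster")
    check_params(cluster.get("params", []), "cluster")

    check_keys(project.get("environment", {}), ENVIRONMENT_KEYS, "environment")

    for i, command in enumerate(project.get("commands", [])):
        here = f"commands[{i}]"
        check_keys(command, COMMAND_KEYS, here)
        check_params(command.get("params", []), here)

    return errors


if __name__ == "__main__":
    project = {
        "name": "test-project",
        "cluster": {"config": "ray-project/cluster.yaml"},
        "environment": {"requirements": "requirements.txt"},
        "commands": [{"name": "default", "command": "python default.py"}],
    }
    print(find_schema_errors(project))  # prints: []
```

A full JSON Schema validator would also check value types and nested items; this sketch trades completeness for having no dependencies.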

Cluster file format (cluster.yaml)

This is the same format as for the autoscaler; see the Cluster Launch page.