Deploying on Kubernetes¶
These instructions have not been tested extensively. If you have a suggestion for how to improve them, please open a pull request or email firstname.lastname@example.org.
You can run Ray on top of Kubernetes. This document assumes that you have access
to a Kubernetes cluster and have
kubectl installed locally.
Start by cloning the Ray repository.
git clone https://github.com/ray-project/ray.git
Work Interactively on the Cluster¶
To work interactively, first start Ray on Kubernetes.
kubectl create -f ray/kubernetes/head.yaml kubectl create -f ray/kubernetes/worker.yaml
This will start one head pod and 3 worker pods. You can check that the pods are
running by running
kubectl get pods.
You should see something like the following (you will have to wait a couple minutes for the pods to enter the “Running” state).
$ kubectl get pods NAME READY STATUS RESTARTS AGE ray-head-5455bb66c9-6bxvz 1/1 Running 0 10s ray-worker-5c49b7cc57-c6xs8 1/1 Running 0 5s ray-worker-5c49b7cc57-d9m86 1/1 Running 0 5s ray-worker-5c49b7cc57-kzk4s 1/1 Running 0 5s
To run tasks interactively on the cluster, connect to one of the pods, e.g.,
kubectl exec -it ray-head-5455bb66c9-6bxvz -- bash
Start an IPython interpreter, e.g.,
from collections import Counter import time import ray # Note that if you run this script on a non-head node, then you must replace # "localhost" with socket.gethostbyname("ray-head"). ray.init(redis_address="localhost:6379") @ray.remote def f(x): time.sleep(0.01) return x + (ray.services.get_node_ip_address(), ) # Check that objects can be transferred from each node to each other node. %time Counter(ray.get([f.remote(f.remote(())) for _ in range(1000)]))
Submitting a Script to the Cluster¶
To submit a self-contained Ray application to your Kubernetes cluster, do the following.
kubectl create -f ray/kubernetes/submit.yaml
One of the pods will download and run this example script.
The script prints its output. To view the output, first find the pod name by
kubectl get all. You’ll see output like the following.
$ kubectl get all NAME READY STATUS RESTARTS AGE pod/ray-head-5486648dc9-c6hz2 1/1 Running 0 11s pod/ray-worker-5c49b7cc57-2jz4l 1/1 Running 0 11s pod/ray-worker-5c49b7cc57-8nwjk 1/1 Running 0 11s pod/ray-worker-5c49b7cc57-xlksn 1/1 Running 0 11s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/ray-head ClusterIP 10.110.54.241 <none> 6379/TCP,6380/TCP,6381/TCP,12345/TCP,12346/TCP 11s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/ray-head 1/1 1 1 11s deployment.apps/ray-worker 3/3 3 3 11s NAME DESIRED CURRENT READY AGE replicaset.apps/ray-head-5486648dc9 1 1 1 11s replicaset.apps/ray-worker-5c49b7cc57 3 3 3 11s
Find the name of the
ray-head pod and run the equivalent of
kubectl logs ray-head-5486648dc9-c6hz2
To remove the services you have created, run the following.
kubectl delete service/ray-head \ deployment.apps/ray-head \ deployment.apps/ray-worker
You will probably need to do some amount of customization.
- The example above uses the Docker image
rayproject/examples, which is built using these Dockerfiles. You will most likely need to use your own Docker image.
- You will need to modify the
argsfields to potentially install and run the script of your choice.
- You will need to customize the resource requests.
The following are also important but haven’t been documented yet. Contributions are welcome!
- Request CPU/GPU/memory resources.
- Increase shared memory.
- How to make Kubernetes clean itself up once the script finishes.
- Follow Kubernetes best practices.