Ray supports running distributed python programs with the multiprocessing.Pool API
using Ray Actors instead of local processes. This makes it easy
to scale existing applications that use
multiprocessing.Pool from a single node
to a cluster.
This API is new and may be revised in future Ray releases. If you encounter any bugs, please file an issue on GitHub.
To get started, first install Ray, then use
ray.util.multiprocessing.Pool in place of
This will start a local Ray cluster the first time you create a
distribute your tasks across it. See the Run on a Cluster section below for
instructions to run on a multi-node Ray cluster instead.
from ray.util.multiprocessing import Pool def f(index): return index pool = Pool() for result in pool.map(f, range(100)): print(result)
multiprocessing.Pool API is currently supported. Please see the
multiprocessing documentation for details.
context argument in the
Pool constructor is ignored when using Ray.
Run on a Cluster¶
This section assumes that you have a running Ray cluster. To start a Ray cluster, please refer to the cluster setup instructions.
To connect a
Pool to a running Ray cluster, you can specify the address of the
head node in one of two ways:
- By setting the
- By passing the
ray_addresskeyword argument to the
from ray.util.multiprocessing import Pool # Starts a new local Ray cluster. pool = Pool() # Connects to a running Ray cluster, with the current node as the head node. # Alternatively, set the environment variable RAY_ADDRESS="auto". pool = Pool(ray_address="auto") # Connects to a running Ray cluster, with a remote node as the head node. # Alternatively, set the environment variable RAY_ADDRESS="<ip_address>:<port>". pool = Pool(ray_address="<ip_address>:<port>")
You can also start Ray manually by calling
ray.init() (with any of its supported
configuration options) before creating a