ray.rllib.algorithms.algorithm_config.AlgorithmConfig.environment

AlgorithmConfig.environment(env: str | ~typing.Any | gymnasium.Env | None = <ray.rllib.utils.from_config._NotProvided object>, *, env_config: dict | None = <ray.rllib.utils.from_config._NotProvided object>, observation_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, action_space: gymnasium.spaces.Space | None = <ray.rllib.utils.from_config._NotProvided object>, env_task_fn: ~typing.Callable[[dict | NestedDict, ~typing.Any | gymnasium.Env, ~ray.rllib.env.env_context.EnvContext], ~typing.Any] | None = <ray.rllib.utils.from_config._NotProvided object>, render_env: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_rewards: bool | float | None = <ray.rllib.utils.from_config._NotProvided object>, normalize_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, clip_actions: bool | None = <ray.rllib.utils.from_config._NotProvided object>, is_atari: bool | None = <ray.rllib.utils.from_config._NotProvided object>, action_mask_key: str | None = <ray.rllib.utils.from_config._NotProvided object>, auto_wrap_old_gym_envs=-1, disable_env_checking=-1) -> AlgorithmConfig

Sets the config’s RL-environment settings.

Parameters:
  • env – The environment specifier. This can either be a tune-registered env, via tune.register_env([name], lambda env_ctx: [env object]), or a string specifier of an RLlib-supported type. In the latter case, RLlib will try to interpret the specifier as either a Farama-Foundation gymnasium env, a PyBullet env, or a fully qualified classpath to an Env class, e.g. “ray.rllib.examples.envs.classes.random_env.RandomEnv”. See the usage sketch at the end of this page.

  • env_config – Arguments dict passed to the env creator as an EnvContext object (which is a dict plus the properties: num_env_runners, worker_index, vector_index, and remote).

  • observation_space – The observation space for the Policies of this Algorithm.

  • action_space – The action space for the Policies of this Algorithm.

  • env_task_fn – A callable taking the last train results, the base env, and the env context as args and returning a new task to set the env to. The env must be a TaskSettableEnv subclass for this to work. See examples/curriculum_learning.py, as well as the sketch at the end of this page, for an example.

  • render_env – If True, try to render the environment on the local worker or on worker 1 (if num_env_runners > 0). For vectorized envs, this usually means that only the first sub-environment will be rendered. For this to work, your env must implement the render() method, which either: a) handles window generation and rendering itself and returns True, or b) returns a numpy uint8 image of shape [height, width, 3] (RGB).

  • clip_rewards – Whether to clip rewards during the Policy’s postprocessing. None (default): clip for Atari only (r = sign(r)). True: r = sign(r), i.e. fixed rewards of -1.0, 1.0, or 0.0. False: never clip. [float value]: clip at -value and +value. Tuple[value1, value2]: clip at value1 and value2.

  • normalize_actions – If True, RLlib will learn entirely inside a normalized action space (0.0 centered with small stddev; only affecting Box components). RLlib will unsquash actions (and clip, just in case) to the bounds of the env’s action space before sending actions back to the env.

  • clip_actions – If True, the RLlib default ModuleToEnv connector will clip actions according to the env’s bounds (before sending them into the env.step() call).

  • is_atari – This config can be used to explicitly specify whether the env is an Atari env or not. If not specified, RLlib will try to auto-detect this.

  • action_mask_key – If the observation is a dictionary, expect the value under the key action_mask_key to contain a valid action mask (a numpy.int8 array of zeros and ones). Defaults to “action_mask”. See the observation-space sketch at the end of this page.

Returns:

This updated AlgorithmConfig object.
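
A minimal usage sketch follows. The env name “my_pendulum”, the make_env creator, and the “difficulty” config key are illustrative assumptions, not part of RLlib’s API; only tune.register_env, PPOConfig, and the .environment() arguments shown come from RLlib itself.

import gymnasium as gym

from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig


def make_env(env_ctx):
    # env_ctx is an EnvContext: the env_config dict plus num_env_runners,
    # worker_index, vector_index, and remote. "difficulty" is a hypothetical
    # key used only for this illustration (unused below).
    difficulty = env_ctx.get("difficulty", 1)
    return gym.make("Pendulum-v1")


# Register the creator under a name that .environment(env=...) can resolve.
tune.register_env("my_pendulum", make_env)

config = (
    PPOConfig()
    .environment(
        env="my_pendulum",
        env_config={"difficulty": 2},  # forwarded to make_env as an EnvContext
        normalize_actions=True,        # only affects Box action components
        clip_rewards=None,             # default: clip for Atari only
    )
)
algo = config.build()

Because normalize_actions only affects Box components, it matters for continuous-control envs such as Pendulum; for purely discrete action spaces it has no effect.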
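For env_task_fn, the sketch below shows only the callable’s contract (last train results, the task-settable env, and the EnvContext in; the new task out). The metric key, the 150.0 threshold, and the integer task encoding are assumptions, and the train-results layout varies across RLlib versions; the env must be a TaskSettableEnv subclass so get_task()/set_task() are available.

from ray.rllib.env.env_context import EnvContext


def curriculum_fn(train_results: dict, task_settable_env, env_ctx: EnvContext):
    # Sketch only: result-dict keys and the threshold are assumptions.
    mean_return = train_results.get("env_runners", {}).get(
        "episode_return_mean", float("-inf")
    )
    current_task = task_settable_env.get_task()
    # Move to the next (harder) task once the agent performs well enough.
    return current_task + 1 if mean_return > 150.0 else current_task


config = config.environment(env_task_fn=curriculum_fn)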
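For action_mask_key, an env whose observations carry an action mask might declare a dictionary observation space like the following sketch. All sizes and the “observations” sub-key name are illustrative; only the “action_mask” key (the default action_mask_key) is what RLlib looks up.

import numpy as np
from gymnasium import spaces

NUM_ACTIONS = 5
observation_space = spaces.Dict(
    {
        "observations": spaces.Box(-1.0, 1.0, shape=(10,), dtype=np.float32),
        # 1 = action allowed, 0 = action masked out (numpy.int8 zeros and ones).
        "action_mask": spaces.Box(0, 1, shape=(NUM_ACTIONS,), dtype=np.int8),
    }
)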