Launching Ray Clusters on AWS
To start a Ray cluster on AWS, use the Ray cluster launcher together with the AWS Python SDK (Boto3).
Using the Cluster Management CLI
Install Ray cluster launcher
The Ray cluster launcher is part of the `ray` CLI. Use the CLI to start, stop, and attach to a running Ray cluster with commands such as `ray up`, `ray down`, and `ray attach`. You can use pip to install the `ray` CLI with cluster launcher support. Follow the Ray installation documentation for more detailed instructions.
```shell
# install ray
pip install -U "ray[default]"
```
Install and Configure AWS Python SDK (Boto3)
Next, install the AWS SDK using `pip install -U boto3` and configure your AWS credentials following the AWS guide. When searching for credentials, Boto3 walks through a list of possible locations and stops as soon as it finds them. The order in which Boto3 searches for credentials is:
- Passing credentials as parameters in the `boto3.client()` method
- Passing credentials as parameters when creating a `Session` object
- Environment variables (my preferred method)
- Shared credential file (`~/.aws/credentials`)
- AWS config file (`~/.aws/config`)
- Assume Role provider
- Boto2 config file (`/etc/boto.cfg` and `~/.boto`)
- Instance metadata service on an Amazon EC2 instance that has an IAM role configured
```shell
# install AWS Python SDK (boto3)
pip install -U boto3
```
Then add your credentials to the `.env` file:

```shell
# AWS Credentials
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...
```
To create, modify, or delete your own access keys, choose your user name in the upper right of the AWS console navigation bar, and then choose Security credentials. There you can create an access key. You can then retrieve temporary credentials using AWS STS:
```python
import boto3
from decouple import config


def get_aws_sts():
    """Get AWS STS credentials."""
    access_key = config('AWS_ACCESS_KEY_ID')
    secret_access_key = config('AWS_SECRET_ACCESS_KEY')
    client = boto3.client(
        'sts',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_access_key
    )
    response = client.get_session_token()
    expiry = response['Credentials']['Expiration']
    print(f"Credentials expire at {expiry}")
    return response['Credentials']


if __name__ == '__main__':
    print(get_aws_sts())
Start Ray with the Ray cluster launcher
Once Boto3 is configured to manage resources in your AWS account, you should be ready to launch your cluster using the cluster launcher. The cluster config file provided by Ray will create a small cluster with an `m5.large` head node (on-demand) configured to autoscale to up to two `m5.large` spot-instance workers. Test that it works by running the following commands from your local machine:
```shell
# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to SSH into the cluster head node.
ray up aws/cluster.yaml --no-config-cache

# Get a remote shell on the head node.
ray attach aws/cluster.yaml

# Try running a Ray program.
python -c 'import ray; ray.init()'
exit

# Tear down the cluster.
ray down aws/cluster.yaml
```
Security
By default, Ray nodes in a Ray AWS cluster have full EC2 and S3 permissions (i.e. `arn:aws:iam::aws:policy/AmazonEC2FullAccess` and `arn:aws:iam::aws:policy/AmazonS3FullAccess`). This is a good default for trying out Ray clusters, but you may want to change the permissions Ray nodes have (e.g. to reduce them for security reasons). You can do so by providing a custom `IamInstanceProfile` to the related `node_config`:
```yaml
available_node_types:
  ray.worker.default:
    node_config:
      ...
      IamInstanceProfile:
        Arn: arn:aws:iam::YOUR_AWS_ACCOUNT:YOUR_INSTANCE_PROFILE
```
Ray Serve: Kubernetes using the KubeRay RayService
For Ray Serve, the docs recommend deploying in production on Kubernetes, with the recommended practice being to use the RayService controller that's provided as part of KubeRay. This setup provides the best of both worlds: the user experience and scalable compute of Ray Serve, and the operational benefits of Kubernetes. It also allows you to integrate with existing applications that may be running on Kubernetes. The RayService custom resource automatically handles important production requirements such as health checking, status reporting, failure recovery, and upgrades. If you're not running on Kubernetes, you can also run Ray Serve on a Ray cluster directly using the Serve CLI. To do this, you will need to generate a Serve config file and deploy it using the Serve CLI.
A RayService Custom Resource (CR) encapsulates a multi-node Ray cluster and a Serve application that runs on top of it into a single Kubernetes manifest. Deploying, upgrading, and getting the status of the application can be done using standard `kubectl` commands.
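As a sketch, a RayService manifest pairs a Serve config (under `serveConfigV2`) with a Ray cluster spec; the resource name, image tag, and Serve application below are hypothetical placeholders, so consult the KubeRay docs for a complete example:

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: my-rayservice        # hypothetical name
spec:
  serveConfigV2: |
    applications:
      - name: app1
        route_prefix: /
        import_path: app.main:app
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.7.0
```

Once applied with `kubectl apply -f`, the manifest can be inspected and upgraded with the usual `kubectl get` and `kubectl describe` commands.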
The Serve Config File
For the apps we are building, in development we would likely use the `serve run` command to iteratively run, develop, and repeat (see the Development Workflow docs for more information). When we're ready to go to production, we generate a structured config file that acts as the single source of truth for the application.
You can use the Serve config with the `serve deploy` CLI command to deploy on a VM, or embed it in a RayService custom resource on Kubernetes to deploy and update your application in production. The config is a YAML file with the following format:
```yaml
proxy_location: ...

http_options:
  host: ...
  port: ...
  request_timeout_s: ...
  keep_alive_timeout_s: ...

grpc_options:
  port: ...
  grpc_servicer_functions: ...

applications:
- name: ...
  route_prefix: ...
  import_path: ...
  runtime_env: ...
  deployments:
  - name: ...
    num_replicas: ...
    ...
- name: ...
```
The file contains `proxy_location`, `http_options`, `grpc_options`, and `applications`. See details about each field in the docs.
We can also auto-generate this config file from the code. The `serve build` command takes an import path to your deployment graph and creates a config file containing all the deployments and their settings from the graph. You can tweak these settings to manage your deployments in production.
Note that the `runtime_env` field will always be empty when using `serve build` and must be set manually. In my case, `modin` and `QuantLib` are not installed globally, so these two pip packages should be included in the `runtime_env`.
This config file can be generated using `serve build`:

```shell
serve build app.main:app -o serve_config.yaml
```
For me, the generated config file looks like this:
```yaml
# This file was generated using the `serve build` command on Ray v2.7.0.

proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

grpc_options:
  port: 9000
  grpc_servicer_functions: []

applications:
- name: app1
  route_prefix: /
  import_path: app.main:app
  runtime_env: {}
  deployments:
  - name: PGMaster
  - name: API
```
The generated version of this file contains an `import_path`, a `runtime_env`, and configuration options for each deployment in the application. The application needs its pip packages, so modify the `runtime_env` field of the generated config to include them. Save this config locally in `serve_config.yaml`:
```yaml
# This file was generated using the `serve build` command on Ray v2.7.0.

proxy_location: EveryNode

http_options:
  host: 0.0.0.0
  port: 8000

grpc_options:
  port: 9000
  grpc_servicer_functions: []

applications:
- name: app1
  route_prefix: /
  import_path: app.main:app
  runtime_env:
    pip:
      - QuantLib
      - asyncpg
      - numpy
      - modin
  deployments:
  - name: PGMaster
  - name: QuadraAPI
```
You can use `serve deploy` to deploy the application to a local Ray cluster and `serve status` to get its status at runtime:
```shell
# Start a local Ray cluster.
ray start --head

# Start the application.
serve deploy serve_config.yaml
```
And to stop the Ray cluster:

```shell
# Stop the application.
serve shutdown

# Stop the local Ray cluster.
ray stop
```
To update the application, modify the config file and run `serve deploy` again.
Deploying on Kubernetes using KubeRay
KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. Read more in the docs. It offers three custom resource definitions (CRDs):
- RayCluster: KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.
- RayJob: With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.
- RayService: RayService is made up of two parts: a RayCluster and Ray Serve deployment graphs. RayService offers zero-downtime upgrades for RayCluster and high availability.
We will deploy a Ray Serve application using a RayService.
1. Create a Kubernetes cluster with Kind
First, create a Kubernetes cluster with Kind for local development:
```shell
kind create cluster --image=kindest/node:v1.23.0
```
2. Install the KubeRay operator
Install the KubeRay operator via its Helm repository.
```shell
$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update

# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0

# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
kuberay-operator-68cc555c9-qc7cf   1/1     Running   0          22s
```