Multi-tenancy in Ray, in the sense of isolating resources between teams, does not rely on hard boundaries such as separate Kubernetes namespaces. Instead, it is a more fluid system of resource reservations and scheduling priorities that gives each team a guaranteed slice of the cluster’s compute.
Let’s see this in action. Imagine you have a Ray cluster with 100 CPU cores. You want to give your "Data Science" team 60 cores and your "ML Engineering" team 40 cores.
Here’s how you’d configure the Ray cluster to reflect this. This is typically done when you launch your Ray cluster, often via a configuration file or command-line arguments to ray start.
# Example ray.yaml for a cluster with resource reservations
cluster_name: "my-ray-cluster"
head_node:
  resources:
    # Head node can also have resources reserved
    custom_resources: {"cpu": 4}
worker_nodes:
  - node_id: "worker-1"
    resources:
      custom_resources: {"cpu": 16}
  - node_id: "worker-2"
    resources:
      custom_resources: {"cpu": 16}
  - node_id: "worker-3"
    resources:
      custom_resources: {"cpu": 16}
  - node_id: "worker-4"
    resources:
      custom_resources: {"cpu": 16}
  - node_id: "worker-5"
    resources:
      custom_resources: {"cpu": 16}
  - node_id: "worker-6"
    resources:
      custom_resources: {"cpu": 16}
# This is where the multi-tenancy magic happens
ray_params:
  # Define custom resource names that represent your teams/tenants
  # These are arbitrary strings, but should be descriptive.
  # These names will be used when requesting resources for tasks/actors.
  resource_names:
    - "team:data_science"
    - "team:ml_engineering"
  # Resource reservations for tenants.
  # This tells Ray to reserve a portion of the cluster's capacity for specific tenants.
  # The sum of reserved resources for all tenants should not exceed the total cluster capacity.
  # If a tenant requests more than its reserved resources, it will be queued.
  resource_reservations:
    "team:data_science":
      cpu: 60 # Reserve 60 CPU cores for the Data Science team
    "team:ml_engineering":
      cpu: 40 # Reserve 40 CPU cores for the ML Engineering team
  # Optional: Define resource priorities. Higher numbers mean higher priority.
  # This is useful if you have overlapping reservations or want to ensure critical tasks
  # get scheduled first.
  resource_priorities:
    "team:data_science": 10
    "team:ml_engineering": 5
# Other Ray configurations...
When a task or actor is launched, you specify which tenant it belongs to by requesting the custom resource. For example, a Data Science team member might launch a job like this:
import ray

# Connect to the existing Ray cluster
ray.init(address="auto")

@ray.remote
def my_data_science_task():
    # This task will only run on resources reserved for "team:data_science"
    return "Hello from Data Science!"

# Request resources for the "team:data_science" tenant
# The 'resources' argument specifies the custom resource name.
# Ray will ensure this task only gets scheduled on nodes that have been
# implicitly or explicitly allocated to this tenant via resource reservations.
ray.get(my_data_science_task.options(resources={"team:data_science": 1}).remote())
The core problem Ray multi-tenancy solves is resource contention and fair sharing in a shared cluster. Without it, a few heavy-duty jobs from one team could starve out jobs from other teams, leading to unpredictable performance and long wait times. By defining resource_names and resource_reservations, you’re essentially carving up the cluster’s capacity into logical pools.
Internally, Ray’s scheduler uses these custom resources to track available capacity. When a task or actor requests {"team:data_science": 1}, the scheduler looks for available nodes that have been designated (through reservations) as belonging to the team:data_science pool and have at least 1 CPU free. If the total requested resources for team:data_science exceed its reservation, new tasks for that tenant will be queued until resources become available within its reserved slice or until another tenant releases resources it was using.
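The queueing behavior described above can be illustrated with a small toy model. This is only a sketch of the bookkeeping, not Ray's scheduler: the TenantPool class and its submit and release methods are hypothetical names invented for this example, and the model assumes requests are started in FIFO order.

```python
from collections import deque


class TenantPool:
    """Toy model of one tenant's reserved slice (hypothetical, not a Ray API)."""

    def __init__(self, name, reserved_cpus):
        self.name = name
        self.reserved_cpus = reserved_cpus
        self.in_use = 0
        self.queue = deque()  # CPU requests waiting for capacity

    def submit(self, cpus):
        """Run the request if it fits in the reservation, otherwise queue it."""
        if not self.queue and self.in_use + cpus <= self.reserved_cpus:
            self.in_use += cpus
            return "running"
        self.queue.append(cpus)
        return "queued"

    def release(self, cpus):
        """Free capacity, then start queued requests in FIFO order."""
        self.in_use -= cpus
        started = []
        while self.queue and self.in_use + self.queue[0] <= self.reserved_cpus:
            request = self.queue.popleft()
            self.in_use += request
            started.append(request)
        return started


pool = TenantPool("team:data_science", reserved_cpus=60)
statuses = [pool.submit(16) for _ in range(4)]  # 64 CPUs requested in total
started = pool.release(16)  # one running task finishes
```

Here the fourth 16-CPU request would push the tenant to 64 CPUs, past its 60-CPU reservation, so it waits until an earlier request releases its share.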
The exact levers you control are:
resource_names: These are the arbitrary strings you invent to represent your tenants (e.g., "team:engineering", "project:alpha", "user:jane"). These are the keys you’ll use when requesting resources.
resource_reservations: This is the crucial part. It’s a dictionary mapping your resource_names to specific amounts of cluster resources (CPU, GPU, custom resources). This tells Ray, "This much capacity is guaranteed for this tenant." The sum of all reservations cannot exceed the total physical capacity of the cluster.
resource_priorities: For tenants whose reservations might overlap (though this is less common with strict reservations) or when one tenant’s reserved resources are temporarily freed up, priorities dictate which tenant gets first dibs on that freed capacity. Higher numbers mean higher priority.
A subtle but powerful aspect is that Ray doesn’t strictly pin resources to specific physical nodes for a tenant. Instead, it manages a pool of available custom resources. When you reserve 60 CPUs for team:data_science, Ray knows that up to 60 CPUs can be allocated to tasks requesting team:data_science. If a worker node with 16 CPUs is available and a team:data_science task needs 1 CPU, it can grab it. If another team:data_science task needs 16 CPUs, it can take the whole node’s CPU capacity as long as it’s within the 60-CPU reservation. This dynamic allocation within the reserved pool is key to efficient utilization.
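To make the pooled allocation concrete, here is a minimal sketch of placing tasks on whichever node has room while counting usage against the tenant's reservation. The place_task helper and the first-fit policy are assumptions made for illustration; Ray's actual placement logic is more involved.

```python
def place_task(nodes_free, tenant_used, tenant_cap, cpus):
    """Place a task on the first node with room, if the tenant cap allows it.

    nodes_free: dict of node name -> free CPUs (updated in place on success).
    Returns (node_name, new_tenant_used), or (None, tenant_used) if the
    task has to wait.
    """
    if tenant_used + cpus > tenant_cap:
        return None, tenant_used  # would exceed the tenant's reservation
    for node, free in nodes_free.items():
        if free >= cpus:
            nodes_free[node] = free - cpus
            return node, tenant_used + cpus
    return None, tenant_used  # no single node has enough free CPUs


nodes = {"worker-1": 16, "worker-2": 16}
used = 0
first, used = place_task(nodes, used, 60, 1)    # small task, fits anywhere
second, used = place_task(nodes, used, 60, 16)  # needs a whole node
```

In this sketch the 1-CPU task lands on worker-1, and the 16-CPU task then takes all of worker-2, with both counted against the same 60-CPU reservation.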
When you define resource_reservations, Ray adjusts the available_resources it reports for the cluster accordingly. If your cluster has 100 CPUs and you reserve 60 for team:data_science and 40 for team:ml_engineering, the scheduler sees 0 CPUs left for unreserved tasks, 60 CPUs that team:data_science can draw from, and 40 CPUs that team:ml_engineering can draw from.
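The arithmetic behind that view, together with the rule that reservations must not exceed total capacity, can be checked with a few lines of plain Python. The summarize_reservations function is a hypothetical helper written for this example, not part of Ray; the node sizes are taken from the example config (a 4-CPU head node plus six 16-CPU workers).

```python
def summarize_reservations(total_cpus, reservations):
    """Return per-tenant CPU reservations plus the unreserved remainder.

    Raises ValueError if the reservations oversubscribe the cluster.
    """
    reserved = sum(reservations.values())
    if reserved > total_cpus:
        raise ValueError(
            f"reservations total {reserved} CPUs, "
            f"but the cluster only has {total_cpus}"
        )
    view = dict(reservations)
    view["unreserved"] = total_cpus - reserved
    return view


# Node sizes from the example config: one 4-CPU head node, six 16-CPU workers.
total = 4 + 6 * 16
view = summarize_reservations(
    total, {"team:data_science": 60, "team:ml_engineering": 40}
)
```

Running a check like this before launching the cluster is a cheap way to catch a reservation table that no longer matches the hardware.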
The next hurdle you’ll likely encounter is managing the lifecycle of these tenant-specific resources, especially when dealing with actors that hold resources for extended periods.