Ray Placement Groups are the secret sauce for ensuring your distributed Ray tasks and actors actually run where you want them to, which is crucial for performance.

Let’s see it in action. Imagine you have a heavy data loading task and a compute task that needs that data. You want them on the same node to avoid network transfer.

import ray
import time

@ray.remote
def load_data(data_size_gb):
    print(f"Loading {data_size_gb}GB of data...")
    time.sleep(5) # Simulate data loading
    data = {"data": list(range(data_size_gb * 1024 * 1024))} # Small stand-in for the real payload
    print("Data loaded.")
    return data

@ray.remote
def process_data(data):
    print("Processing data...")
    result = sum(data["data"]) # Simulate processing
    print("Data processed.")
    return result

# Initialize Ray
ray.init(ignore_reinit_error=True)

# Define a placement group
# Bundle: A specification of resources required by a set of tasks/actors.
# We need 1 CPU for the loader and 1 CPU for the processor.
bundles = [{"CPU": 1}, {"CPU": 1}]
pg = ray.util.placement_group(bundles, strategy="STRICT_PACK")

# Wait for the placement group to be created. This is important!
ray.get(pg.ready())

# Launch tasks within the placement group
# We pass the placement group object to the task/actor.
# Ray will schedule these onto nodes that satisfy the placement group's bundles.
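# Pass loader_ref itself rather than ray.get(loader_ref), so the data is handed to the
# processor without a round trip through the driver.
loader_ref = load_data.options(placement_group=pg, placement_group_bundle_index=0).remote(1)
processor_ref = process_data.options(placement_group=pg, placement_group_bundle_index=1).remote(loader_ref)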

# Fetch results
final_result = ray.get(processor_ref)
print(f"Final result: {final_result}")

ray.shutdown()

This code creates a placement group with two bundles, each requesting one CPU, and then launches the load_data and process_data tasks in specific bundles within the group. ray.get(pg.ready()) blocks until Ray has reserved the resources for every bundle. The STRICT_PACK strategy requires all bundles to be placed on a single node. If no node can hold them all, the group is not created: it stays pending, and ray.get(pg.ready()) keeps blocking until enough resources appear or a timeout you supply expires.

The core problem placement groups solve is resource locality. In a distributed system, tasks and actors need resources (CPUs, GPUs, memory). Without explicit control, Ray’s scheduler might place a task on a node far away from its data or other dependent tasks, leading to network latency, increased resource contention, and slower execution. Placement groups allow you to dictate where related workloads should reside.

Internally, a placement group is a collection of "bundles." Each bundle is a request for a specific set of resources (e.g., {"CPU": 1, "GPU": 1}), and a single bundle must fit on a single node. When you create a placement group, Ray tries to reserve nodes that can satisfy all the bundles according to the specified strategy, and it does so atomically: either every bundle is placed or none is. There are four strategies. PACK, the default, packs bundles onto as few nodes as possible but allows spreading if necessary. STRICT_PACK requires all bundles to be on one node. SPREAD places bundles on different nodes on a best-effort basis. STRICT_SPREAD requires every bundle to be on a different node.
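To make the difference between the two strict strategies concrete, here is a toy feasibility check in plain Python. It is a simplified model and not Ray's scheduler: the function names and the node dictionaries are made up for illustration, and it ignores resources that are already in use.

```python
from itertools import permutations

def total_demand(bundles):
    """Sum the resource requests of all bundles, per resource name."""
    total = {}
    for bundle in bundles:
        for name, amount in bundle.items():
            total[name] = total.get(name, 0) + amount
    return total

def node_fits(node, demand):
    """True if the node has at least the demanded amount of every resource."""
    return all(node.get(name, 0) >= amount for name, amount in demand.items())

def fits_strict_pack(nodes, bundles):
    """STRICT_PACK model: one single node must hold every bundle."""
    demand = total_demand(bundles)
    return any(node_fits(node, demand) for node in nodes)

def fits_strict_spread(nodes, bundles):
    """STRICT_SPREAD model: every bundle must land on a different node."""
    if len(bundles) > len(nodes):
        return False
    # Brute force over assignments; fine for a handful of nodes.
    return any(
        all(node_fits(node, bundle) for node, bundle in zip(chosen, bundles))
        for chosen in permutations(nodes, len(bundles))
    )

# Hypothetical two-node cluster: one large node and one small node.
nodes = [{"CPU": 4, "GPU": 1}, {"CPU": 1}]
bundles = [{"CPU": 1}, {"CPU": 1}]

print(fits_strict_pack(nodes, bundles))    # the 4-CPU node can hold both bundles
print(fits_strict_spread(nodes, bundles))  # one bundle per node also works
```

In this model, PACK and SPREAD would be the soft counterparts: they prefer the same layouts but fall back to another placement instead of leaving the group pending.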

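When you launch a task or actor, you can specify which placement group it belongs to and which bundle within that group it should use. The example above does this with .options(placement_group=pg, placement_group_bundle_index=i). Ray 2.x documents scheduling_strategy=PlacementGroupSchedulingStrategy(placement_group=pg, placement_group_bundle_index=i) as the preferred form and treats the older keyword arguments as deprecated. The placement_group_bundle_index tells Ray which specific bundle in the group the task or actor should occupy. If you do not specify an index (the default is -1), Ray picks any bundle in the group that has enough free resources.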
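Bundle indices are positional, so it is easy to lose track of which index belongs to which job. The sketch below names each bundle by role and pins one task per role using PlacementGroupSchedulingStrategy from ray.util.scheduling_strategies. The helper plan_bundles, the function double, and the role names are hypothetical and exist only for this example. Ray is imported inside run_on_ray so the planning helper can be used without a cluster.

```python
def plan_bundles(roles):
    """Turn {role: resources} into a bundle list plus a role -> index lookup.

    Hypothetical helper (not part of Ray): it only keeps the bundle order and
    the indices handed to Ray in sync.
    """
    bundles = [dict(resources) for resources in roles.values()]
    index_of = {role: i for i, role in enumerate(roles)}
    return bundles, index_of

def double(x):
    return 2 * x

def run_on_ray(roles):
    """Run `double` once per role, each call pinned to that role's bundle."""
    import ray
    from ray.util.placement_group import placement_group, remove_placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    ray.init(ignore_reinit_error=True)
    bundles, index_of = plan_bundles(roles)
    pg = placement_group(bundles, strategy="PACK")
    ray.get(pg.ready())
    try:
        remote_double = ray.remote(double)
        refs = {
            role: remote_double.options(
                scheduling_strategy=PlacementGroupSchedulingStrategy(
                    placement_group=pg,
                    placement_group_bundle_index=index,
                )
            ).remote(index)
            for role, index in index_of.items()
        }
        return {role: ray.get(ref) for role, ref in refs.items()}
    finally:
        remove_placement_group(pg)  # release the reservation when done

roles = {"loader": {"CPU": 1}, "processor": {"CPU": 1}}
bundles, index_of = plan_bundles(roles)
print(bundles, index_of)
```

Calling run_on_ray(roles) on a machine with at least two free CPUs should run one task in each bundle and return the results keyed by role.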

The ray.get(pg.ready()) call blocks until the Ray cluster has reserved resources for every bundle in the group. Tasks and actors submitted to a group that is not ready yet are not placed elsewhere; they wait in the queue until the group has been created. Waiting explicitly is still worthwhile, because it surfaces a request the cluster cannot satisfy at a known point in your program instead of leaving tasks silently pending.
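Because ray.get(pg.ready()) can block for as long as the request stays unsatisfied, a bounded wait is often more practical. The sketch below relies on PlacementGroup.wait(timeout_seconds), which returns False when the group is not ready in time. The wrapper wait_until_ready and the stand-in class are hypothetical and only serve this example, so it can be tried without a cluster.

```python
def wait_until_ready(pg, timeout_seconds=30):
    """Block until the placement group is placed, or raise after the timeout.

    `pg` is expected to be a Ray PlacementGroup; only its `wait` method is used.
    """
    if not pg.wait(timeout_seconds):
        raise TimeoutError(
            f"placement group not ready after {timeout_seconds}s; "
            "the cluster may not have enough free resources"
        )
    return pg

class _NeverReady:
    """Stand-in for a group the cluster cannot place (demonstration only)."""
    def wait(self, timeout_seconds):
        return False

try:
    wait_until_ready(_NeverReady(), timeout_seconds=1)
except TimeoutError as err:
    message = str(err)
    print(message)
```

With a real group you would call wait_until_ready(pg, 10) right after creating it and handle the TimeoutError by removing the group or requesting smaller bundles.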

A subtle but powerful aspect is how placement groups interact with actors. You can create actors that are themselves part of a placement group. This is useful for building distributed systems where multiple actors need to communicate frequently and efficiently. For example, a set of worker actors and a central manager actor could all be placed within the same group to minimize communication overhead.
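As a rough sketch of that pattern, the code below starts a set of worker actors, one per bundle, in a single PACK group. Only the workers are shown; a manager actor would be launched the same way with its own bundle. The Worker class and launch_workers are hypothetical names. The class stays a plain Python class and is wrapped with ray.remote at launch time, which keeps its logic easy to test on its own.

```python
class Worker:
    """Plain class; wrapped into a Ray actor inside launch_workers."""
    def square(self, x):
        return x * x

def launch_workers(num_workers=2):
    """Start one Worker actor per bundle and return (placement_group, actors)."""
    import ray
    from ray.util.placement_group import placement_group
    from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

    ray.init(ignore_reinit_error=True)
    pg = placement_group([{"CPU": 1}] * num_workers, strategy="PACK")
    ray.get(pg.ready())

    # Request 1 CPU explicitly so each actor occupies its bundle's CPU.
    remote_worker = ray.remote(num_cpus=1)(Worker)
    workers = [
        remote_worker.options(
            scheduling_strategy=PlacementGroupSchedulingStrategy(
                placement_group=pg,
                placement_group_bundle_index=i,
            )
        ).remote()
        for i in range(num_workers)
    ]
    return pg, workers

# The actor logic can be checked locally before it is ever sent to a cluster.
print(Worker().square(3))
```

On a running cluster, pg, workers = launch_workers() followed by ray.get([w.square.remote(i) for i, w in enumerate(workers)]) would exercise every actor in the group.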

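The one thing most people don’t realize is that a placement group holds its reservation until you release it. The reserved resources are unavailable to anything outside the group, even while no task or actor in the group is running. Call ray.util.remove_placement_group(pg) to release them; unless the group was created with lifetime="detached", it is also cleaned up when the job that created it exits. A placement group cannot be empty, though: Ray rejects an empty bundle list, so a zero-bundle group is not a way to reserve a node. To hold capacity for tasks or actors you will create later, reserve bundles that describe the resources they will need.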
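One way to make sure a group is always released is to tie its lifetime to a with block. The context manager below is a sketch, not a Ray API: the name reserved and its create and remove parameters are hypothetical hooks, which default to Ray's placement_group and remove_placement_group when omitted. The dry run uses stand-in callables so the control flow can be seen without a cluster.

```python
from contextlib import contextmanager

@contextmanager
def reserved(bundles, strategy="PACK", create=None, remove=None):
    """Create a placement group, yield it, and always remove it afterwards.

    `create` and `remove` are hypothetical hooks for this sketch; when omitted
    they default to Ray's placement_group and remove_placement_group.
    """
    if create is None or remove is None:
        from ray.util.placement_group import placement_group, remove_placement_group
        create = create or placement_group
        remove = remove or remove_placement_group
    pg = create(bundles, strategy=strategy)
    try:
        yield pg
    finally:
        remove(pg)  # releases the reserved resources even if the body raised

# Dry run with stand-in callables, so no cluster is needed.
events = []

def fake_create(bundles, strategy):
    events.append(f"create {len(bundles)} bundle(s), {strategy}")
    return "fake-pg"

def fake_remove(pg):
    events.append(f"remove {pg}")

with reserved([{"CPU": 1}], create=fake_create, remove=fake_remove) as pg:
    events.append(f"use {pg}")

print(events)
```

Against a real cluster the body of the with block would start with ray.get(pg.ready()) and then launch the tasks or actors that belong to the group.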

The next step in managing distributed workloads is understanding how to dynamically adjust placement group sizes or strategies based on runtime conditions.
