Ray’s shared memory is a game-changer for inter-task communication, allowing tasks to read and write directly to the same memory regions without any data copying, drastically reducing overhead.

Let’s see it in action. Imagine two tasks: one producing data and another consuming it.

Producer Task:

import ray
import numpy as np
import time

ray.init()

@ray.remote
def producer():
    data = np.random.rand(1024, 1024).astype(np.float32)
    # Ray automatically handles shared memory for Ray objects
    return data

# Get the ObjectRef for the data
data_ref = producer.remote()

# Let's simulate the producer being on one node and the consumer on another.
# In a real cluster, Ray would handle this placement.
# For demonstration, we'll just simulate the delay.
print("Producer finished, data available.")
time.sleep(5) # Simulate network latency if needed for demonstration

Consumer Task:

import ray
import numpy as np
import time

ray.init() # Re-initialize if in a separate script/process

@ray.remote
def consumer(data_ref):
    # When Ray retrieves an object, if it's a large NumPy array,
    # it will attempt to use shared memory if available and beneficial.
    data = ray.get(data_ref)
    print(f"Consumer received data of shape: {data.shape}")
    # Perform some computation on the data
    result = np.sum(data)
    print(f"Consumer computed sum: {result}")
    return result

# Assume data_ref is already obtained from the producer
# For this example, let's assume producer.remote() was called and returned data_ref
# In a real scenario, you'd pass this ref to the consumer.
# For demonstration, we'll re-create it here for a self-contained example.

@ray.remote
def producer_for_consumer():
    data = np.random.rand(1024, 1024).astype(np.float32)
    return data

data_ref_for_consumer = producer_for_consumer.remote()
consumer_result_ref = consumer.remote(data_ref_for_consumer)

# Wait for the consumer to finish
final_result = ray.get(consumer_result_ref)
print(f"Final result from consumer: {final_result}")

ray.shutdown()

When ray.get(data_ref) is called in the consumer task, Ray intelligently checks if the data is already in memory and accessible. For large NumPy arrays, Ray serializes them into a format that can be memory-mapped. If the data_ref points to data residing in shared memory on the same node (or if Ray can efficiently transfer a handle to that shared memory region to the requesting node), the consumer task can access it directly without Ray needing to deserialize and copy data from the network or disk.

The core problem Ray shared memory solves is the massive overhead of data serialization and deserialization when tasks need to exchange large amounts of data. Traditionally, if Task A produces a large NumPy array and Task B needs it, Task A serializes the array (e.g., to bytes), sends it over the network, and Task B deserializes it back into a NumPy array. This involves copying data multiple times: from the producer’s memory to the network buffer, across the network, from the network buffer to the consumer’s memory, and finally, potentially from a temporary buffer to the final NumPy array structure. For large datasets, this can become the dominant bottleneck, dwarfing the actual computation. Ray’s shared memory mechanism bypasses most of this. When a Ray object reference (ObjectRef) is created for a large, picklable object like a NumPy array, Ray can register this object in a shared memory segment. Subsequent ray.get calls for this ObjectRef on the same node (or even across nodes if the OS supports efficient shared memory mapping or Ray manages it) can then directly access this memory region. This means the data isn’t copied; the consumer’s NumPy array is essentially a view into the producer’s memory, or a shared memory segment that both can access.

The key levers you control are implicit through Ray’s object management. When you return large data structures (like NumPy arrays, Pandas DataFrames, PyTorch tensors) from a remote function, Ray automatically attempts to optimize their storage and retrieval. You don’t typically need to explicitly tell Ray "use shared memory." Ray’s internal object store and serialization layer are designed to detect large, suitable objects and place them in shared memory where possible. The size threshold at which Ray considers an object "large enough" to warrant shared memory optimization is configurable, but defaults are usually sensible. The primary interaction is simply by returning these objects from @ray.remote functions and fetching them with ray.get.

What most people don’t realize is that this zero-copy access isn’t just about avoiding serialization overhead; it’s also about avoiding the allocation of new memory for the received data. When a task ray.gets an object that’s already in shared memory, the resulting Python object (e.g., a NumPy array) can often be constructed to directly reference the shared memory buffer. This means the memory occupied by the data is accounted for once in the shared memory pool, rather than being duplicated by the receiving task. This is particularly impactful in memory-constrained environments or when dealing with massive datasets that might otherwise exceed available RAM if copied.

The next concept you’ll likely encounter is managing the lifecycle of these shared memory objects, especially in long-running applications or when dealing with very large datasets that could consume significant shared memory resources.

Want structured learning?

Take the full Ray course →