The most surprising thing about Ray’s timeline profiling is that it makes missed parallelism as visible as realized parallelism: idle gaps in a worker’s lane show where tasks could have overlapped but did not.
Let’s see it in action. Imagine a simple Ray job that launches two tasks: task_a and task_b, each taking a second.
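import ray
import time


@ray.remote
def task_a():
    time.sleep(1)  # simulate one second of work
    return "a done"


@ray.remote
def task_b():
    time.sleep(1)  # simulate one second of work
    return "b done"


if __name__ == "__main__":
    ray.init(num_cpus=2)  # two CPU slots, so both tasks can run at once
    obj_a = task_a.remote()  # returns an object reference immediately
    obj_b = task_b.remote()
    results = ray.get([obj_a, obj_b])  # block until both tasks finish
    print(results)
    ray.shutdown()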
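When you run this script, Ray’s scheduler can run task_a and task_b concurrently: num_cpus=2 gives it two CPU slots, and each task requests one CPU by default. To see this, we’ll use the Ray Dashboard. While the script is running, open your browser to http://127.0.0.1:8265 (the default Ray Dashboard address) and navigate to the "Trace" tab (the name and location of the timeline view vary between Ray versions). Because the script finishes in about a second and ray.shutdown() stops the local dashboard along with the cluster, pause before the shutdown call if you want time to look around. You’ll see a timeline like this: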
[Timeline View - Simplified Representation]
Time (s) | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5 | 1.6 | 1.7 | 1.8 | 1.9 | 2.0
---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
CPU 0    |     |     |     |     |     |     |     |     |     |     | T_a | T_a | T_a | T_a | T_a | T_a | T_a | T_a | T_a | T_a |
CPU 1    |     |     |     |     |     |     |     |     |     |     | T_b | T_b | T_b | T_b | T_b | T_b | T_b | T_b | T_b | T_b |
Here, T_a and T_b represent the execution of task_a and task_b respectively. They start almost simultaneously on different CPU slots, which demonstrates parallelism. Each bar spans the task’s execution, from the moment it began running on a worker to the moment it finished; time spent queued before that is not part of the bar.
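To check the overlap numerically rather than by eye, one option is to export the timeline and compare the two bars. The sketch below assumes the export is a JSON list in the Chrome Trace Event Format (complete events with "ph": "X" and "ts"/"dur" in microseconds), which is what ray.timeline(filename=...) is documented to write. It also assumes each event’s "name" matches the task function name, which you should verify against your own export. The helper names export_timeline and overlap_seconds are hypothetical.

```python
import json


def export_timeline(path="timeline.json"):
    """Write the profiling events Ray has recorded so far to a JSON file.

    Call this after ray.get(...) and before ray.shutdown().
    """
    import ray  # imported here so the pure helpers below work without Ray

    ray.timeline(filename=path)
    return path


def load_events(path):
    """Load an exported timeline as a list of event dicts."""
    with open(path) as f:
        return json.load(f)


def _first_span(events, name):
    """Return (start_us, end_us) of the first complete event with this name."""
    for event in events:
        # Assumption: the event's "name" field equals the task function name.
        if event.get("ph") == "X" and event.get("name") == name:
            return event["ts"], event["ts"] + event["dur"]
    raise ValueError(f"no complete event named {name!r}")


def overlap_seconds(events, name_a, name_b):
    """Seconds during which both named tasks were executing at once."""
    a_start, a_end = _first_span(events, name_a)
    b_start, b_end = _first_span(events, name_b)
    overlap_us = max(0, min(a_end, b_end) - max(a_start, b_start))
    return overlap_us / 1e6
```

For the two one-second tasks above, an overlap close to 1.0 means they really ran side by side, while 0.0 means they ran back to back. The exported file can also be opened in chrome://tracing or a compatible trace viewer.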
This timeline view is built from profiling events that Ray records for its internal components and for user-defined tasks. Separately, Ray can emit distributed traces using the OpenTelemetry standard: when tracing is enabled, tasks and actor calls produce spans, which are exported to a collector (often a separate service) and then visualized in a tracing backend.
The core problem Ray’s profiling solves is understanding the execution flow and resource utilization in a distributed system. In a single-process application, you’d use a profiler to see function call times. In Ray, tasks run on different machines, communicate asynchronously, and depend on each other. The timeline visualizes:
- Task Execution: When each task started, how long it ran, and on which worker process (Ray’s CPU resources are logical slots, so a lane is a worker, not a pinned core).
- Dependencies: You can see how tasks wait for results from other tasks (represented by gaps or specific "fetch" events).
- Resource Allocation: Which CPUs or GPUs are busy and when.
- Bottlenecks: Gaps in execution or long task durations on busy resources point to performance issues.
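The resource and bottleneck questions in the list above can be approximated from an exported timeline. The following is a minimal sketch, assuming Chrome Trace Event Format events ("ph": "X", with "ts" and "dur" in microseconds) and assuming that a (pid, tid) pair identifies one lane in the timeline. The function name lane_utilization is hypothetical.

```python
from collections import defaultdict


def lane_utilization(events):
    """Return {(pid, tid): (busy_seconds, largest_gap_seconds)} per lane.

    Overlapping or nested events in the same lane are merged first, so
    busy time is never counted twice.
    """
    spans_by_lane = defaultdict(list)
    for event in events:
        if event.get("ph") == "X":  # complete events carry start and duration
            start = event["ts"]
            spans_by_lane[(event["pid"], event["tid"])].append(
                (start, start + event["dur"])
            )

    report = {}
    for lane, spans in spans_by_lane.items():
        spans.sort()
        busy_us = 0
        largest_gap_us = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:
                # The lane was idle between cur_end and start.
                busy_us += cur_end - cur_start
                largest_gap_us = max(largest_gap_us, start - cur_end)
                cur_start, cur_end = start, end
            else:
                # Overlapping or nested event: extend the current busy span.
                cur_end = max(cur_end, end)
        busy_us += cur_end - cur_start
        report[lane] = (busy_us / 1e6, largest_gap_us / 1e6)
    return report
```

A lane with a large gap while other lanes are busy is a candidate for a dependency stall. A lane that is busy for the whole run while others sit idle suggests the work is not being spread evenly.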
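To control what gets traced, you configure tracing when Ray starts. OpenTelemetry tracing is opt-in and is enabled through a tracing startup hook rather than through dedicated backend or sampling arguments. For example, ray.init(_tracing_startup_hook="ray.util.tracing.setup_local_tmp_tracing:setup_tracing") installs the hook bundled with Ray, which writes spans to local files. The sampling rate is a property of the OpenTelemetry tracer provider that the hook sets up: sampling 100% of traces is useful while debugging a specific issue, and a lower rate reduces overhead on long-running workloads. Check these options against the Ray version you have installed, since the tracing feature has been marked experimental.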
The "Task Tracing" view in the dashboard provides even more granular detail. Clicking on a specific task in the timeline expands it to show internal events like object fetches, actor method calls, and Python function calls within that task. This is crucial for identifying why a task took a long time, not just that it did.
One critical aspect of profiling Ray applications is understanding the difference between the scheduling time and the actual execution time. The timeline shows the latter. However, a task might be scheduled very quickly, but then sit in a queue waiting for a resource (CPU, GPU, or even an object it depends on). The "waiting time" is often invisible unless you inspect the internal events within a task’s trace. For instance, a task might show up as starting at time T in the timeline, but its internal trace might reveal it was assigned to a worker at T-0.5s and spent that half-second waiting for a required object to be deserialized.
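One way to make that waiting time visible without opening a trace is to have each task report when it actually started and compare that with when it was submitted. This is a rough sketch, not Ray’s own accounting: it assumes the driver and workers run on one machine so their clocks agree, and the names wait_and_run_times and measure_with_ray are hypothetical.

```python
import time


def wait_and_run_times(submitted_at, started_at, finished_at):
    """Split a task's lifetime into (queueing delay, execution time) in seconds."""
    return started_at - submitted_at, finished_at - started_at


def measure_with_ray(num_tasks=4, num_cpus=2):
    """Submit more tasks than CPU slots and report each task's wait and run time."""
    import ray  # imported here so wait_and_run_times works without Ray

    @ray.remote
    def timed_task():
        started = time.time()  # first line the worker executes for this task
        time.sleep(1)
        return started, time.time()

    ray.init(num_cpus=num_cpus)
    try:
        submitted = time.time()
        refs = [timed_task.remote() for _ in range(num_tasks)]
        return [
            wait_and_run_times(submitted, started, finished)
            for started, finished in ray.get(refs)
        ]
    finally:
        ray.shutdown()
```

With four tasks and two CPU slots, you would expect two of the tasks to report a wait of roughly one second even though each task’s execution time is still about one second. That gap is exactly the part the execution bars leave out.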
The next concept you’ll want to dive into is actor profiling, which extends these timeline and tracing concepts to the stateful components of your Ray application.