ASHA (the Asynchronous Successive Halving Algorithm) makes hyperparameter tuning practical by stopping underperforming trials early, so your compute goes to the promising ones.
Let’s see ASHA in action. Imagine we’re tuning a simple neural network with num_layers and learning_rate.
```python
import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler


def train_fn(config):
    # Note: keyword-style tune.report(...) and tune.run(...) belong to Ray Tune's
    # legacy function API; newer Ray releases expect a metrics dict and favor
    # tune.Tuner, so adapt these calls to the version you have installed.
    for step in range(config["epochs"]):
        # Simulate training progress (this toy objective ignores num_layers)
        accuracy = (step / config["epochs"]) * config["learning_rate"] * 10
        # Report intermediate results so the scheduler can compare trials
        tune.report(accuracy=accuracy, step=step)


if __name__ == "__main__":
    ray.init(ignore_reinit_error=True)
    analysis = tune.run(
        train_fn,
        config={
            "epochs": 100,
            "learning_rate": tune.grid_search([0.001, 0.01, 0.1]),
            "num_layers": tune.grid_search([1, 2, 3]),
        },
        # With grid_search, num_samples repeats the whole 3x3 grid:
        # 9 configs x 20 = 180 trials, most of which ASHA stops early.
        num_samples=20,
        scheduler=ASHAScheduler(
            metric="accuracy",
            mode="max",
            max_t=100,        # Maximum number of training steps
            grace_period=10,  # Minimum steps before pruning
            reduction_factor=2,
        ),
        verbose=1,
    )
    print("Best config: ", analysis.get_best_config(metric="accuracy", mode="max"))
    ray.shutdown()
```
When you run this, you’ll see a lot of trials start, but many will be marked as TERMINATED long before they reach step 100. ASHA is doing its job: in this toy objective, accuracy scales with learning_rate and ignores num_layers, so trials with learning_rate=0.001 are unlikely to catch up to those with better initial performance and are stopped at an early rung. The grace_period ensures no trial is pruned before step 10, while reduction_factor=2 means roughly the top half of the trials that reach each rung continue to the next one.
The core problem ASHA solves is the inefficiency of traditional grid or random search, where every trial runs to completion even if it is clearly suboptimal. This wastes compute cycles on configurations that will never win. ASHA, by contrast, is an early-stopping scheduler built on successive halving. It tracks the intermediate results of ongoing trials and compares them at fixed checkpoints called rungs. Trials that perform poorly relative to others at the same rung are terminated early. This early termination is the key: it frees up resources to start new trials with potentially better configurations.
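To put a rough number on that saving, here is a back-of-the-envelope sketch. It assumes an idealized, synchronous form of successive halving (all trials reach a rung before the cut is made), which only approximates what ASHA does asynchronously. The function name and the 20-trial figure are illustrative assumptions, not part of Ray Tune.

```python
def successive_halving_budget(n_trials, grace_period, max_t, reduction_factor):
    """Total training steps spent under idealized synchronous successive halving.

    Assumption: rungs sit at grace_period * reduction_factor**k, and only the
    top 1/reduction_factor of trials (at least one) survives each rung.
    """
    total_steps = 0
    alive = n_trials
    previous_rung = 0
    rung = grace_period
    while rung < max_t:
        # Every surviving trial trains from the previous rung up to this one.
        total_steps += alive * (rung - previous_rung)
        alive = max(1, alive // reduction_factor)
        previous_rung = rung
        rung *= reduction_factor
    # Survivors of the last rung train until max_t.
    total_steps += alive * (max_t - previous_rung)
    return total_steps


full_budget = 20 * 100  # 20 trials, each run for all 100 steps
pruned_budget = successive_halving_budget(20, grace_period=10, max_t=100, reduction_factor=2)
print(full_budget, pruned_budget)  # 2000 500
```

Under these assumptions the same 20 configurations cost about a quarter of the steps, which is the budget ASHA can spend on additional trials.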
The grace_period is crucial. It’s the minimum number of steps a trial must run before it’s eligible for pruning. This prevents prematurely killing off a trial that might just be slow to start but has good long-term potential. max_t is the absolute maximum number of steps any trial will run, acting as a safeguard. The reduction_factor dictates how aggressively ASHA prunes. The first rung sits at grace_period, and each later rung is reduction_factor times further along, up to max_t. A reduction_factor of 2 means that at each successive rung, ASHA aims to let approximately the top half of the trials that have reached that rung continue. It repeatedly samples trials, promotes the best, and stops the worst.
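The interplay of these three parameters is easiest to see by listing where the rungs fall. The helper below is a minimal sketch, assuming the usual successive-halving layout of rungs at grace_period times powers of reduction_factor; rung_milestones is a hypothetical name, not a Ray Tune function.

```python
def rung_milestones(grace_period, max_t, reduction_factor):
    """Steps at which trials are compared, assuming geometric rung spacing."""
    milestones = []
    step = grace_period
    while step <= max_t:
        milestones.append(step)
        step *= reduction_factor
    return milestones


# Settings from the example: a cut at steps 10, 20, 40, and 80.
print(rung_milestones(grace_period=10, max_t=100, reduction_factor=2))  # [10, 20, 40, 80]

# A larger reduction_factor gives fewer rungs, each keeping about the top quarter.
print(rung_milestones(grace_period=10, max_t=100, reduction_factor=4))  # [10, 40]
```

A higher reduction_factor therefore prunes harder at each rung but gives a trial fewer chances to be re-evaluated before max_t.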
The metric and mode arguments to ASHAScheduler tell ASHA what to optimize and whether to maximize or minimize it. In our example, metric="accuracy" and mode="max" mean ASHA is looking for the highest accuracy; for a loss you would use mode="min" instead. max_t=100 matches the 100 epochs in our train_fn, since each reported result counts as one training iteration.
ASHA operates asynchronously. When a trial reports a result at a rung, ASHA compares it with the results that other trials have already recorded at that same rung, without waiting for the remaining trials to get there. If the trial falls outside roughly the top 1/reduction_factor of those results, it’s stopped. The scheduler then uses the freed resources to start pending trials. This dynamic allocation is what makes it efficient.
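The asynchronous decision at a single rung can be sketched in a few lines. This is a simplified illustration, not Ray Tune’s implementation: the Rung class and its should_continue method are hypothetical names, and the cutoff rule (keep the top 1/reduction_factor of results seen so far, at least one) is an assumption chosen for clarity.

```python
class Rung:
    """One comparison point; decisions use only the results seen so far."""

    def __init__(self, reduction_factor, mode="max"):
        self.reduction_factor = reduction_factor
        self.mode = mode
        self.recorded = []

    def should_continue(self, value):
        # Record the new result, then rank everything seen at this rung.
        self.recorded.append(value)
        ranked = sorted(self.recorded, reverse=(self.mode == "max"))
        # Keep the top 1/reduction_factor of recorded results (at least one).
        keep = max(1, len(ranked) // self.reduction_factor)
        cutoff = ranked[keep - 1]
        return value >= cutoff if self.mode == "max" else value <= cutoff


rung = Rung(reduction_factor=2, mode="max")
# Accuracies arrive one at a time, in whatever order trials reach the rung.
decisions = [rung.should_continue(acc) for acc in [0.30, 0.10, 0.50, 0.20]]
print(decisions)  # [True, False, True, False]
```

Note that the first trial always continues because there is nothing to compare it against yet. That is the price of not waiting: an early arrival can slip through a rung that a later, stronger trial would have pushed it out of.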
A common misconception is that ASHA is just a fancy way to stop trials. It’s more than that: it’s a resource allocation strategy. It doesn’t just stop bad trials; it reallocates the compute that would have been spent on them. This reallocation is what lets ASHA explore the hyperparameter space more broadly within the same budget than a search that runs every trial to completion.
If you use tune.run with num_samples much larger than your available resources, Tune queues the extra trials as PENDING. New trials start as running ones finish or are stopped, with the stopping decisions governed by ASHA’s pruning logic.
The next step after mastering ASHA is often exploring multi-objective optimization, or pairing ASHA with search algorithms from libraries like Hyperopt or Optuna, so that a smarter search proposes configurations while ASHA decides which ones to stop early.