Python Zero-Downtime Deployment: Rolling Restart Patterns (2026)

Rolling restarts are the secret sauce behind zero-downtime deployments for Python applications, but they’re not magic. They’re a carefully orchestrated sequence in which new versions of your service gradually replace old ones, so that at no point is the entire system unavailable. The core idea is to keep enough old instances running to carry the load while you bring up new ones, then drain traffic from the old instances as the new ones become healthy.

Let’s see this in action with a simple Flask app.

# app.py
from flask import Flask
import time
import os

app = Flask(__name__)

@app.route('/')
def hello():
    # Simulate some work
    time.sleep(0.1)
    return f"Hello from {os.environ.get('HOSTNAME', 'unknown')}! Version: 1.0"

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Imagine we have this running on two servers, server-1 and server-2, behind a load balancer.

Initial State:

server-1 (running app.py v1.0)
server-2 (running app.py v1.0)
Load balancer directs traffic to both.

Deployment Trigger: Update to v1.1

1. Bring up a new instance of v1.1: We start a new process on server-1 running app.py v1.1.
   - server-1 now has two processes: v1.0 and v1.1.
   - The load balancer is not yet sending traffic to the new v1.1 instance.
2. Health Check: The new v1.1 instance starts and passes its health checks. This is crucial. If it fails, we don’t proceed.
3. Gradual Traffic Shift: The load balancer is configured to start sending a small percentage of traffic (e.g., 10%) to the new v1.1 instance on server-1.
   - Requests to server-1 might now hit either v1.0 or v1.1.
   - Requests to server-2 still exclusively hit v1.0.
4. Monitor and Increase: We watch logs and metrics. If v1.1 is stable, we gradually increase the traffic percentage to it (e.g., 25%, 50%, 75%). Simultaneously, we might start draining traffic from the v1.0 instance on server-1.
5. Remove Old Instance: Once 100% of traffic on server-1 is going to v1.1, we can safely shut down the v1.0 instance on server-1.
   - Now, server-1 is running only v1.1.
   - server-2 is still running v1.0.
6. Repeat for Other Nodes: We repeat the entire process for server-2.
   - Start v1.1 on server-2.
   - Health check.
   - Gradually shift traffic from v1.0 to v1.1 on server-2.
   - Once server-2 is fully on v1.1, shut down the old v1.0 instance.
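
The health-check gate and the gradual traffic shift on a single server can be sketched in a few lines. This is only an illustration under stated assumptions: `ramp_traffic`, `is_healthy`, and `set_weight` are hypothetical names, and a real load balancer would expose its own API for changing weights.

```python
from typing import Callable, Iterable


def ramp_traffic(
    is_healthy: Callable[[], bool],
    set_weight: Callable[[int], None],
    steps: Iterable[int] = (10, 25, 50, 75, 100),
) -> bool:
    """Shift traffic to a new instance step by step; roll back on failure.

    Both callables are hypothetical stand-ins: `is_healthy` probes the new
    instance, `set_weight` tells the load balancer what percentage of
    traffic the new instance should receive.
    Returns True if every step completed, False if the rollout was aborted.
    """
    # Never send traffic to an instance that has not passed its health check.
    if not is_healthy():
        return False
    for percent in steps:
        set_weight(percent)
        # Re-check after every increase. A real rollout would also pause here
        # and inspect error rates and latency before moving on.
        if not is_healthy():
            set_weight(0)  # roll back: the old instance takes all traffic again
            return False
    return True
```

Calling `ramp_traffic(lambda: True, weights.append)` with an empty list `weights` records the weights 10, 25, 50, 75, 100 in order. If the health check starts failing partway through, the last recorded weight is 0, which corresponds to handing all traffic back to v1.0.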

Final State:

server-1 (running app.py v1.1)
server-2 (running app.py v1.1)
Load balancer directs traffic to both v1.1 instances.

The key is that at no point were all instances unavailable at once. At least one instance was always serving requests, and each new instance was brought online and validated before the old one it replaced was decommissioned.
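
To make that invariant concrete, here is a small simulation of the server-by-server walk. It is a toy model, not an orchestrator: `rolling_restart` and the `is_healthy` callback are hypothetical names, and the fleet is just a dictionary.

```python
from typing import Callable, Dict, List


def rolling_restart(
    fleet: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> List[Dict[str, List[str]]]:
    """Upgrade one server at a time and record the fleet after each step.

    `fleet` maps a server name to the version it serves. Each snapshot maps
    a server name to the list of versions currently serving on it.
    """
    state = {server: [version] for server, version in fleet.items()}
    snapshots: List[Dict[str, List[str]]] = []
    for server in fleet:
        old_version = state[server][0]
        # Gate on the health check: halt the rollout, leaving old versions up.
        if not is_healthy(server, new_version):
            raise RuntimeError(f"{new_version} unhealthy on {server}; rollout halted")
        # The new version runs next to the old one while traffic shifts.
        state[server] = [old_version, new_version]
        snapshots.append({name: list(v) for name, v in state.items()})
        # Only now is the old version on this server retired.
        state[server] = [new_version]
        snapshots.append({name: list(v) for name, v in state.items()})
    return snapshots
```

Running it on `{"server-1": "1.0", "server-2": "1.0"}` with target `"1.1"` produces four snapshots, and in every one of them each server has at least one version serving. If the health check fails, the function raises before touching the old instance, which mirrors the "if it fails, we don’t proceed" rule.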

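Orchestrators such as Kubernetes implement this pattern through Deployment objects with the RollingUpdate strategy, which replaces pods in small batches and waits for readiness probes to pass before continuing; many managed platforms offer an equivalent. Note that RollingUpdate shifts traffic by replacing instances rather than by percentage, so the weighted ramp described above needs a load balancer or service mesh that supports weighted routing. Either way, the load balancer is a critical piece: it must honor health checks and stop routing new requests to instances that are draining.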
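
For Kubernetes specifically, the pace of a RollingUpdate is bounded by the Deployment’s `maxSurge` and `maxUnavailable` settings. The helper below is a rough sketch of how percentage values translate into pod counts, assuming the documented rounding (surge rounds up, unavailable rounds down) and the default of 25% for both. `rolling_update_bounds` is a hypothetical name, not part of any Kubernetes client.

```python
from typing import Tuple


def rolling_update_bounds(
    replicas: int,
    max_surge_pct: int = 25,
    max_unavailable_pct: int = 25,
) -> Tuple[int, int]:
    """Return (min_available, max_total) pod counts during a rolling update.

    Assumes percentage-based settings: the surge is rounded up and the
    unavailable count is rounded down.
    """
    surge = -(-replicas * max_surge_pct // 100)          # ceiling division
    unavailable = replicas * max_unavailable_pct // 100  # floor division
    return replicas - unavailable, replicas + surge
```

With two replicas and the defaults, this gives a minimum of 2 available pods and a maximum of 3 in total, which matches the walkthrough above: a new instance comes up before an old one goes away.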

The mental model here hinges on state management and request lifecycle. If your application has long-running requests or maintains sticky sessions, you need to ensure these are handled gracefully during the transition. A request that starts on an old instance must complete on that instance, even if the instance is marked for draining. This is why graceful shutdown is so important – the application needs to signal to the load balancer, "I’m done accepting new requests, but I’ll finish the ones I’ve got."
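
One way to model "stop accepting, finish what’s in flight" is a small gate that the request handler consults. This is a minimal sketch using only the standard library; `DrainGate` and `install_sigterm_handler` are hypothetical names, and production servers such as Gunicorn or uWSGI ship their own graceful-shutdown handling that you would normally rely on instead.

```python
import signal
import threading


class DrainGate:
    """Tracks in-flight requests and refuses new ones once draining starts."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._idle = threading.Event()
        self._idle.set()  # nothing is in flight yet
        self._in_flight = 0
        self._draining = False

    def try_begin(self) -> bool:
        """Admit a request unless the instance is draining."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            self._idle.clear()
            return True

    def end(self) -> None:
        """Mark one admitted request as finished."""
        with self._lock:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._idle.set()

    def start_draining(self) -> None:
        # Plain attribute write with no lock, so it is safe to call from a
        # signal handler even if the main thread currently holds the lock.
        self._draining = True

    @property
    def draining(self) -> bool:
        return self._draining

    def wait_idle(self, timeout: float) -> bool:
        """Block until in-flight requests finish; False if the timeout expires."""
        return self._idle.wait(timeout)


def install_sigterm_handler(gate: DrainGate) -> None:
    """Begin draining on SIGTERM. Must be called from the main thread."""
    signal.signal(signal.SIGTERM, lambda signum, frame: gate.start_draining())
```

A handler would call `gate.try_begin()` before doing work and `gate.end()` in a `finally` block, while the health-check endpoint would report unhealthy whenever `gate.draining` is true so the load balancer stops sending new requests. The process then calls `gate.wait_idle(timeout)` before exiting, with the timeout chosen to be shorter than whatever grace period the orchestrator allows.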

What most people miss is how application state interacts with this. During a rolling restart both versions run side by side, so if your Python app writes to a shared database and v1.0 writes data in a format that v1.1 can’t read (or vice versa), you’ve got a problem. This is why rolling deployments require backward-compatible changes at each step. You might first deploy v1.1, which can read both formats but still writes v1.0’s format, so the remaining v1.0 instances never encounter data they can’t parse. Once every instance is on v1.1, a second deployment (v1.2) switches writes to the new format, and after existing data has been migrated, a later release can drop support for reading the old format. This is known as the "expand/contract" pattern for database schema changes.
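
Here is a toy version of the idea, assuming a made-up schema change in which a single `name` field is split into `first_name` and `last_name`. The field names, `read_user`, `write_user`, and the `write_new_format` flag are all hypothetical; the point is only that the reader tolerates both shapes while the writer’s format is switched separately.

```python
from typing import Dict


def read_user(record: Dict[str, str]) -> Dict[str, str]:
    """Accept both the old and the new record format."""
    if "first_name" in record:
        # New format: the fields are already split.
        return {"first_name": record["first_name"], "last_name": record["last_name"]}
    # Old format: a single "name" field, split on the first space.
    first, _, last = record["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def write_user(first_name: str, last_name: str, write_new_format: bool) -> Dict[str, str]:
    """Write in the format selected by the flag."""
    if write_new_format:
        return {"first_name": first_name, "last_name": last_name}
    return {"name": f"{first_name} {last_name}"}
```

Because `read_user` returns the same result for either shape, the format the writer emits can be changed in its own deployment, once no instance that only understands the old shape is still running.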

The next hurdle is dealing with stateful applications or services that require immediate consistency across all instances.