Deploying and using OpenAI models on Azure is a lot like getting a private chef for your AI needs, but with the added benefit of enterprise-grade security and scalability.

Let’s see it in action. Imagine you want to use GPT-3.5 Turbo for a customer service chatbot. On Azure, this isn’t about firing up a local Python script and hoping for the best. It’s a managed service.

First, you’ll need an Azure subscription and then access to Azure OpenAI. You request access, and once approved, you can create a resource in the Azure portal.

# Example of creating an Azure OpenAI resource via Azure CLI
az cognitiveservices account create \
  --name my-openai-resource \
  --resource-group my-resource-group \
  --kind OpenAI \
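  --sku S0 \
  --location eastus

S0 is the standard SKU for Azure OpenAI resources, and eastus is just one of many regions you can choose. Model availability varies by region, so check that the model you want is offered where you create the resource.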

Once the resource is provisioned, you navigate to it in the Azure portal. Here, you’ll find your endpoint and API keys – these are your credentials to the AI.
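Since the key grants access to the resource, it is usually better kept out of source code. The following is a minimal sketch, assuming the endpoint and key have been exported as environment variables; the variable names AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY and the helper load_azure_openai_settings are illustrative choices, not something the portal sets for you.

```python
import os


def load_azure_openai_settings(env=os.environ):
    """Read the resource endpoint and API key from environment variables.

    The variable names are an assumption for this sketch; use whatever
    names your own deployment tooling exports.
    """
    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "").strip()
    api_key = env.get("AZURE_OPENAI_API_KEY", "").strip()
    if not endpoint or not api_key:
        raise RuntimeError(
            "Set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY first."
        )
    # Drop any trailing slash so later URL building is predictable.
    return {"endpoint": endpoint.rstrip("/"), "api_key": api_key}
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary, without touching the real process environment.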

The next step is deploying a specific model. You go to the "Model deployments" section and choose a base model like gpt-35-turbo.

// Example of a deployment configuration
{
  "sku": {
    "name": "Standard"
  },
  "model": {
    "format": "OpenAI",
    "name": "gpt-35-turbo",
    "version": "0613"
  }
}

Here, 0613 pins a specific version of gpt-35-turbo. Azure runs the underlying infrastructure, so you send requests to the deployment rather than hosting or scaling the model yourself.

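Now, to use it, you’ll typically use an API client. Here’s a Python snippet using the openai library, configured for Azure. It uses the module-level interface from versions of the library before 1.0 (openai.ChatCompletion was removed in 1.0), so install openai<1.0 to run it as written: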

import openai

# Configure Azure OpenAI
openai.api_type = "azure"
openai.api_base = "https://my-openai-resource.openai.azure.com/" # Your resource endpoint
openai.api_version = "2023-05-15" # The REST API version to call
openai.api_key = "YOUR_AZURE_OPENAI_API_KEY" # Your resource API key

# Make a completion request
response = openai.ChatCompletion.create(
    engine="my-gpt35-turbo-deployment", # The name you gave your deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)

The engine parameter here refers to the deployment name you set in Azure, not the base model name. This distinction is crucial because you can have multiple deployments of the same base model, perhaps with different configurations or for different purposes, all under one Azure OpenAI resource. The api_version is also important; it dictates which features and behaviors are available.
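To make the role of the deployment name and the API version concrete, here is a rough sketch that builds (but does not send) the underlying HTTP request using only the standard library. The URL layout and the api-key header reflect the Azure OpenAI REST pattern as commonly documented; treat them as assumptions to verify against the current REST reference. The helper name build_chat_request is hypothetical.

```python
import json
import urllib.request


def build_chat_request(endpoint, deployment, api_version, api_key, messages):
    """Build (but do not send) a chat completions request for one deployment."""
    # The deployment name, not the base model name, appears in the path,
    # and the API version travels as a query parameter.
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    body = json.dumps({"messages": messages}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_chat_request(
    endpoint="https://my-openai-resource.openai.azure.com/",
    deployment="my-gpt35-turbo-deployment",
    api_version="2023-05-15",
    api_key="YOUR_AZURE_OPENAI_API_KEY",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(request.full_url)
```

Nothing in the request names gpt-35-turbo: the deployment in the path already determines which model and version answer the call.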

The problem Azure OpenAI solves is making powerful, cutting-edge AI models accessible within a secure, compliant, and scalable cloud environment. Instead of managing your own GPU clusters, model hosting, and scaling logic, Azure handles it. You get the raw power of OpenAI models, but with the management, networking, and security features of Azure. This means you can integrate these models into your existing Azure workflows, use Microsoft Entra ID (formerly Azure Active Directory) for access control, and rely on Azure’s monitoring and logging.

What many people don’t realize is how granularly you can control model versions: a single Azure OpenAI resource can hold multiple deployments of the same base model. This allows for A/B testing of different model versions (e.g., gpt-35-turbo 0613 vs. gpt-35-turbo 1106) or running distinct applications that each call their own deployment, all without creating a second Azure OpenAI resource. Each deployment is a named, version-pinned instance of a model with its own throughput settings, addressed by its deployment name under the resource’s shared endpoint.
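One way to split traffic between two such deployments is to hash a stable user identifier and route on the result. This is only a sketch of the idea: the deployment names chat-gpt35-0613 and chat-gpt35-1106 and the function pick_deployment are hypothetical, and Azure does not provide this routing for you.

```python
import hashlib

# Hypothetical deployment names, one per model version under test.
DEPLOYMENTS = {"A": "chat-gpt35-0613", "B": "chat-gpt35-1106"}


def pick_deployment(user_id, share_b=0.5):
    """Deterministically assign a user to deployment A or B.

    share_b is the fraction of users (0.0 to 1.0) routed to deployment B.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return DEPLOYMENTS["B"] if bucket < share_b else DEPLOYMENTS["A"]


# The returned name is what you would pass as the deployment (engine) name.
print(pick_deployment("user-42"))
```

Hashing rather than choosing at random keeps each user on the same deployment across requests, which makes the two variants easier to compare.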

Understanding the interplay between Azure OpenAI resources, model deployments, and API versions is key to unlocking its full potential.
