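Serverless indexes in Pinecone don’t actually "run" in the traditional sense. No dedicated pods sit idle between requests. Storage is separated from compute, compute is used on demand when you read from or write to the index, and you pay for that usage plus storage. That is why they can be so cost-effective for sporadic workloads.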
Let’s see this in action. Imagine you have a collection of documents, each represented by a vector. We’ll index these vectors and then query them.
```python
import os
import time

from pinecone import Pinecone, ServerlessSpec

# Initialize Pinecone. Serverless indexes only need an API key;
# the older PINECONE_ENVIRONMENT setting is not used by this client.
api_key = os.environ["PINECONE_API_KEY"]  # fail fast if the key is missing
pc = Pinecone(api_key=api_key)

index_name = "my-migration-index"
dimension = 8      # Example dimension; must match your embedding model's output
metric = "cosine"  # Example metric: "cosine", "euclidean", or "dotproduct"

# --- Create a Serverless Index ---
# If the index doesn't exist, create it.
# For serverless, you specify the cloud and region.
if index_name not in pc.list_indexes().names():  # names() is a method
    print(f"Creating index '{index_name}'...")
    pc.create_index(
        name=index_name,
        dimension=dimension,
        metric=metric,
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # Example: AWS us-east-1
    )
    # Wait for the index to be ready
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)
    print(f"Index '{index_name}' created successfully.")
else:
    print(f"Index '{index_name}' already exists.")

# Connect to the index
index = pc.Index(index_name)

# --- Upsert some data ---
# Each record is an (id, values) tuple; values must have `dimension` entries.
print("Upserting data...")
vectors_to_upsert = [
    ("vec1", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
    ("vec2", [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]),
    ("vec3", [0.2, 0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
]
index.upsert(vectors=vectors_to_upsert)
print("Data upserted.")

# Pinecone is eventually consistent, so give freshly upserted
# records a moment to become visible to queries.
time.sleep(5)

# --- Query the index ---
print("Querying the index...")
query_vector = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]
results = index.query(vector=query_vector, top_k=2, include_values=True)
print("Query results:")
for match in results.matches:
    print(f"  ID: {match.id}, Score: {match.score}, Values: {match.values}")

# --- Clean up (optional) ---
# print(f"Deleting index '{index_name}'...")
# pc.delete_index(index_name)
# print("Index deleted.")
```
This code demonstrates creating a serverless index, adding a few vectors, and then performing a query. Notice how ServerlessSpec is used to specify the cloud provider and region; nothing in it sizes compute. When you run index.query(), Pinecone supplies the compute for that request on demand and meters it as usage (read units), so you are not paying for capacity between requests. This contrasts with pod-based indexes, where you provision a fixed set of pods and pay for them 24/7, regardless of usage.
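The example waits a fixed five seconds after upserting because Pinecone is eventually consistent. A fixed sleep is either too long or too short; a common alternative is to poll for a condition with a timeout. The helper below is a generic sketch; wait_until is a hypothetical name, not part of the Pinecone client.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
) -> bool:
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Hypothetical helper (not a Pinecone API). Returns True if the
    condition was met, False if the timeout expired first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Against the example index, one plausible predicate is lambda: index.describe_index_stats().total_vector_count >= len(vectors_to_upsert). Index stats may not update in lockstep with query visibility, so treat this as a heuristic rather than a guarantee.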
The core problem serverless indexes solve is optimizing cost and operational overhead for workloads with variable or low-intensity query patterns. Pod-based indexes require you to provision a certain number of pods (e.g., s1.x1, p1.x1) and pay for them continuously. This means you might be overpaying if your index is idle for long periods, or you might see performance degrade if your traffic spikes beyond your provisioned capacity. Serverless indexes abstract away the provisioning entirely. You define the index’s characteristics (dimension, metric, cloud, region), and Pinecone handles the underlying infrastructure. When a query comes in, Pinecone automatically scales up the compute to handle it, and scales down when idle.
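To make the provisioned-capacity trade-off concrete, the sketch below measures a day of hourly traffic against a fixed capacity: hours with no traffic are capacity you paid for but did not use, and hours above capacity are where latency would suffer. The traffic numbers and capacity_qps are made-up inputs, not published pod throughput figures, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CapacityReport:
    idle_hours: int          # hours with zero queries: paid-for capacity sat unused
    overloaded_hours: int    # hours where demand exceeded provisioned capacity
    mean_utilization: float  # average fraction of capacity in use (capped at 1.0)


def provisioned_capacity_report(hourly_qps: list[float], capacity_qps: float) -> CapacityReport:
    """Compare observed hourly load with a fixed, always-on capacity."""
    if capacity_qps <= 0:
        raise ValueError("capacity_qps must be positive")
    if not hourly_qps:
        raise ValueError("hourly_qps must not be empty")
    idle = sum(1 for q in hourly_qps if q == 0)
    overloaded = sum(1 for q in hourly_qps if q > capacity_qps)
    utilization = sum(min(q / capacity_qps, 1.0) for q in hourly_qps) / len(hourly_qps)
    return CapacityReport(idle, overloaded, utilization)


# Hypothetical day: quiet overnight, one short spike.
traffic = [0.0] * 20 + [5.0, 50.0, 200.0, 10.0]
report = provisioned_capacity_report(traffic, capacity_qps=100.0)
print(report.idle_hours, report.overloaded_hours)  # 20 idle hours, 1 overloaded hour
print(f"{report.mean_utilization:.1%} average utilization")
```

A serverless index sidesteps both failure modes in this picture: idle hours accrue no query charges, and the spike is absorbed without you resizing anything.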
The main levers you control are the cloud and region within ServerlessSpec. These choices are critical because they determine where your data resides and which Pinecone infrastructure serves it, and serverless indexes are offered only in a specific set of cloud and region combinations. Choosing a region close to your application’s users or data sources minimizes latency. The metric (cosine, euclidean, or dotproduct) dictates how vector similarity is calculated, and dimension must match the length of the embedding vectors your application generates. Both dimension and metric are fixed when the index is created.
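To see what the metric choice means, here is a brute-force sketch of the textbook formulas for the three metrics, run on the vectors and query from the example above. This is not how Pinecone searches internally (it uses approximate nearest-neighbor indexing), and the raw score Pinecone reports for a metric (for example, euclidean) may be scaled differently, so compare rankings rather than numbers. The dimension check mirrors the rule that every vector must match the index dimension.

```python
import math
from typing import Sequence


def _check_dims(a: Sequence[float], b: Sequence[float]) -> None:
    # Every vector must have the same length as the index dimension.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")


def dotproduct(a: Sequence[float], b: Sequence[float]) -> float:
    _check_dims(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    return dotproduct(a, b) / (math.sqrt(dotproduct(a, a)) * math.sqrt(dotproduct(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Straight-line distance: smaller means more similar.
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, records, k, metric="cosine"):
    """Rank (id, values) records against `query`; return the best k (id, score) pairs."""
    similarity = {"cosine": cosine, "dotproduct": dotproduct}
    if metric == "euclidean":
        scored = [(rid, euclidean(query, v)) for rid, v in records]
        scored.sort(key=lambda item: item[1])                # ascending distance
    elif metric in similarity:
        fn = similarity[metric]
        scored = [(rid, fn(query, v)) for rid, v in records]
        scored.sort(key=lambda item: item[1], reverse=True)  # descending similarity
    else:
        raise ValueError(f"unknown metric: {metric}")
    return scored[:k]


records = [
    ("vec1", [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
    ("vec2", [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]),
    ("vec3", [0.2, 0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]),
]
query = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]

for m in ("cosine", "dotproduct", "euclidean"):
    print(m, [rid for rid, _ in top_k(query, records, k=2, metric=m)])
# prints ['vec1', 'vec3'] for all three metrics
```

On this toy data all three metrics agree; on real embeddings they can disagree, which is why the metric should match what your embedding model was trained for.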
When migrating from pod-based to serverless, you’re essentially moving from a "provisioned capacity" model to an "on-demand" model. First, create a new serverless index with the same dimension and metric as your existing pod-based index. Next, re-upsert all your data into it, keeping the same namespaces and metadata. Finally, update your application’s configuration to point to the new serverless index. Pinecone’s migration tooling can assist with some aspects of this, but the fundamental steps of creation, data transfer, and re-pointing remain.
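The first two migration steps can be sketched as a compatibility check followed by a batched copy. This is a minimal sketch under stated assumptions: records arrive as (id, values) pairs from some export of your source data (metadata and namespaces are omitted for brevity), InMemoryIndex is a hypothetical stand-in so the code runs offline, and batch_size=100 is an illustrative choice rather than a documented limit. Because migrate only calls upsert(vectors=...), the same keyword used in the example above, a real index handle could be passed as the target.

```python
from typing import Iterable, Iterator, Sequence

Record = tuple[str, list[float]]  # (id, values); metadata and namespaces omitted


class InMemoryIndex:
    """Hypothetical stand-in for an index client; models only upsert(vectors=...)."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension
        self.records: dict[str, list[float]] = {}
        self.upsert_calls = 0

    def upsert(self, vectors: Sequence[Record]) -> None:
        self.upsert_calls += 1
        for rid, values in vectors:
            if len(values) != self.dimension:
                raise ValueError(f"{rid}: expected {self.dimension} values, got {len(values)}")
            self.records[rid] = list(values)


def check_compatible(source: tuple[int, str], target: tuple[int, str]) -> None:
    """Step 1: the new index must share the old index's (dimension, metric)."""
    if source != target:
        raise ValueError(f"incompatible indexes: source {source} vs target {target}")


def batched(records: Iterable[Record], size: int) -> Iterator[list[Record]]:
    """Group records into lists of at most `size` items."""
    batch: list[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def migrate(records: Iterable[Record], target, batch_size: int = 100) -> int:
    """Step 2: re-upsert every record into the target; returns the number copied."""
    copied = 0
    for batch in batched(records, batch_size):
        target.upsert(vectors=batch)
        copied += len(batch)
    return copied
```

Step 3, re-pointing, is usually a configuration change (the index name your application connects to), which is why it is not modeled here. Verifying that the target’s record count matches the source before switching is a cheap safeguard.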
The most surprising thing about serverless indexes is how little they cost during periods of complete inactivity, even for indexes containing millions of vectors. With no queries or writes there are no read or write charges; what remains is mainly the storage bill (plus any minimum your plan imposes). This isn’t just about scaling down: you simply aren’t paying for compute between requests, which makes serverless cost-efficient for many use cases that would have been prohibitively expensive with always-on, provisioned vector database deployments.
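The difference between the two billing models can be written down directly. In the sketch below, pods are billed per pod-hour whether or not they serve traffic, while serverless meters reads, writes, and storage separately. Every price is a placeholder parameter, not Pinecone’s actual rate, and the model ignores details such as how many read units a given query consumes. The break-even function answers the practical question: how many monthly read units before serverless costs as much as the pods.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def pod_monthly_cost(num_pods: int, price_per_pod_hour: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Provisioned model: every pod-hour is billed, busy or idle."""
    return num_pods * price_per_pod_hour * hours


def serverless_monthly_cost(read_units: float, write_units: float, storage_gb: float,
                            price_per_million_reads: float,
                            price_per_million_writes: float,
                            price_per_gb_month: float) -> float:
    """Usage model: reads and writes are metered; storage is billed even when idle."""
    return (read_units / 1e6 * price_per_million_reads
            + write_units / 1e6 * price_per_million_writes
            + storage_gb * price_per_gb_month)


def break_even_read_units(pod_cost: float, write_units: float, storage_gb: float,
                          price_per_million_reads: float,
                          price_per_million_writes: float,
                          price_per_gb_month: float) -> float:
    """Monthly read units at which the serverless bill equals the pod bill."""
    fixed = serverless_monthly_cost(0, write_units, storage_gb, price_per_million_reads,
                                    price_per_million_writes, price_per_gb_month)
    return max(pod_cost - fixed, 0.0) / price_per_million_reads * 1e6


# Placeholder prices chosen for easy arithmetic, NOT real rates.
PRICES = dict(price_per_million_reads=1.0, price_per_million_writes=1.0, price_per_gb_month=1.0)

idle = serverless_monthly_cost(0, 0, 3, **PRICES)  # 3.0: storage only
pods = pod_monthly_cost(1, price_per_pod_hour=2.0)  # 1460.0: billed even if idle
print(idle, pods, break_even_read_units(pods, 0, 3, **PRICES))
```

With real prices plugged in, the same functions show why an index that is queried rarely is cheap on serverless while a consistently busy one can cross the break-even point.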
After successfully migrating and querying your serverless index, you’ll next want to understand the opposite trade-off: if serverless performance becomes a bottleneck for a consistently high-throughput workload, look at pod-based indexes created with PodSpec, and at pc.configure_index(), which scales an existing pod-based index by changing its replica count or increasing its pod size.