Pinecone’s backup and restore functionality lets you move your vector data out of and back into a Pinecone index, providing a safety net against accidental deletions or data corruption, and enabling disaster recovery scenarios.

Let’s see it in action. Imagine you have an index named my-embeddings and you want to back it up. You’d call pinecone.Index('my-embeddings').export_index(). This call doesn’t hand you a file. Instead, it starts an asynchronous job within Pinecone’s infrastructure and returns an ExportResponse object containing an export ID and a status field. You then poll the job’s status with a dedicated export status check (if your SDK version provides one) until it reports COMPLETED. Note that describe_index_stats() is not a substitute: it reports vector counts and namespace statistics for the index, not the state of an export job. Once the export completes, Pinecone provides a pre-signed URL for downloading the exported data, typically in a compressed JSON Lines format.

Here’s a simplified Python snippet illustrating the export initiation:

import pinecone
import time

# Initialize Pinecone (replace with your API key and environment)
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")

index_name = "my-embeddings"
export_path = f"s3://your-s3-bucket/pinecone-backups/my-embeddings-backup-{time.strftime('%Y-%m-%d-%H-%M-%S')}" # Example S3 path with a timestamp suffix

try:
    # Initiate the export
    export_response = pinecone.Index(index_name).export_index(
        host=f"{index_name}.YOUR_ENVIRONMENT.pinecone.io", # The index's host URL
        storage_path=export_path,
        sync=False # Set to True if you want to block until completion (not recommended for large indexes)
    )

    print(f"Export initiated. Export ID: {export_response.export_id}")

    # Poll for status if sync=False
    if not export_response.sync:
        max_polls = 60 # Give up after roughly an hour instead of looping forever
        for _ in range(max_polls):
            # describe_index_stats() only reports vector counts, not export job status,
            # so it cannot tell you when the export has finished.
            # In a real scenario, you'd poll a dedicated export status endpoint
            # using export_response.export_id.
            print("Current export status: checking...")
            time.sleep(60) # Wait for 60 seconds before polling again
            # Example of how you might check status if a dedicated endpoint existed:
            # status_check = pinecone.describe_export_status(export_id=export_response.export_id)
            # if status_check.status == "COMPLETED":
            #     print("Export completed successfully!")
            #     print(f"Download URL: {status_check.download_url}")
            #     break
            # elif status_check.status == "FAILED":
            #     print("Export failed.")
            #     break
            # else:
            #     print(f"Export status: {status_check.status}. Waiting...")

except Exception as e:
    print(f"An error occurred: {e}")
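The polling loop above can be separated from the Pinecone client so that it is easy to bound and to test. The following is a minimal sketch, not part of any SDK: wait_for_export and its get_status parameter are hypothetical names, and get_status stands in for whatever export status call your SDK version exposes. Only the COMPLETED and FAILED status names come from the text above.

```python
import time
from typing import Callable


def wait_for_export(
    get_status: Callable[[], str],
    poll_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal state or the timeout elapses.

    get_status is a hypothetical zero-argument callable that wraps the real
    export status check and returns the status as a string.
    """
    terminal_states = {"COMPLETED", "FAILED"}
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if waited >= timeout:
            raise TimeoutError(f"export still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    # Simulated status sequence; no Pinecone call is made here.
    fake_states = iter(["PENDING", "RUNNING", "COMPLETED"])
    result = wait_for_export(lambda: next(fake_states), poll_interval=0,
                             sleep=lambda seconds: None)
    print(result)  # prints COMPLETED
```

Passing the sleep function as a parameter keeps the helper fast to test, and the timeout means a stalled export raises an error rather than blocking the script indefinitely.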

The storage_path is crucial. It must be an S3 bucket URI (e.g., s3://your-bucket-name/path/to/export). Pinecone writes the data to this location. You’ll need to ensure your Pinecone environment has the necessary IAM permissions to write to this S3 bucket. The export process compresses the data, typically into a .gz file, and organizes it into a directory structure within your specified S3 path. Each vector record, along with its ID and metadata, is represented as a JSON object on a new line within the file.
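To make the file layout concrete, here is a minimal sketch of reading one exported file with the Python standard library. It assumes the file has already been copied from S3 to local disk and that it is gzip-compressed JSON Lines as described above. The function name iter_exported_records is hypothetical, and the sketch makes no assumption about which keys each record contains.

```python
import gzip
import json
from typing import Iterator


def iter_exported_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    # "rt" opens the compressed stream in text mode so lines decode as UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Because the function is a generator, it decompresses and parses one line at a time, so a large export never has to be held in memory at once. A caller would loop over iter_exported_records(local_path) and inspect each record as it arrives.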

Restoring is the reverse. You point to the S3 location where your backup resides and call pinecone.Index('new-index-name').upsert_from_file(). This method takes the file path (the S3 URI) and an index name. Pinecone then reads the data from the S3 bucket and upserts it into the specified index. If the index doesn’t exist, it will be created. If it does exist, the backup is merged into it with upsert semantics: a vector whose ID is already present is overwritten by the backed-up version, and every other vector is added alongside the existing ones.

The core problem this solves is data durability and portability. Without this, your vector data lives solely within Pinecone’s managed service. If your account were compromised, or if you needed to migrate to a different vector database, you’d be stuck. The export/import mechanism provides that escape hatch and safety net.

Internally, the export process serializes your index’s data into a format that can be efficiently stored and retrieved from S3. This involves iterating through all vectors, their IDs, and their associated metadata, then compressing this information. The restore process reads this serialized data, parses it, and then uses Pinecone’s internal upsert API to re-ingest it. The upsert_from_file convenience function abstracts away the need to manually read the file, parse JSON lines, and batch upserts, which can be complex for large datasets.
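For readers who want to see what the manual route involves, the batching step can be sketched as follows. This is an illustration under stated assumptions rather than Pinecone’s implementation: batched and restore_records are hypothetical helpers, and upsert_batch is a stand-in for whatever call writes a list of vectors to the index (for example, a small wrapper around the client’s upsert method). The batch size of 100 is an arbitrary example value.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group an iterable of records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def restore_records(
    records: Iterable[dict],
    upsert_batch: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> int:
    """Send records to upsert_batch in batches and return how many were sent.

    upsert_batch is a hypothetical callable supplied by the caller; this
    function never talks to Pinecone directly.
    """
    total = 0
    for batch in batched(records, batch_size):
        upsert_batch(batch)
        total += len(batch)
    return total
```

Keeping the write call injectable means the same loop can be exercised against an in-memory list before it is pointed at a real index.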

The most surprising aspect for many is that the export process requires you to provide an S3 bucket you own and manage. Pinecone doesn’t store the backup data itself; it uses your designated S3 location as its staging ground for the exported data. This gives you direct control over where your data goes, but also means you’re responsible for the S3 costs and permissions.

The next hurdle you’ll encounter is understanding the exact format of the exported data. While it’s JSON Lines, the structure of each JSON object, especially concerning metadata and sparse/dense vector representations, requires careful examination to ensure compatibility if you were to manually process or migrate it.
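One way to examine the format before committing to a migration is to summarize which fields actually appear in a sample of records. The sketch below is a hypothetical helper, not a Pinecone utility. It assumes each record is a parsed JSON object and that metadata, when present, sits under a top-level "metadata" key; that key name is an assumption to check against your own export.

```python
from collections import Counter
from typing import Iterable


def summarize_fields(records: Iterable[dict]) -> dict:
    """Count top-level keys and (metadata key, value type) pairs across records."""
    top_level = Counter()
    metadata_types = Counter()
    for record in records:
        top_level.update(record.keys())
        metadata = record.get("metadata")  # assumed key name, verify in your data
        if isinstance(metadata, dict):
            for key, value in metadata.items():
                metadata_types[(key, type(value).__name__)] += 1
    return {"top_level": dict(top_level), "metadata_types": dict(metadata_types)}
```

A key that appears in only some records (a sparse vector field, for instance) or a metadata key that shows up with more than one value type is exactly the kind of inconsistency that tends to break a manual import into another system.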
