The most surprising thing about deploying LLMs in air-gapped environments is how little the core LLM technology changes; it’s the delivery mechanism that becomes the intricate puzzle.

Imagine you’ve got a secure, offline network – no internet access, period. You want to run models like Llama 3, Mistral, or Phi-3 on it. Ollama is your go-to tool for running these locally, but how do you get the models and Ollama itself onto that air-gapped system?

Here’s how you’d typically set up Ollama for an air-gapped deployment:

1. Prepare Your "Internet-Connected" Machine

This is where you’ll download everything needed.

  • Install Ollama: On a machine with internet access, download the Ollama installer from ollama.com/download. Run it.

  • Download Models: This is the crucial step. You need to pull the model files. Let’s say you want Llama 3 8B.

    ollama pull llama3
    

    This command downloads the model weights and configuration into Ollama's models directory. For a per-user install this is ~/.ollama/models on both Linux and macOS, and %USERPROFILE%\.ollama\models on Windows. If Ollama was installed on Linux with the official install script and runs as a systemd service under the ollama user, the models live in /usr/share/ollama/.ollama/models instead.

    You can list your downloaded models and their sizes:

    ollama list
    

    This will show you something like:

    NAME              ID              SIZE    MODIFIED
    llama3:latest     <hash>          4.7 GB  2 days ago
    mistral:7b        <hash>          4.1 GB  3 days ago
    phi3:mini         <hash>          2.7 GB  1 day ago
    

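    Inside the models directory, Ollama uses a content-addressed layout rather than one folder per model. A manifests/ tree holds a small JSON manifest per model tag (for example manifests/registry.ollama.ai/library/llama3/latest), and a blobs/ directory holds the files those manifests reference, each named sha256-<digest>. The weights are one large blob in GGUF format; the prompt template, parameters, and license are much smaller blobs. Because blobs can be shared between models, treat the directory as a unit rather than picking out individual files.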
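
To see which models a given models directory actually contains without starting the server, you can walk the manifests/ tree. The helper below is a minimal sketch: list_model_manifests is a made-up name, and it assumes the manifests/<registry>/<namespace>/<model>/<tag> layout described above.

```shell
# Sketch: print model:tag for every manifest under a models directory.
# list_model_manifests is a hypothetical helper, not an Ollama command.
list_model_manifests() {
  local models_dir="${1:-$HOME/.ollama/models}"
  local manifest tag model_dir
  if [ ! -d "$models_dir/manifests" ]; then
    echo "no manifests/ directory under $models_dir" >&2
    return 1
  fi
  find "$models_dir/manifests" -type f | sort | while IFS= read -r manifest; do
    tag="${manifest##*/}"        # last path component, e.g. latest
    model_dir="${manifest%/*}"   # path up to the model name
    # The registry and namespace components are dropped for brevity.
    printf '%s:%s\n' "${model_dir##*/}" "$tag"
  done
}
```

Calling list_model_manifests with no argument inspects ~/.ollama/models; pass a path to inspect a different directory, such as a copy on removable media.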

2. Transfer to the Air-Gapped Network

  • Copy Ollama Application: The Ollama executable itself needs to be transferred.

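    • Linux: Find the ollama binary (often in /usr/local/bin/ollama or /usr/bin/ollama) and copy it. Recent releases also install a lib/ollama directory next to it (for example /usr/local/lib/ollama) that holds the GPU runtime libraries; if it exists, copy it as well and keep the same relative layout on the target.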
    • macOS: The application bundle is usually in /Applications/Ollama.app. You can copy this entire directory.
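    • Windows: Locate the ollama.exe file (the installer places it under %LOCALAPPDATA%\Programs\Ollama by default) and copy the folder it lives in, so that any bundled libraries travel with it.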
  • Copy Model Files: Copy the entire models directory (~/.ollama/models or its equivalent on your OS) from the internet-connected machine to the air-gapped machine, preserving its structure. Both subdirectories matter: manifests/ tells Ollama which models exist, and blobs/ holds the data those manifests point to. A manifest without its blobs, or blobs without a manifest, will not show up as a usable model.
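
One way to move the models directory across the gap is as a single archive with a checksum beside it. This is a sketch under assumptions: pack_models and the archive name are illustrative, and sha256sum from GNU coreutils is assumed to be available (on macOS you would substitute shasum -a 256).

```shell
# Sketch: bundle blobs/ and manifests/ into one tar file plus a checksum.
# pack_models is a hypothetical helper, not part of Ollama.
pack_models() {
  local models_dir="$1" out="$2"
  if [ ! -d "$models_dir/blobs" ] || [ ! -d "$models_dir/manifests" ]; then
    echo "expected blobs/ and manifests/ under $models_dir" >&2
    return 1
  fi
  # -C keeps paths relative, so the archive unpacks to blobs/ and manifests/.
  # No compression: quantized weights typically shrink very little.
  tar -C "$models_dir" -cf "$out" blobs manifests || return 1
  # Write <archive>.sha256 next to the archive, using a relative file name
  # so the check also works after the files are moved together.
  ( cd "$(dirname "$out")" &&
    sha256sum "$(basename "$out")" > "$(basename "$out").sha256" )
}
```

For example, pack_models ~/.ollama/models /media/usb/ollama-models.tar writes the archive and /media/usb/ollama-models.tar.sha256. On the air-gapped side, running sha256sum -c ollama-models.tar.sha256 in that directory confirms the copy before you extract it with tar -C /path/to/your/copied/models -xf ollama-models.tar.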

3. Configure and Run on the Air-Gapped Machine

  • Place Ollama Executable: Put the copied ollama binary (or the Ollama.app directory, or ollama.exe) into a known location on the air-gapped machine. For Linux, placing it in /usr/local/bin is standard.

  • Place Model Files: Put the copied models directory at the default location for the user who will run the server (for example ~/.ollama/models), or put it anywhere else and point Ollama at it with the environment variable described next.

  • Set Environment Variable (Optional but Recommended): To ensure Ollama looks for models in your specific copied directory, you can set the OLLAMA_MODELS environment variable. On Linux/macOS:

    export OLLAMA_MODELS=/path/to/your/copied/models
    

    On Windows (Command Prompt):

    set OLLAMA_MODELS=C:\path\to\your\copied\models
    

    On Windows (PowerShell):

    $env:OLLAMA_MODELS="C:\path\to\your\copied\models"
    

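    You would typically add this to your shell’s profile file (e.g., .bashrc, .zshrc on Linux/macOS, or a startup script on Windows) so it’s set automatically. The variable is read by the server, so it must be set in the environment that launches ollama serve; setting it only in the terminal where you later run ollama run has no effect.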

  • Start Ollama: Run the Ollama server. On Linux/macOS:

    ./ollama serve
    

    Or, if installed in a standard path:

    ollama serve
    

    On Windows:

    ollama.exe serve
    

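    Ollama will start and serve whatever models it finds in the OLLAMA_MODELS directory (or the default location if the variable is unset). Run ollama list from a second terminal to confirm that the transferred models appear.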

  • Run a Model: You can now interact with the models.

    ollama run llama3
    

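    This command loads the llama3 model from your local files and presents an interactive prompt. If the model is not in the local store, ollama run tries to pull it from the registry, which fails without network access, so an error at this point usually means the manifests or blobs did not land where the server is looking.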
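
If a model is listed but refuses to load after the transfer, a damaged copy is one thing worth ruling out. Since each blob is named after the SHA-256 digest of its contents, you can recompute the digests on the air-gapped machine. This is a sketch: verify_blobs is a hypothetical helper, and it assumes GNU sha256sum and the sha256-<hex digest> file naming.

```shell
# Sketch: check that every blob still hashes to the digest in its file name.
# verify_blobs is a hypothetical helper, not an Ollama command.
verify_blobs() {
  local models_dir="${1:-$HOME/.ollama/models}"
  local blob expected actual status=0
  for blob in "$models_dir"/blobs/sha256-*; do
    [ -f "$blob" ] || continue          # skip the unexpanded glob if empty
    expected="${blob##*/sha256-}"       # digest taken from the file name
    actual="$(sha256sum "$blob" | cut -d' ' -f1)"
    if [ "$expected" != "$actual" ]; then
      echo "MISMATCH: $blob" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running verify_blobs /path/to/your/copied/models prints nothing and returns 0 when every blob matches. Hashing multi-gigabyte files takes a while, so expect this to run for a minute or more on a large model store.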

What makes this work?

Ollama is designed to be a self-contained application. When you run ollama pull, it downloads the model's metadata and the trained weights into the local store. When you run ollama serve, it exposes an API endpoint (defaulting to http://127.0.0.1:11434) and, when a model is first requested, reads the local files and loads them into memory (or onto the GPU if available). By transferring the Ollama binary and the entire model data directory, you’re essentially transplanting the entire execution environment. The OLLAMA_MODELS environment variable is key because it tells the Ollama server where to find those model files, bypassing any default locations that might not exist or be accessible in the air-gapped setup.
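
Because everything goes through that local HTTP endpoint, a script can check that the server is up without touching the ollama CLI. The sketch below polls /api/tags, the endpoint that lists locally available models; wait_for_ollama, its retry count, and the timeout are illustrative choices, and curl is assumed to be installed on the air-gapped machine.

```shell
# Sketch: poll the local Ollama API until it answers, then print the model list.
# wait_for_ollama is a hypothetical helper; adjust attempts and timeout to taste.
wait_for_ollama() {
  local base_url="${1:-http://127.0.0.1:11434}" attempts="${2:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: no progress meter,
    # --max-time: give up on a hung connection after 2 seconds.
    if curl -fs --max-time 2 "$base_url/api/tags"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Ollama did not respond at $base_url" >&2
  return 1
}
```

On success the function prints the JSON returned by /api/tags, which should name the models you transferred. The request stays on the loopback interface, so it works with no external network at all.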

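The critical insight is that the model files are just data. As long as the Ollama server can access that data and has the necessary libraries (which are bundled with the Ollama distribution for most common platforms), it doesn’t need external network access to run the models. The one thing Ollama does not bring with it is the GPU driver: if you want GPU acceleration, the driver has to be installed on the air-gapped machine separately.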

The next hurdle you’ll likely encounter is managing model updates or introducing new models into the air-gapped environment, which requires repeating this transfer process.
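
Repeating the transfer does not have to mean re-copying every blob. If you keep a list of the blob names you shipped last time, you can work out on the connected machine which blobs are new. The following is a sketch under those assumptions: missing_blobs and the list file are illustrative, not Ollama features.

```shell
# Sketch: print blobs that exist in a source models directory but are not
# named in a list of blobs already present on the air-gapped machine.
# missing_blobs and the list file format (one blob name per line) are assumptions.
missing_blobs() {
  local src_models_dir="$1" have_list="$2"
  # grep flags: -v invert, -x whole line, -F fixed strings, -f patterns from file.
  # grep exits 1 when nothing is missing; treat that as success, not an error.
  find "$src_models_dir/blobs" -type f -name 'sha256-*' -exec basename {} \; \
    | sort \
    | grep -vxFf "$have_list" || [ "$?" -eq 1 ]
}
```

For instance, missing_blobs ~/.ollama/models shipped-blobs.txt prints only the blob names that still need to cross the gap. The manifests/ tree is tiny, so it is simplest to copy it in full on every transfer alongside the new blobs.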
