Ollama’s verbose logging mode doesn’t just give you more output; it lowers the server’s log level to debug, so internal steps that are normally silent are reported alongside the usual warnings and errors.
Let’s see it in action. First, start Ollama with verbose logging enabled. On Linux/macOS, this means setting an environment variable before running ollama serve:
OLLAMA_DEBUG=1 ollama serve
On Windows, set it in PowerShell before starting the server (in Command Prompt, the equivalent is set OLLAMA_DEBUG=1):
$env:OLLAMA_DEBUG=1
ollama serve
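Debug output scrolls past quickly, so it helps to keep a copy on disk. The following is a minimal sketch for Linux/macOS shells; the function name serve_with_debug_log and the default file name ollama-debug.log are hypothetical, and both output streams are merged so the capture works whichever one the server writes to.

```shell
# Start the server with debug logging and save everything it prints.
# serve_with_debug_log and ollama-debug.log are illustrative names, not part of Ollama.
serve_with_debug_log() {
    log_file="${1:-ollama-debug.log}"
    # 2>&1 merges stderr into stdout so tee captures both streams;
    # tee -a appends, so earlier sessions in the same file are kept.
    OLLAMA_DEBUG=1 ollama serve 2>&1 | tee -a "$log_file"
}
```

Calling serve_with_debug_log with no argument writes to ollama-debug.log in the current directory while still showing the output in the terminal.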
Now, make a request. For instance, pull a model:
ollama pull llama2
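Run the pull in a second terminal and watch the terminal where ollama serve is running; the debug output appears there, not in the client. You’ll see a flood of information along these lines (simplified and illustrative; the exact format, file names, and line numbers vary between Ollama versions):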
[DEBUG] 2023-10-27T10:00:00Z main.go:250 Starting Ollama server...
[DEBUG] 2023-10-27T10:00:00Z routes.go:45 POST /api/pull
[DEBUG] 2023-10-27T10:00:00Z pull.go:67 Pulling model: llama2
[DEBUG] 2023-10-27T10:00:01Z registry.go:123 Connecting to registry.ollama.ai:443
[DEBUG] 2023-10-27T10:00:02Z registry.go:150 Successfully fetched manifest for llama2
[DEBUG] 2023-10-27T10:00:05Z blob.go:88 Downloading blob sha256:abcdef123...
[DEBUG] 2023-10-27T10:00:15Z blob.go:105 Downloaded blob sha256:abcdef123 (1024MB)
[DEBUG] 2023-10-27T10:00:16Z model.go:90 Creating model: llama2
[DEBUG] 2023-10-27T10:00:17Z model.go:110 Model llama2 created successfully.
[DEBUG] 2023-10-27T10:00:17Z routes.go:50 Sending response: 200 OK
This is more than extra noise; it’s a step-by-step trace of what the server does with each request. When you run ollama serve normally, it uses a default log level that prioritizes efficiency and reports only significant events or errors. OLLAMA_DEBUG=1 lowers that level to debug, so the subsystems involved in a request (routing, registry access, blob downloads, model creation) also log their intermediate steps, decisions, and interactions. It’s like turning on a microscopic camera inside the server.
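One way to read a trace like this is to look at how much time passes between consecutive entries. The sketch below assumes the simplified layout shown earlier (level, ISO-8601 UTC timestamp, file:line, message); real Ollama output may be laid out differently, in which case the field positions need adjusting. The function name step_gaps is hypothetical.

```shell
# Print the seconds elapsed before each log entry, largest gaps first.
# Assumes the timestamp is the second whitespace-separated field,
# e.g. "[DEBUG] 2023-10-27T10:00:15Z blob.go:105 Downloaded blob ...".
# Only the HH:MM:SS part is used, so the trace must not cross midnight.
step_gaps() {
    awk '{
        split(substr($2, 12, 8), t, ":")          # characters 12-19 hold HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1) printf "%d\t%s\n", now - prev, $0
        prev = now
    }' "$@" | sort -rn
}
```

Run against the sample above, the entry with the largest gap is the completed blob download, which is where most of the pull’s time went.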
The core problem Ollama solves is making large language models accessible and runnable on consumer hardware. It handles model downloading, quantization, and serving through a simple API. Verbose logging helps diagnose issues within this pipeline. For example, if a pull hangs, the last entries logged from pull.go or registry.go show which step is stuck. If inference is slow, you can follow the entries from files such as runner.go and decoder.go to pinpoint bottlenecks.
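Narrowing a saved log to the files of interest can be sketched as follows. This assumes the server output was saved to a file and that each entry carries a file.go:line field surrounded by spaces, as in the simplified sample; the function name filter_by_source is hypothetical.

```shell
# Keep only log entries that come from the given source files.
# Usage: filter_by_source LOGFILE file1.go [file2.go ...]
filter_by_source() {
    log_file="$1"; shift
    pattern=""
    for name in "$@"; do
        # Escape dots so "pull.go" matches only a literal dot.
        escaped=$(printf '%s' "$name" | sed 's/\./\\./g')
        pattern="${pattern:+$pattern|}$escaped"
    done
    # Match " file.go:123 " so names inside messages are not picked up.
    grep -E " ($pattern):[0-9]+ " "$log_file"
}
```

For a hanging pull, filter_by_source ollama-debug.log pull.go registry.go shows only the pull and registry steps, and the last line printed is the last step that completed.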
The levers you control are the environment variable OLLAMA_DEBUG (set to 1 for verbose, 0 or unset for normal) and the specific requests you make to Ollama. Each request, whether it comes from a CLI command such as ollama pull, ollama create, or ollama run, or from a direct call to an API endpoint such as /api/generate, triggers a cascade of internal operations that are now visible.
When you’re debugging a scenario where Ollama seems to be silently failing or behaving unexpectedly, the key is to correlate the timestamped debug logs with the specific API request you made. For instance, if you’re seeing errors related to model loading, you’d look for log entries around the time you issued ollama run <model_name> and specifically examine messages originating from model.go or the underlying storage access functions. The debug output will show you the exact path it’s trying to read the model from, the checksums it’s verifying, and any errors encountered during file I/O or deserialization.
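The correlation step can be sketched by recording a timestamp before and after the request and then cutting the saved log down to that window. This assumes the server output was saved to a file (the name ollama-debug.log here is hypothetical) and that entries use ISO-8601 UTC timestamps in the second field, as in the simplified sample; such timestamps sort correctly as plain strings. The function name logs_between is also hypothetical.

```shell
# Print log entries whose timestamp lies within [start, end], inclusive.
# Usage: logs_between START END LOGFILE
logs_between() {
    # ISO-8601 UTC timestamps compare correctly as strings,
    # so no date parsing is needed.
    awk -v lo="$1" -v hi="$2" '$2 >= lo && $2 <= hi' "$3"
}

# Example session (commented out because it needs a running server):
#   start=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   ollama run llama2 "hello"
#   end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
#   logs_between "$start" "$end" ollama-debug.log
```

The window only lines up if the log timestamps and the recorded times use the same time zone, which is why the example uses date -u.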
The most surprising thing about Ollama’s debug logging is how granular it is. It doesn’t just tell you "download failed"; it shows you the specific HTTP status code received from the registry server, the exact byte range being requested, and the internal buffer size used for the download. This level of detail is crucial because LLM operations involve many moving parts: network requests, disk I/O, memory allocation, CPU/GPU scheduling, and complex state management within the model inference engine. Without verbose logging, tracing a failure through these layers would be nearly impossible, forcing you to guess which component is misbehaving.
The next concept you’ll likely encounter is how to interpret the model manifest and blob structure when debugging download issues, especially when dealing with custom model creation or registry quirks.