Ollama’s model management is surprisingly flexible, letting you treat large language models like simple packages on your local machine.

Let’s see this in action. Imagine you’ve just installed Ollama and want to try out Llama 3. You’d open your terminal and type:

ollama pull llama3

This command downloads the llama3 model. Ollama handles everything: resolving the name to a specific version, downloading the weights (often several gigabytes), and making the model available for use. Once it’s done, you can immediately start a chat session:

ollama run llama3

You’ll see an interactive prompt where you can start typing your questions (type /bye to exit). It feels like any other command-line tool, but with a powerful AI at your fingertips.

The Core Idea: Local Model as a Package

The fundamental concept here is that Ollama abstracts away the complexity of downloading, storing, and loading massive LLM files. Instead of managing .gguf files manually, you interact with models through simple, intuitive commands. This makes it incredibly easy to experiment with different models without getting bogged down in file paths or version compatibility issues.

Internally, Ollama keeps track of the models you have downloaded. When you pull a model, it checks what is already present locally and fetches anything missing from the Ollama model library. The downloaded model weights are stored in a specific directory on your system (typically ~/.ollama/models, though the location depends on how Ollama was installed). Each model is identified by a name and optionally a tag (like llama3:8b or llama3:70b).
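
To see where the model data lives and how much space it takes, a small helper like the one below can be handy. This is a minimal sketch: the function names are made up for illustration, and it assumes the default ~/.ollama/models location unless the OLLAMA_MODELS environment variable points somewhere else.

```shell
# Print the directory where Ollama is expected to keep model data.
# OLLAMA_MODELS overrides the default location when it is set.
ollama_models_dir() {
  printf '%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}"
}

# Report how much disk space the model store uses, if the directory exists.
ollama_models_usage() {
  dir=$(ollama_models_dir)
  if [ -d "$dir" ]; then
    du -sh "$dir"
  else
    echo "no model directory at $dir" >&2
    return 1
  fi
}
```

Calling ollama_models_usage after a few pulls gives a quick total without having to remember the path.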

Managing Your Model Library

Beyond pulling and running, you’ll want to keep your model library tidy.

Listing Models: To see what models you have downloaded, use the list command:

ollama list

This will output something like:

NAME            ID              SIZE    MODIFIED
llama3:8b       abcdef123456    4.7 GB  2 hours ago
mistral:7b      ghijkl789012    4.1 GB  3 days ago
phi3:mini       mnopqr345678    1.8 GB  1 day ago

This shows you the model name (including its tag if specified), a unique ID, its disk size, and when it was last updated or downloaded.
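
Because the output is a plain whitespace-separated table, it is easy to post-process. The sketch below adds up the SIZE column; it assumes the four-column layout shown above, with the size printed as a number followed by a unit, and it treats units as decimal (1 GB = 1000 MB), which is an assumption rather than something the output states. The function name is hypothetical.

```shell
# Sum the SIZE column of `ollama list` output read from stdin and print the
# total in GB. The header row (line 1) is skipped.
total_model_size_gb() {
  awk 'NR > 1 && NF >= 4 {
         size = $3
         if ($4 == "MB") size = size / 1000          # assumption: decimal units
         else if ($4 == "KB") size = size / 1000000
         total += size
       }
       END { printf "%.1f\n", total }'
}

# Usage with a live install:
#   ollama list | total_model_size_gb
```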

Deleting Models: As you experiment, you’ll accumulate models. To free up disk space, you can delete them. For example, to remove the phi3:mini model:

ollama rm phi3:mini

Note that ollama rm deletes the model right away, without asking for confirmation, so double-check the name and tag before pressing Enter. Cleaning up regularly is a crucial step for managing your storage, as these models can take up a significant amount of space.

The "Tag" is Key

When you pull or remove a model, you often use a tag. If you don’t specify a tag, Ollama defaults to :latest. So, ollama pull llama3 is the same as ollama pull llama3:latest. This is important because different variants of a model can exist with different performance characteristics and sizes. For instance, you might have llama3:8b and llama3:70b downloaded side by side. Using the tag ensures you’re managing the specific model variant you intend to.
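
The defaulting rule is simple enough to mimic in a script, which helps when you want to compare names a user typed against the names shown by ollama list. The function below is an illustrative sketch of that rule, not Ollama’s own code, and its name is hypothetical.

```shell
# Return the model reference with an explicit tag, adding ":latest" if none
# was given. Only the part after the last "/" is inspected, so a registry
# host with a port (a hypothetical "localhost:5000/mymodel") is not mistaken
# for a tag.
with_explicit_tag() {
  case "${1##*/}" in
    *:*) printf '%s\n' "$1" ;;
    *)   printf '%s\n' "$1:latest" ;;
  esac
}

# Usage:
#   with_explicit_tag llama3       prints llama3:latest
#   with_explicit_tag llama3:70b   prints llama3:70b
```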

The model registry within Ollama is more than just a list of files: each name and tag is recorded in a small manifest that points to the actual model data stored on disk. This allows Ollama to efficiently load the correct model weights when you invoke ollama run or send a request to the server started by ollama serve.

When you delete a model using ollama rm, Ollama doesn’t just remove the files; it also updates its internal registry to reflect that the model is no longer available. This prevents errors if you try to run a model that has been deleted.
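
In scripts, one way to avoid referring to a model that is no longer installed is to check the output of ollama list first. The helper below is a sketch under the assumption that the model name is the first column of that output, as in the listing shown earlier; has_model is a hypothetical name.

```shell
# Read `ollama list` output from stdin and succeed (exit status 0) only if
# the given model reference appears in the NAME column.
has_model() {
  # $1: model reference to look for, e.g. "phi3:mini"
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Usage with a live install:
#   if ollama list | has_model phi3:mini; then
#     ollama rm phi3:mini
#   fi
```

The match is exact, so pass the full name including the tag, exactly as ollama list prints it.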

Consider the implications of using :latest. While convenient, it means that if a new, larger, or subtly different version of llama3 is released and tagged as :latest, a simple ollama pull llama3 (without a specific tag) could replace your existing llama3 model without you explicitly requesting that specific new version. For reproducible workflows, it’s often better to specify an exact tag, e.g., ollama pull llama3:8b-instruct-q4_0 (check the model’s page in the Ollama library for the tags that actually exist).
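
If you keep a list of models for a project, a small guard can enforce this habit. The sketch below refuses to pull anything that has no tag or that uses :latest; the function names and the models.txt file in the usage note are hypothetical.

```shell
# Succeed only if the reference carries an explicit tag other than "latest".
is_pinned() {
  name=${1##*/}            # ignore any registry host or namespace prefix
  case "$name" in
    *:latest) return 1 ;;  # explicit but moving tag
    *:?*)     return 0 ;;  # some other non-empty tag
    *)        return 1 ;;  # no tag at all
  esac
}

# Pull a model only when its reference is pinned.
pull_pinned() {
  if is_pinned "$1"; then
    ollama pull "$1"
  else
    echo "refusing to pull unpinned reference: $1" >&2
    return 1
  fi
}

# Usage, assuming a hypothetical models.txt with one reference per line:
#   while read -r ref; do pull_pinned "$ref" || exit 1; done < models.txt
```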

This model management system is the backbone of using Ollama effectively. It allows for a seamless workflow from discovery and download to execution and cleanup, making local LLM deployment accessible.

The next step after mastering model management is understanding how to use these models in more complex application architectures, particularly through Ollama’s API.
