You can build truly custom AI models with Ollama by using Modelfiles, and the most powerful feature is their templating system.

Here’s a basic Modelfile that uses a system prompt and a template to guide a model’s behavior:

FROM my-base-model:latest

TEMPLATE """{{- .System }}

{{ .Prompt }}
"""

PARAMETER temperature 0.7
PARAMETER top_k 40
PARAMETER stop "User:"
PARAMETER stop "Assistant:"

After you build this with ollama create my-custom-model -f Modelfile and start it with ollama run my-custom-model:latest, Ollama loads the model named in FROM and applies the stored TEMPLATE and PARAMETER settings to each request.

The TEMPLATE is a Go template string. {{ .System }} and {{ .Prompt }} are variables that Ollama fills in: {{ .System }} is replaced by the content of the SYSTEM instruction in your Modelfile, and {{ .Prompt }} is replaced by the user’s input. The - in {{- .System }} is a trim marker. Written as {{- it removes whitespace immediately before the action, and written as -}} it removes whitespace immediately after it. It does not change the injected value itself.
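To see what the trim marker does, the template can be rendered outside Ollama with Go's standard text/template package. The sketch below is only an illustration: the promptData struct and the render helper are hypothetical names, and the values are made up, since Ollama supplies its own data at run time.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData is a hypothetical stand-in for the values Ollama injects.
type promptData struct {
	System string
	Prompt string
}

// render parses tmpl as a Go template and executes it against data.
func render(tmpl string, data promptData) (string, error) {
	t, err := template.New("prompt").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := promptData{System: "Be brief.", Prompt: "Hi"}

	// No trim marker: the blank line between the two actions is kept.
	plain, err := render("{{ .System }}\n\n{{ .Prompt }}", data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", plain) // "Be brief.\n\nHi"

	// A leading "{{-" removes the whitespace in front of the action.
	trimmed, err := render("{{ .System }}\n\n{{- .Prompt }}", data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", trimmed) // "Be brief.Hi"
}
```

The second result shows why trim markers need care: trimming in front of the prompt glues it directly onto the system text.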

Let’s see this in action. Imagine you have a base model, say llama3, and you want to create a persona for it.

First, create a Modelfile named Modelfile.persona:

FROM llama3

SYSTEM """You are a helpful, respectful, and honest assistant. Always answer in JSON format."""

TEMPLATE """{{ .System }}
User: {{ .Prompt }}
Assistant:
"""

PARAMETER stop "User:"
PARAMETER stop "Assistant:"

Now, build this model:

ollama create persona:latest -f ./Modelfile.persona

And run it:

ollama run persona:latest

When you type:

Tell me about large language models.

The actual prompt sent to the llama3 model will look something like this (whitespace trimmed):

You are a helpful, respectful, and honest assistant. Always answer in JSON format.
User: Tell me about large language models.
Assistant:

The model will then respond, and importantly, it will try to adhere to the JSON format instruction.
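Because the model only tries to follow the JSON instruction, a client application may want to check each reply before using it. The following is a minimal sketch of such a check; the isJSONObject helper is a hypothetical name and the sample replies are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isJSONObject reports whether reply parses as a single JSON object.
// A JSON null, array, or plain text all count as failures.
func isJSONObject(reply string) bool {
	var obj map[string]interface{}
	if err := json.Unmarshal([]byte(reply), &obj); err != nil {
		return false
	}
	return obj != nil
}

func main() {
	// Invented replies: one well-formed, one where the model added chatter.
	good := `{"topic": "large language models", "summary": "..."}`
	bad := `Sure! Here is the JSON: {"topic": "large language models"}`

	fmt.Println(isJSONObject(good))
	fmt.Println(isJSONObject(bad))
}
```

When the check fails, the application can retry the request or fall back to treating the reply as plain text.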

This templating is more than just a basic string replacement. You can use standard Go template logic. For instance, to conditionally include parts of the prompt:

FROM llama3

SYSTEM """You are a chatbot that can translate text."""

TEMPLATE """{{ if .System }}{{ .System }}
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}User: {{ .Content }}
{{ else if eq .Role "assistant" }}Assistant: {{ .Content }}
{{ end }}{{ end }}Assistant:
"""

PARAMETER stop "User:"
PARAMETER stop "Assistant:"

This template uses the Messages variable, which holds the conversation so far, including the latest user message. The if .System block emits the system prompt only when one is set. The range .Messages loop iterates over the turns, and each turn exposes a Role (such as user or assistant) and its Content. This is how conversational context reaches the model through the template.
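A template that loops over past turns can be exercised locally before it goes into a Modelfile. The sketch below assumes the turns arrive as a list of messages with Role and Content fields. The message and chatData types, the renderChat helper, and the sample conversation are all hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// message and chatData are hypothetical stand-ins for the values a
// template receives; only the field names matter to the template.
type message struct {
	Role    string
	Content string
}

type chatData struct {
	System   string
	Messages []message
}

// chatTemplate prints the system prompt if present, then one line per turn.
const chatTemplate = `{{ if .System }}{{ .System }}
{{ end }}{{ range .Messages }}{{ if eq .Role "user" }}User: {{ .Content }}
{{ else if eq .Role "assistant" }}Assistant: {{ .Content }}
{{ end }}{{ end }}Assistant:`

// renderChat executes chatTemplate against data and returns the prompt.
func renderChat(data chatData) (string, error) {
	t, err := template.New("chat").Parse(chatTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	prompt, err := renderChat(chatData{
		System: "You are a chatbot that can translate text.",
		Messages: []message{
			{Role: "user", Content: "Translate 'hello' to French."},
			{Role: "assistant", Content: "Bonjour."},
			{Role: "user", Content: "And to Spanish?"},
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt)
}
```

Rendering the template this way makes stray blank lines or missing role labels visible before the model ever sees them.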

The real power comes when you realize that, unless a request overrides it, the TEMPLATE is the only path by which user input and system instructions reach the model once it’s built. Ollama doesn’t just append your input to a fixed string; it constructs the entire prompt from your template on each request. This means you can orchestrate complex interactions, inject specific formatting rules, or even place multi-turn few-shot examples directly within the template.

Consider a Modelfile that carries static background context. Modelfiles have no instruction for attaching a text file, so the context is written directly into the template:

FROM llama3

SYSTEM """You are an expert coder that explains code snippets."""

TEMPLATE """{{ .System }}

Here is some background information about Python:
Python is a high-level, interpreted, general-purpose programming language.

User: {{ .Prompt }}
Assistant:
"""

PARAMETER stop "User:"
PARAMETER stop "Assistant:"

Here, the background text is a literal part of the template, so it is sent with every prompt. This lets you precondition the model with static information, at the cost of context-window space on each request.

The PARAMETER instructions are also crucial. stop "User:" and stop "Assistant:" are common because they tell the model to stop generating text when it encounters these strings, preventing it from generating the next turn of the conversation itself. You can add multiple PARAMETER stop lines.
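The effect of stop strings can be pictured as cutting the generated text at the earliest match. The sketch below is a simplified illustration of that idea, not Ollama's implementation; cutAtStop is a hypothetical helper and the sample text is invented.

```go
package main

import (
	"fmt"
	"strings"
)

// cutAtStop returns text up to the earliest occurrence of any stop string.
// If no stop string occurs, the text is returned unchanged.
func cutAtStop(text string, stops []string) string {
	cut := len(text)
	for _, s := range stops {
		if s == "" {
			continue // an empty stop string would match everywhere
		}
		if i := strings.Index(text, s); i >= 0 && i < cut {
			cut = i
		}
	}
	return text[:cut]
}

func main() {
	// Without stop strings the model might keep writing the next turns.
	raw := " Paris is the capital of France.\nUser: And Germany?\nAssistant: Berlin."
	fmt.Printf("%q\n", cutAtStop(raw, []string{"User:", "Assistant:"}))
	// " Paris is the capital of France.\n"
}
```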

The TEMPLATE is rendered on every request. Even though the Modelfile is static, the prompt sent to the underlying model changes with the user input and with any conversation history. The template does not store that history: the client application interacting with Ollama keeps it and sends it along with each request.

What most users miss is that the TEMPLATE is not just for adding a system prompt. It’s the entire structure of the interaction. You can build elaborate prompt layouts, override the system prompt or the template itself per request (if you’re using the API directly), and control exactly how the model perceives the input. For example, you could write a template that always asks the model to output its reasoning steps before the final answer, by ending the prompt with "Reasoning:" and instructing the model to finish with "Answer:".
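For API users, one way to shape the prompt per request is to send a replacement system prompt or template in the request body. The sketch below builds such a body for the /api/generate endpoint, using the model, prompt, system, template, and stream fields described in Ollama's API documentation. The generateRequest struct and buildRequest helper are hypothetical names, and the address in the closing comment assumes a default local install.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// generateRequest covers a subset of the /api/generate request fields.
// Empty system and template values are omitted, so the Modelfile's
// own SYSTEM and TEMPLATE stay in effect.
type generateRequest struct {
	Model    string `json:"model"`
	Prompt   string `json:"prompt"`
	System   string `json:"system,omitempty"`
	Template string `json:"template,omitempty"`
	Stream   bool   `json:"stream"`
}

// buildRequest encodes a non-streaming generate request as JSON.
func buildRequest(model, prompt, system, tmpl string) ([]byte, error) {
	return json.Marshal(generateRequest{
		Model:    model,
		Prompt:   prompt,
		System:   system,
		Template: tmpl,
	})
}

func main() {
	// Override only the template so the prompt ends with "Reasoning:".
	body, err := buildRequest(
		"persona:latest",
		"What is 17 * 6?",
		"",
		"{{ .System }}\nUser: {{ .Prompt }}\nReasoning:",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// Assumption: a local server would accept this body via a POST to
	// http://localhost:11434/api/generate with Content-Type application/json.
}
```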

The next step after mastering templating is understanding how to manage model layers and versioning using Modelfiles, which lets you build customized models on top of one another.
