Models and Fine-tuning
This guide covers the available models in Ollama, how to use them, and techniques for customizing models to suit your specific requirements.
Available Models
Ollama supports a variety of open-source LLMs. Here are some of the most commonly used models:
General-Purpose Models
| Model | Size | Description | Command |
|---|---|---|---|
| Llama 2 | 7B to 70B | Meta’s general-purpose model | ollama pull llama2 |
| Mistral | 7B | High-quality open-source model | ollama pull mistral |
| Mixtral | 8x7B | Mixture-of-experts model | ollama pull mixtral |
| Phi-2 | 2.7B | Microsoft’s compact model | ollama pull phi |
| Neural Chat | 7B | Optimized for chat | ollama pull neural-chat |
| Vicuna | 7B to 33B | Fine-tuned LLaMa model | ollama pull vicuna |
Code-Specialized Models
| Model | Size | Description | Command |
|---|---|---|---|
| CodeLlama | 7B to 34B | Code-focused Llama variant | ollama pull codellama |
| WizardCoder | 7B to 34B | Fine-tuned for code tasks | ollama pull wizardcoder |
| DeepSeek Coder | 6.7B to 33B | Specialized for code | ollama pull deepseek-coder |
Small/Efficient Models
| Model | Size | Description | Command |
|---|---|---|---|
| TinyLlama | 1.1B | Compact model for limited resources | ollama pull tinyllama |
| Gemma | 2B to 7B | Google’s lightweight model | ollama pull gemma |
| Phi-2 | 2.7B | Efficient and compact | ollama pull phi |
Multilingual Models
| Model | Description | Command |
|---|---|---|
| BLOOM | Multilingual capabilities | ollama pull bloom |
| Qwen | Chinese and English | ollama pull qwen |
| Japanese Stable LM | Japanese language | ollama pull stablej |
Model Management
Listing Models
# List all downloaded models
ollama list
Pulling Models
# Pull a specific model version
ollama pull mistral:7b-v0.1
Removing Models
# Remove a model
ollama rm mistral
Model Parameters
Control model behavior with these parameters:
| Parameter | Description | Range |
|---|---|---|
temperature |
Controls randomness | 0.0 - 2.0 |
top_p |
Nucleus sampling threshold | 0.0 - 1.0 |
top_k |
Limits vocabulary to top K tokens | 1 - 100+ |
context_length |
Maximum context window size | Model dependent |
seed |
Random seed for reproducibility | Any integer |
Example usage:
# Run a model with specific parameters
ollama run mistral --temperature 0.7 --top_p 0.9
Customizing Models with Modelfiles
Ollama uses Modelfiles (similar to Dockerfiles) to create custom model configurations.
Basic Modelfile Example
FROM mistral:latest
PARAMETER temperature 0.7
SYSTEM You are an expert DevOps engineer specializing in cloud infrastructure.
Save this in a file named Modelfile and create a custom model:
ollama create devops-assistant -f ./Modelfile
ollama run devops-assistant
Advanced Modelfile Example
FROM codellama:latest
PARAMETER temperature 0.3
PARAMETER top_p 0.8
PARAMETER stop "```"
TEMPLATE """
<system>
You are a senior software developer specialized in infrastructure as code, container orchestration, and CI/CD pipelines.
</system>
<user>
{{.Prompt}}
</user>
<assistant>
"""
Modelfile Commands Reference
| Command | Description | Example |
|---|---|---|
FROM |
Base model | FROM mistral:latest |
PARAMETER |
Set inference parameter | PARAMETER temperature 0.7 |
SYSTEM |
Set system message | SYSTEM You are a helpful assistant |
TEMPLATE |
Define chat template | TEMPLATE <s>{{.System}}</s>{{.Prompt}} |
ADAPTER |
Apply LoRA adapter | ADAPTER ./adapter.bin |
MESSAGE |
Add example conversation | MESSAGE user "Hi" |
PROMPT |
Default prompt | PROMPT Answer in bullet points |
LICENSE |
Specify model license | LICENSE MIT |
Fine-tuning with Custom Data
While Ollama doesn’t directly support fine-tuning, you can use pre-fine-tuned models and adapt them with Modelfiles.
Using External Fine-tuned Models
- Fine-tune a model using an external tool like LLaMA Factory
- Convert the model to GGUF format
- Import into Ollama:
# Import a fine-tuned model
ollama create custom-model -f Modelfile
With Modelfile:
FROM ./path/to/fine-tuned-model.gguf
PARAMETER temperature 0.7
Behavior Fine-tuning with Examples
You can “soft fine-tune” model behavior by providing examples in the Modelfile:
FROM mistral:latest
SYSTEM You are a DevOps automation expert who explains complex concepts clearly.
# Include examples of desired behavior
MESSAGE user "How can I automate AWS infrastructure provisioning?"
MESSAGE assistant "For AWS infrastructure automation, I recommend these approaches:
1. **Terraform**: Infrastructure as Code (IaC) tool with declarative configuration
- Maintains state of resources
- Supports modules for reusability
- Works with multiple cloud providers
2. **AWS CloudFormation**: Native AWS IaC service
- JSON/YAML templates
- Stack management with change sets
- Deep integration with AWS services
3. **AWS CDK**: Define infrastructure using familiar programming languages
- TypeScript, Python, Java support
- Compiles to CloudFormation
- Object-oriented approach
For most teams, Terraform offers the best balance of flexibility and functionality."
Model Quantization
Ollama supports various quantization levels to balance performance and resource usage:
| Quantization | Size Impact | Performance | Command |
|---|---|---|---|
| Q4_K_M | 3-4GB (7B model) | Good | ollama pull mistral:7b-q4_k_m |
| Q5_K_M | 4-5GB (7B model) | Better | ollama pull mistral:7b-q5_k_m |
| Q8_0 | 7-8GB (7B model) | Best | ollama pull mistral:7b-q8_0 |
For resource-constrained environments, use more aggressive quantization:
# Pull a highly quantized model
ollama pull tinyllama:1.1b-q4_0
RAG (Retrieval-Augmented Generation)
Enhance models with external knowledge using RAG:
#!/bin/bash
# Simple RAG implementation with Ollama
MODEL="mistral:latest"
QUERY="What are the key components of a Kubernetes cluster?"
CONTEXT_FILE="kubernetes-docs.txt"
# Get context from a document
CONTEXT=$(grep -i "kubernetes components\|control plane\|node components" "$CONTEXT_FILE" | head -n 15)
# Create prompt with context
PROMPT="Based on the following information:\n\n$CONTEXT\n\nPlease answer: $QUERY"
# Send to Ollama
ollama run $MODEL --prompt "$PROMPT"
Practical Model Selection Guide
| Use Case | Recommended Model | Why |
|---|---|---|
| General chat | mistral:7b |
Good balance of size and capability |
| Code assistance | codellama:7b |
Specialized for code understanding/generation |
| Resource-constrained | tinyllama:1.1b |
Small memory footprint |
| Technical documentation | neural-chat:7b |
Clear instruction following |
| Complex reasoning | mixtral:8x7b or llama2:70b |
Sophisticated reasoning capabilities |
DevOps-Specific Model Configuration
For DevOps-specific tasks, create a specialized model configuration:
# DevOps Assistant Modelfile
FROM codellama:latest
PARAMETER temperature 0.3
PARAMETER top_p 0.8
SYSTEM You are an expert in DevOps practices, cloud infrastructure, CI/CD pipelines, and infrastructure as code. You provide concise, accurate answers with practical examples when appropriate. You're familiar with AWS, Azure, GCP, Kubernetes, Docker, Terraform, Ansible, GitHub Actions, and other DevOps tools.
# Example prompt for debugging
PROMPT """
I'm encountering the following issue with my CI/CD pipeline or infrastructure:
{{.Input}}
Please help me by:
1. Identifying potential causes
2. Suggesting troubleshooting steps
3. Recommending a solution
4. Providing a brief example if applicable
"""
Create this model:
ollama create devops-assistant -f ./DevOps-Modelfile
Next Steps
Now that you understand Ollama’s models:
- Configure GPU acceleration to speed up model inferencing
- Set up Open WebUI for a graphical interface
- Explore DevOps usage examples for practical applications