With growing privacy concerns, developers increasingly prefer local, self-hosted AI environments. Cloud-based AI platforms frequently involve third-party data handling and recurring costs. Hosting your own AI models locally solves these issues. You retain complete control over data security, customisation, and performance optimisation.
Ollama simplifies local hosting of large language models (LLMs) without extensive configurations. With Open WebUI, a straightforward user interface, it creates a seamless, private AI experience. Open WebUI directly integrates with Ollama, allowing developers to interact easily with AI models through a clean browser-based interface.
This how-to guide walks you step-by-step through hosting your local AI platform with Ollama and Open WebUI. By the end, you’ll have a fully functional, private, and efficient AI environment operating entirely within your infrastructure.
Are you ready to build your own local AI platform? Let’s begin.
NOTE: I ran this whole process on macOS, so a few commands may differ if you're using Windows or Linux.
Prerequisites
Before setting up your local AI platform, confirm your Mac meets the necessary requirements.
Hardware Requirements
- Recommended RAM: At least 16 GB (32 GB provides smoother performance with larger models).
- Storage Space: Minimum 50 GB free disk space for models and dependencies.
- Processor: Intel or Apple Silicon (M1 or later) Macs are suitable.
- GPU: The integrated GPU on Apple Silicon works well and is used automatically. External GPUs are not supported on Apple Silicon Macs, so you don't need one.
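The storage requirement is easy to verify from a script before you download anything. Here is a minimal sketch using only Python's standard library; the 50 GB threshold mirrors the figure in the list above, and the function name `has_free_space` is illustrative rather than part of any tool.

```python
import shutil

GIB = 1024 ** 3  # bytes in one gibibyte


def has_free_space(path: str = ".", required_gb: float = 50.0) -> bool:
    """Return True if the volume holding `path` has at least `required_gb` GiB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GIB


# Example: warn before pulling large models.
if not has_free_space(".", 50):
    print("Less than 50 GB free: consider cleaning up before downloading models.")
```

Pass the path of the volume where models will live if it differs from the current directory.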
Software Requirements
macOS Version
Check your macOS version to confirm compatibility:
sw_vers
Ensure your macOS version is Ventura (13.x), Sonoma (14.x), or later.
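If you want to automate this check, the comparison can be sketched in Python. This assumes Ventura (13.0) as the minimum, as stated above; `parse_version` and `meets_minimum` are hypothetical helper names.

```python
import platform


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '14.4.1' into (14, 4, 1)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_minimum(version: str, minimum: tuple[int, ...] = (13, 0)) -> bool:
    """Return True if `version` is at least `minimum` (Ventura is 13.x)."""
    parsed = parse_version(version)
    # Pad short versions such as "13" so tuple comparison behaves as expected.
    padded = parsed + (0,) * max(0, len(minimum) - len(parsed))
    return padded >= minimum


# platform.mac_ver() returns an empty version string on non-macOS systems.
current = platform.mac_ver()[0]
if current:
    print(f"macOS {current}: supported = {meets_minimum(current)}")
else:
    print("Not running on macOS; skipping the version check.")
```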
Homebrew
Homebrew simplifies installing software on macOS. Verify installation with:
brew --version
If Homebrew isn’t installed, install it using:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Docker Desktop
Docker is essential for running Open WebUI.
Install Docker Desktop via Homebrew:
brew install --cask docker
After installation, launch Docker Desktop and confirm the installation by running:
docker --version
Ollama
Install Ollama on macOS using Homebrew:
brew install ollama
Check the Ollama installation by running:
ollama --version
Optional:
- Visual Studio Code (or your preferred IDE):
brew install --cask visual-studio-code
- iTerm2 (alternative terminal emulator):
brew install --cask iterm2
Final Checks
Ensure Docker Desktop is running and that your terminal correctly recognizes Docker:
docker ps
This command should run without errors.
You’re now set up with the required prerequisites to host your local AI platform using Ollama and Open WebUI on your Mac.
Next, let’s install and configure Ollama.
Installing Ollama
Ollama runs large language models locally with minimal setup. It abstracts away GPU configurations, handles model downloading, and exposes a local API you can interact with. Installing Ollama on macOS is straightforward using Homebrew.
Step 1: Install Ollama via Homebrew
Run the following command:
brew install ollama
This will install the latest version of Ollama on your system. Once installed, verify it:
ollama --version
You should see the installed version printed in the terminal. If you get a “command not found” error, restart your terminal and try again.
Step 2: Start the Ollama Background Service
Installing with Homebrew does not start the Ollama server for you. To start it manually in the foreground, run:
ollama serve
This command manually starts the Ollama server. You’ll see logs indicating it’s listening on localhost:11434. You can stop this process with Ctrl + C.
If you prefer to run Ollama as a persistent background service, use:
brew services start ollama
This ensures that Ollama starts automatically with your system and runs in the background continuously.
To stop it at any time:
brew services stop ollama
Step 3: Confirm Server Availability
Run this to check whether the Ollama server is running:
curl http://localhost:11434
You should see:
Ollama is running
This confirms that Ollama is up and ready to handle requests.
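The same availability check can be scripted, which helps when another program needs to wait for the server. The sketch below only tests for an HTTP 200 on the root URL; the function name `is_ollama_running` and the two-second timeout are illustrative choices.

```python
import urllib.error
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to the Ollama root URL answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running".
        return False
```

Calling `is_ollama_running()` with no arguments checks the default address used throughout this guide.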
Step 4: Run Your First Model
To confirm everything is working, run a sample model. Start with a lightweight model like llama2:
ollama run llama2
Ollama will download the model (first-time only) and launch it. After initialization, you’ll see a terminal prompt where you can type queries.
Want a more powerful model like llama3 or mistral? You can replace the model name accordingly:
ollama run llama3
or
ollama run mistral
You now have Ollama running locally on macOS. In the next section, we’ll install Open WebUI to add a clean interface on top of this setup.
Installing Open WebUI
With Ollama set up, the next step is to add a user interface. Open WebUI is a lightweight frontend that connects directly with Ollama running locally. It provides a clean, browser-based interface to interact with LLMs, manage models, and run queries more efficiently.
Let’s install and configure it using Docker.
Step 1: Verify Docker Installation
Before proceeding, confirm Docker is running:
docker info
If this throws an error, launch Docker Desktop from Applications and wait until it starts.
Step 2: Pull the Open WebUI Docker Image
Download the latest Open WebUI container:
docker pull ghcr.io/open-webui/open-webui:main
If you’re using Apple Silicon (M1/M2/M3), this command automatically pulls the right architecture. No extra steps required.
Step 3: Run Open WebUI Container
Use the following command to run Open WebUI and connect it with Ollama:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Explanation:
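- -p 3000:8080: Maps the container's port 8080 to port 3000 on your Mac.
- --add-host: Maps host.docker.internal to the host machine so the container can reach Ollama running on your Mac.
- -v: Mounts persistent storage for chat history and settings.
- --restart always: Restarts the container automatically whenever Docker starts, including after a reboot.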
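If you script your setup, it can help to build this command in one place so the port and bind address are easy to change. The sketch below reproduces the flags from the command above. Two things are my additions: the optional `bind_address` parameter and the `OLLAMA_BASE_URL` variable, which to my knowledge Open WebUI reads to locate Ollama. Treat the helper name `open_webui_run_args` as hypothetical.

```python
def open_webui_run_args(
    host_port: int = 3000,
    bind_address: str = "",
    ollama_url: str = "http://host.docker.internal:11434",
) -> list[str]:
    """Build the `docker run` argument list for the Open WebUI container."""
    publish = f"{host_port}:8080"
    if bind_address:
        # Restrict the published port to one interface, e.g. 127.0.0.1:3000:8080.
        publish = f"{bind_address}:{publish}"
    return [
        "docker", "run", "-d",
        "-p", publish,
        "--add-host=host.docker.internal:host-gateway",
        "-e", f"OLLAMA_BASE_URL={ollama_url}",  # assumption: explicit Ollama address
        "-v", "open-webui:/app/backend/data",
        "--name", "open-webui",
        "--restart", "always",
        "ghcr.io/open-webui/open-webui:main",
    ]
```

To launch the container, pass the list to `subprocess.run(open_webui_run_args(), check=True)`.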
Wait a few seconds after the command runs.
Step 4: Open the Web Interface
Go to your browser and open:
http://localhost:3000
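You’ll see the Open WebUI interface. On first launch it asks you to create an account, which is stored locally and becomes the administrator. It then connects automatically to your local Ollama instance. No manual API key or token setup is required.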
Step 5: Confirm Model Connectivity
Click on Settings > Models. You should see all models available in your Ollama setup (e.g., llama2, llama3, mistral).
If no models appear:
- Open your terminal.
- Run:
ollama list
- Make sure at least one model is downloaded.
- Restart the Open WebUI container:
docker restart open-webui
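You can also ask the Ollama server directly which models it has, which tells you whether a problem lies with Ollama or with the UI. This sketch assumes the `/api/tags` endpoint returns a JSON object with a `models` array whose entries carry a `name` field; the helper names are illustrative.

```python
import json
import urllib.request


def parse_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON returned by Ollama's /api/tags endpoint."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def fetch_model_names(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as response:
        return parse_model_names(json.load(response))
```

If `fetch_model_names()` returns an empty list, no model has been downloaded yet, so the UI has nothing to show.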
You now have a functioning local AI interface powered by Ollama and Open WebUI.
The next step is to configure network access. After that, we'll download and manage models for your use case.
Configuring Network Access
After setting up Ollama and Open WebUI, your environment runs locally and you have only used it from your own machine. If you want remote access (for collaboration, testing on other devices, or exposing it to secure internal tools), you need to configure network access.
Here’s how to do it safely and correctly on macOS.
1. Accessing Open WebUI Locally
By default, Open WebUI runs on:
http://localhost:3000
This address is only accessible from the same machine. To access it from other devices on your local network (e.g., mobile, tablet, or another computer), follow these steps.
Step 1: Get Your Local IP Address
Run the following:
ipconfig getifaddr en0
This returns your Mac’s local IP (e.g., 192.168.0.101).
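Step 2: Confirm the Container Publishes on All Interfaces
Docker's -p 3000:8080 mapping already publishes port 3000 on every network interface, so the container you started earlier is normally reachable from the LAN without changes. You only need to re-create it if you started it with a loopback-only mapping such as -p 127.0.0.1:3000:8080. In that case, stop and remove the current container:
docker stop open-webui
docker rm open-webui
Then start it again with the port published on all interfaces: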
docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
--add-host=host.docker.internal:host-gateway \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Now Open WebUI will be available at:
http://<your-local-ip>:3000
from any device on the same Wi-Fi or LAN.
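To print the address you can share with other devices, the lookup can be sketched in Python. It uses the common trick of connecting a UDP socket to pick the outgoing interface. On a Mac with a VPN or several active interfaces, the result may differ from what `ipconfig getifaddr en0` reports. The helper names are illustrative.

```python
import socket


def local_ip() -> str:
    """Best-effort lookup of this machine's LAN address; falls back to loopback."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
            # Connecting a UDP socket sends no packets; it only selects a route.
            probe.connect(("192.0.2.1", 80))  # documentation-only address, never contacted
            return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"


def webui_url(ip: str, port: int = 3000) -> str:
    """Build the address other devices on the LAN should open."""
    return f"http://{ip}:{port}"


print(webui_url(local_ip()))
```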
2. Optional: Remote Access from the Internet
Directly exposing a local service to the internet isn’t secure unless you’re careful. If you do want internet access, use a tunneling solution.
Method A: Use Ngrok
Install Ngrok:
brew install --cask ngrok
Authenticate Ngrok:
ngrok config add-authtoken <your-auth-token>
Expose your Open WebUI port:
ngrok http 3000
Ngrok will generate a public HTTPS URL like:
https://ab12-34-56-78-90.ngrok.io
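You can now reach Open WebUI over HTTPS from anywhere. Anyone who has the URL can reach the login page too, so keep Open WebUI authentication enabled and stop the tunnel when you're done.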
Method B: Use Cloudflare Tunnel (Optional Alternative)
If you’re already using Cloudflare for your domain, Cloudflare Tunnel is a more stable solution. It requires additional setup via Cloudflare dashboard and cloudflared.
3. Optional: Use a Reverse Proxy with Nginx (Advanced)
To integrate SSL and custom domains, configure a reverse proxy.
- Install Nginx:
brew install nginx
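- Edit the Nginx config (on Intel Macs, Homebrew uses /usr/local/etc/nginx/nginx.conf instead):
nano /opt/homebrew/etc/nginx/nginx.conf
- Add a server block for Open WebUI inside the http block:
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Forward WebSocket upgrades, which the chat UI may rely on.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}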
- Restart Nginx:
brew services restart nginx
Use Let’s Encrypt or Cloudflare to manage SSL if needed.
At this point, your local AI environment is accessible based on your configuration—whether strictly on-device, across a local network, or securely over the internet.
Next, let’s move on to downloading and managing models efficiently.
Downloading and Managing Models
Once Ollama and Open WebUI are running, you need to download models to begin generating responses. You can manage models in two ways: using the terminal (Ollama CLI) or through the browser interface (Open WebUI). Both options connect to the same local backend.
Option 1: Using the Ollama CLI
This method is direct and script-friendly. Use it when you want to install or switch models without relying on a browser.
Step 1: Find a Model
The Ollama CLI has no command that lists models available for download, and running ollama run without a model name returns an error. Popular model names include:
- llama2
- llama3
- mistral
- gemma
- codellama
You can explore all community-supported models here:
https://ollama.com/library
Step 2: Download and Launch a Model
To download and run a specific model (e.g., llama3):
ollama run llama3
This will:
- Automatically download the model if not already present
- Start the model and wait for input
You can exit the session with Ctrl + D or by typing /bye.
Step 3: Check Installed Models
To view which models are installed locally:
ollama list
It shows the name, ID, size, and last-modified time of each model.
Step 4: Remove a Model
If you no longer need a model:
ollama rm mistral
This frees up disk space.
Option 2: Using Open WebUI
This method offers a visual way to manage models—especially useful for those who prefer not to use the terminal.
Step 1: Access the Model Settings
- Visit http://localhost:3000
- Click on the gear icon (⚙) in the bottom-left corner.
- Navigate to the Models tab.
Step 2: Download New Models
- In the Models tab, enter the model name (e.g., llama3) in the download field.
- Click Download.
You’ll see real-time progress and confirmation once the model is ready.
Step 3: Switch Models During Chat
- Open any chat session.
- Click the model selector dropdown at the top.
- Choose your preferred model (e.g., mistral, codellama).
The selection is instant—no need to restart the app or reload the page.
Where Are These Models Stored?
By default, Ollama stores models in:
~/.ollama/models
Each model can consume several GBs. Monitor your disk space and clean up unused models regularly.
With the right models downloaded, you’re now ready to create your own. In the next section, we’ll walk through how to build and run custom models with modified parameters.
Creating Custom Models
Ollama allows you to go beyond default models by building custom ones. You can extend an existing base model like llama3 or mistral and modify its parameters, prompt behavior, or system message defaults—all locally, without relying on any external service.
This section explains how to create and register a custom model on macOS using Ollama’s CLI.
Step 1: Create a New Model Definition File
Ollama defines models in a Modelfile, a format similar to a Dockerfile. It starts from a base model and then overrides parameters. The file can have any name as long as you pass it with --file.
Example: Create a file named CustomLlama3.ollama
touch CustomLlama3.ollama
Open the file in your editor (e.g., VS Code):
code CustomLlama3.ollama
Paste the following content:
FROM llama3
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are an AI assistant built to help developers with concise, technical answers. Do not provide unnecessary explanations."
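If you plan to maintain several variants, you can generate the definition file from code instead of editing it by hand. The sketch below renders the same three instruction types used above (FROM, PARAMETER, SYSTEM). The function name `render_modelfile` is hypothetical. It deliberately rejects system messages containing double quotes rather than guessing at escaping rules.

```python
def render_modelfile(base: str, parameters: dict, system: str) -> str:
    """Render a model definition with FROM, PARAMETER, and SYSTEM instructions."""
    if '"' in system:
        # Quoting rules for embedded quotes are not handled in this sketch.
        raise ValueError("system message must not contain double quotes")
    lines = [f"FROM {base}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    lines.append(f'SYSTEM "{system}"')
    return "\n".join(lines) + "\n"


content = render_modelfile(
    "llama3",
    {"temperature": 0.7, "num_ctx": 4096},
    "You are an AI assistant built to help developers with concise, technical answers.",
)
print(content)
```

Write `content` to a file such as CustomLlama3.ollama and build it with the ollama create command shown in the next step.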
Step 2: Build the Custom Model
Use the following command to build and register the custom model:
ollama create dev-helper --file CustomLlama3.ollama
This creates a new model named dev-helper. You’ll see confirmation logs as Ollama validates and stores the custom configuration.
You can use any valid base model name in the FROM line: mistral, gemma, codellama, and so on.
Step 3: Verify Model Creation
List all available models:
ollama list
You should now see dev-helper in the list.
Step 4: Run the Custom Model
You can now run the custom model in the terminal:
ollama run dev-helper
Or access it via Open WebUI:
- Open http://localhost:3000
- Go to any chat session
- Use the model selector dropdown
- Choose dev-helper from the list
Step 5: Optional – Tailor Behavior with Embedded System Prompts
To give your model a specialized purpose (e.g., data analysis, security assistant, or coding bot), modify the SYSTEM message in the .ollama file accordingly. You can also chain parameters:
FROM codellama
PARAMETER temperature 0.5
PARAMETER repeat_penalty 1.2
PARAMETER num_predict 256
SYSTEM "You are a coding assistant that prioritizes best practices and writes production-grade code."
Save these changes to the file, then create a second model from it:
ollama create prod-coder --file CustomLlama3.ollama
Custom models help you control how responses are generated. They’re especially useful in multi-user setups or when building task-specific assistants.
In the next section, we’ll explore how to extend these models further using RAG pipelines, custom workflows, and user management.
Advanced Features and Customizations
Once your local AI setup is stable, you can start layering in advanced capabilities. Ollama and Open WebUI aren’t just limited to running static models—they support multiple configurations that make your workflows faster, more efficient, and task-specific.
This section walks through features that elevate your local AI from a basic chat interface to a fully capable decision-support and automation tool.
1. Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation enhances your AI’s accuracy by injecting real-time context into the prompt. Instead of relying solely on the model’s training data, RAG workflows use external files or documents to influence the model’s response.
How to Use RAG in Open WebUI
- Open the UI at http://localhost:3000
- Go to “Documents” from the left navigation panel
- Upload PDF, TXT, CSV, or Markdown files
- Open a chat and toggle the Document Assist mode
Now the model will reference the uploaded content when generating responses.
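To make the idea concrete, here is a deliberately simplified sketch of the retrieve-then-prompt pattern. It is not how Open WebUI implements document search. Real pipelines typically rank passages with embeddings, whereas this example substitutes plain keyword overlap so it runs with the standard library alone. All function names are illustrative.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Rank chunks by how many question words they share; keep the best `top_k`."""
    query = tokenize(question)
    scored = [(len(query & tokenize(chunk)), chunk) for chunk in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k] if score > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


docs = [
    "Refunds are processed within 14 days.",
    "The office is closed on Sundays.",
    "Shipping takes 3 to 5 days.",
]
question = "How long do refunds take?"
print(build_prompt(question, retrieve(question, docs, top_k=1)))
```

The resulting prompt is what would be sent to the model, so its answer is grounded in the selected passage rather than in training data alone.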
Use Case Examples:
- Query product documentation to generate user support replies
- Search internal policy documents during compliance checks
- Summarize multi-page reports with direct citations
2. User Management
If you’re using your local AI platform across a team or shared device, you can enable user accounts inside Open WebUI.
Enabling User Authentication
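Current Open WebUI releases require login by default, and the first account you create becomes the administrator. This behavior is controlled by the WEBUI_AUTH environment variable. If you previously started the container with WEBUI_AUTH=false, re-create it with authentication turned on:
- Stop and remove the container:
docker stop open-webui
docker rm open-webui
- Run the container with auth enabled:
docker run -d \
-p 3000:8080 \
-v open-webui:/app/backend/data \
-e WEBUI_AUTH=true \
--name open-webui \
ghcr.io/open-webui/open-webui:main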
Each user gets isolated sessions and model preferences.
3. Voice and Audio Integration (Optional)
Open WebUI supports Whisper-based transcription and voice chat using your microphone.
Enable Voice Input
- Connect a microphone to your Mac
- Click on the mic icon in the chat input box
- Speak, and it will transcribe and send the text to your model
You must allow microphone access in your browser.
Use Case Examples:
- Developers giving verbal instructions to code assistants
- Transcribing meetings and instantly summarizing content
4. Custom Prompt Templates
You can create reusable system prompts to speed up specialized queries.
- Navigate to Settings > Prompts
- Create a new template
- Save it with a recognizable title (e.g., “Summarize code logic” or “Translate tech doc to Spanish”)
- Apply the template from any chat
Prompt templates can be used to:
- Enforce tone and formatting
- Act as role-based assistants (e.g., reviewer, translator, analyst)
5. Model-Specific Parameters (On the Fly)
Even without building a new custom model, you can modify parameters at runtime via the WebUI interface.
Navigate to Settings > Models, select a model, and adjust:
- Temperature (controls randomness)
- Context length
- Number of predicted tokens
Changes are applied instantly and affect the next interaction.
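The same three settings can be supplied per request when you call Ollama's API directly. The sketch below builds a request body for `/api/generate` with an `options` object. The option names `temperature`, `num_ctx`, and `num_predict` match the parameter names used in the custom model file earlier, while the helper name and default values are illustrative.

```python
import json


def build_generate_payload(
    model: str,
    prompt: str,
    temperature: float = 0.7,
    num_ctx: int = 4096,
    num_predict: int = 256,
) -> bytes:
    """Encode a /api/generate request body with per-request sampling options."""
    body = {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a token stream
        "options": {
            "temperature": temperature,  # controls randomness
            "num_ctx": num_ctx,          # context length in tokens
            "num_predict": num_predict,  # maximum number of tokens to generate
        },
    }
    return json.dumps(body).encode("utf-8")
```

Options sent this way apply to that request only, so the stored model definition stays unchanged.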
These customizations enable your local AI to function more like a toolset than just a chatbot.
In the next section, we’ll go over maintenance and how to keep everything up-to-date with minimal effort.
Maintenance and Updates
Keeping your setup up-to-date ensures stability, compatibility, and access to new features. Ollama and Open WebUI both offer straightforward update workflows on macOS.
1. Update Ollama
Use Homebrew to update Ollama:
brew upgrade ollama
After the update, restart the background service:
brew services restart ollama
Verify:
ollama --version
2. Update Open WebUI
Since it runs in Docker, pull the latest image:
docker pull ghcr.io/open-webui/open-webui:main
Then restart the container:
docker stop open-webui
docker rm open-webui
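docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
This retains your settings and chat history because they live in the open-webui volume, not in the container.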
3. Monitor Disk Usage
Run this to check how much space models are using:
du -sh ~/.ollama/models
Remove unused models if needed:
ollama rm model-name
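For scheduled monitoring, the same measurement can be taken from Python. This sketch assumes models live under ~/.ollama/models, the default location I'm aware of on macOS, so pass a different path if your setup differs. The helper names are illustrative.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root` (0 if it does not exist)."""
    if not root.exists():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def format_gb(size_bytes: int) -> str:
    """Format a byte count as decimal gigabytes with one decimal place."""
    return f"{size_bytes / 1e9:.1f} GB"


models_dir = Path.home() / ".ollama" / "models"  # assumed default location
print(format_gb(directory_size_bytes(models_dir)))
```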
4. Backup Configuration (Optional)
You can back up your Open WebUI data by copying the Docker volume:
docker cp open-webui:/app/backend/data ./webui-backup
With these steps, your local AI platform will stay performant and reliable with minimal effort.
Conclusion
Running AI models locally gives developers full control over performance, privacy, and customization—without relying on external APIs or cloud billing. With Ollama handling model execution and Open WebUI providing a streamlined interface, you can build a complete local AI platform within minutes.
We walked through everything you need—from installing Ollama and Open WebUI on macOS, configuring local or remote access, downloading and managing models, to creating your own custom models with precise parameters. We also explored how to extend your setup using features like RAG, user isolation, voice input, and prompt templates.
This setup isn’t just experimental—it’s production-capable. Whether you’re building internal tools, prototyping AI assistants, or simply working offline, this local-first architecture gives you the freedom to iterate faster and more securely.
What kind of use case are you planning to build with your self-hosted AI? Let us know in the comments—or try pushing the limits of Ollama with your own prompt-engineering experiments.
Ready to turn your local AI instance into something more? Start with custom models. That’s where the real flexibility begins.
People Also Ask (FAQ)
1. Can I run Ollama on an M1/M2/M3 Mac without a dedicated GPU?
Yes. Ollama is optimized for Apple Silicon and runs inference on the integrated GPU through Metal, with the CPU as a fallback. You don’t need an external GPU to run most models. However, performance varies based on model size. Lightweight models like mistral or llama2 run smoothly on M1/M2 devices. Larger models (e.g., llama3:70b) need far more memory and take longer to initialize.
2. Does Ollama run models completely offline?
Yes. After a model is downloaded the first time, all inference runs locally—without any internet connection. No data leaves your machine during interactions. Only the initial model pull uses internet access (unless you’ve manually cached it or transferred it from another machine).
3. How much disk space do the models take up?
Model sizes vary. Approximate download sizes for the default tags:
- mistral: ~4.1 GB
- llama2: ~3.8 GB
- llama3: ~4.7 GB
- codellama:13b: ~7.4 GB
You can check local usage using:
ollama list
To save space, remove unused models with:
ollama rm model-name
4. Can I host multiple models simultaneously?
Yes, within your memory limits. Recent Ollama versions can keep more than one model loaded and serve requests in parallel; the OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL environment variables control the limits. Older versions load one model at a time and swap on demand. Either way, you can switch models without restarting the service, and in Open WebUI you can switch between available models from the chat dropdown.
5. Can I use Ollama models in my own application?
Yes. Ollama runs a local REST API on localhost:11434. You can integrate it into your applications using HTTP requests.
Example request:
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Explain how HTTP/2 works"
}'
You can also build custom wrappers around this API in Python, Node.js, or any backend language.
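As a starting point for such a wrapper, here is a minimal Python sketch using only the standard library. It relies on `/api/generate` streaming its reply as one JSON object per line, each with a `response` fragment and a final `done` flag. The function names are illustrative, and there is no retry or timeout handling.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_response_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a streamed reply (one JSON object per line)."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break


def generate(prompt: str, model: str = "llama3",
             base_url: str = "http://localhost:11434") -> str:
    """Send a prompt to the local Ollama server and return the full answer."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return "".join(iter_response_text(response))
```

With the server running and the model downloaded, `generate("Explain how HTTP/2 works")` returns the complete answer as a single string.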
6. Is there a way to fine-tune models in Ollama?
No. As of now, Ollama does not support full fine-tuning (i.e., retraining on new datasets). However, you can:
- Modify system prompts
- Set temperature, context window, and token limits
- Use RAG for dynamic context injection
These options cover most personalization needs without retraining.
7. Does Open WebUI support multiple users?
Yes. Open WebUI supports multiple user accounts, and login is controlled by the WEBUI_AUTH environment variable (enabled by default in current releases). Authentication isolates user chat sessions, which makes it suitable for teams or shared workstations.
8. Can I expose Open WebUI over the internet safely?
Yes, but only with caution. Use secure tunneling services like:
- Ngrok (with HTTPS and auth token)
- Cloudflare Tunnel (zero-trust access model)
Never expose port 3000 directly over public IP without a reverse proxy, SSL, and access control in place.
9. What’s the difference between Ollama and LM Studio or LMDeploy?
| Feature | Ollama | LM Studio | LMDeploy (by InternLM) |
|---|---|---|---|
| OS Support | macOS, Linux, Windows | macOS, Windows | Linux (primarily) |
| UI | CLI + WebUI Integration | Electron GUI | CLI + SDK |
| Custom Models | Yes (via ollama create) | Limited | Yes (manual configuration) |
| API Support | Yes | Limited | Yes |
| Model Format | GGUF | GGUF | Transformers / ONNX |
Ollama is more developer-friendly with its API-first design and local-first architecture, especially when paired with Open WebUI.
10. How often are models and features updated?
Ollama maintains an active development cycle. New model versions, performance optimizations, and compatibility improvements are released regularly. To stay updated:
- Run:
brew upgrade ollama
docker pull ghcr.io/open-webui/open-webui:main
- Watch the project GitHub repositories for updates.
If you’re stuck at any step, try isolating the issue:
- Run docker logs open-webui to debug UI startup
- Stop the background service and run ollama serve in a terminal to read the backend logs
- Check CPU/memory usage with Activity Monitor during inference
Need more help? Drop your use case or error on GitHub Discussions or relevant dev forums. This ecosystem is growing fast—your feedback might shape what’s next.