With growing privacy concerns, developers increasingly prefer local, self-hosted AI environments. Cloud-based AI platforms frequently involve third-party data handling and recurring costs. Hosting your own AI models locally solves these issues. You retain complete control over data security, customisation, and performance optimisation.

Ollama simplifies local hosting of large language models (LLMs) without extensive configurations. With Open WebUI, a straightforward user interface, it creates a seamless, private AI experience. Open WebUI directly integrates with Ollama, allowing developers to interact easily with AI models through a clean browser-based interface.

This how-to guide walks you step-by-step through hosting your local AI platform with Ollama and Open WebUI. By the end, you’ll have a fully functional, private, and efficient AI environment operating entirely within your infrastructure.

Are you ready to build your own local AI platform? Let’s begin.

NOTE: I ran this entire process on macOS, so a few code snippets may differ if you’re using Windows or Linux.

Prerequisites

Before setting up your local AI platform, confirm your Mac meets the necessary requirements.

Hardware Requirements

Recommended RAM: At least 16 GB (32 GB provides smoother performance with larger models).
Storage Space: Minimum 50 GB free disk space for models and dependencies.
Processor: Intel or Apple Silicon (M1 or later) Macs are suitable.
GPU: The integrated GPU on Apple Silicon works effectively; no discrete or external GPU is required.

Software Requirements

macOS Version

Check your macOS version to confirm compatibility:

sw_vers

Ensure your macOS version is Ventura (13.x) or later, such as Sonoma (14.x).
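If you want to script this check, here is a minimal sketch. It assumes Ventura (13.x) is the minimum, as stated above; `macos_supported` is a hypothetical helper name, not part of macOS.

```shell
#!/bin/sh
# Succeeds when the given macOS version string (e.g. "14.2.1")
# has a major version of 13 or higher.
macos_supported() {
  major=${1%%.*}   # keep only the part before the first dot
  [ "$major" -ge 13 ] 2>/dev/null
}

# sw_vers exists only on macOS, so skip the check elsewhere.
if command -v sw_vers >/dev/null 2>&1; then
  version=$(sw_vers -productVersion)
  if macos_supported "$version"; then
    echo "macOS $version meets the requirement"
  else
    echo "macOS $version is older than 13.x" >&2
  fi
fi
```

Keeping the comparison in its own function makes it easy to raise the minimum later without touching the rest of the script.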

Homebrew

Homebrew simplifies installing software on macOS. Verify installation with:

brew --version

If Homebrew isn’t installed, install it using:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Docker Desktop

This guide runs Open WebUI in a Docker container, so Docker is required.

Install Docker Desktop via Homebrew:

brew install --cask docker

After installation, launch Docker Desktop and confirm the installation by running:

docker --version

Ollama

Install Ollama on macOS using Homebrew:

brew install ollama

Check the Ollama installation by running:

ollama --version

Optional:

Visual Studio Code (or your preferred IDE): brew install --cask visual-studio-code
iTerm2 (alternative terminal emulator): brew install --cask iterm2

Final Checks

Ensure Docker Desktop is running and that your terminal correctly recognizes Docker:

docker ps

This command should run without errors.
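To confirm all three tools in one pass, a short script like the following may help. It is only a sketch: `missing_tools` is a hypothetical helper, and it checks that each command is on your PATH, not that Docker Desktop is actually running.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools brew docker ollama)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing prerequisites:" $missing >&2
else
  echo "All prerequisites found"
fi
```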

You’re now set up with the required prerequisites to host your local AI platform using Ollama and Open WebUI on your Mac.

Next, let’s install and configure Ollama.

Installing Ollama

Ollama runs large language models locally with minimal setup. It abstracts away GPU configurations, handles model downloading, and exposes a local API you can interact with. Installing Ollama on macOS is straightforward using Homebrew.

Step 1: Install Ollama via Homebrew

Run the following command:

brew install ollama

This will install the latest version of Ollama on your system. Once installed, verify it:

ollama --version

You should see the installed version printed in the terminal. If you get a “command not found” error, restart your terminal and try again.

Step 2: Start the Ollama Background Service

Installing the Homebrew formula does not start the server by itself. To start it manually in the foreground:

ollama serve

You’ll see logs indicating it’s listening on localhost:11434 (shown as 127.0.0.1:11434). Keep this terminal open while you work, or stop the server with Ctrl + C.

If you prefer to run Ollama as a persistent background service, use:

brew services start ollama

This registers Ollama as a background service that starts automatically when you log in and keeps running in the background.

To stop it at any time:

brew services stop ollama

Step 3: Confirm Server Availability

Run this to check whether the Ollama server is running:

curl http://localhost:11434

You should see:

Ollama is running

This confirms that Ollama is up and ready to handle requests.
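In scripts, the server may need a moment to come up after you start it. The sketch below polls the address until it responds. `wait_for_http` is a hypothetical helper, and the defaults of 30 attempts and a 1-second delay are arbitrary assumptions.

```shell
#!/bin/sh
# Poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 1)
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o: discard the body
    if curl -fs -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait_for_http http://localhost:11434 30 && echo "Ollama is up"
```

The function returns a non-zero status when the server never answers, so you can chain it with `&&` before commands that depend on Ollama.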

Step 4: Run Your First Model

To confirm everything is working, run a sample model. Start with a lightweight model like llama2:

ollama run llama2

Ollama will download the model (first-time only) and launch it. After initialization, you’ll see a terminal prompt where you can type queries.

Want a more powerful model like llama3 or mistral? You can replace the model name accordingly:

ollama run llama3

ollama run mistral
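Beyond the interactive prompt, you can call the local API directly. The sketch below posts a prompt to Ollama’s /api/generate endpoint with curl. `build_generate_payload` is a hypothetical helper that escapes only backslashes and double quotes, so it assumes a single-line prompt, and it assumes the llama3 model has already been downloaded.

```shell
#!/bin/sh
# Build a JSON body for Ollama's /api/generate endpoint.
# $1 = model name, $2 = single-line prompt.
# Only backslashes and double quotes are escaped; this is not a full JSON encoder.
build_generate_payload() {
  escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$escaped"
}

# Send the request only when the server is reachable.
if curl -fs -o /dev/null http://localhost:11434; then
  build_generate_payload llama3 'Why is the sky blue?' |
    curl -s http://localhost:11434/api/generate -d @-
fi
```

Setting "stream" to false asks for one JSON response instead of a stream of partial chunks, which is easier to handle in a shell script.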

You now have Ollama running locally on macOS. In the next section, we’ll install Open WebUI to add a clean interface on top of this setup.

Installing Open WebUI

With Ollama set up, the next step is to add a user interface. Open WebUI is a lightweight frontend that connects directly with Ollama running locally. It provides a clean, browser-based interface to interact with LLMs, manage models, and run queries more efficiently.

Let’s install and configure it using Docker.

Step 1: Verify Docker Installation

Before proceeding, confirm Docker is running:

docker info

If this throws an error, launch Docker Desktop from Applications and wait until it starts.

Step 2: Pull the Open WebUI Docker Image

Download the latest Open WebUI container:

docker pull ghcr.io/open-webui/open-webui:main

If you’re using Apple Silicon (M1/M2/M3), this command automatically pulls the right architecture. No extra steps required.

Step 3: Run Open WebUI Container

Use the following command to run Open WebUI and connect it with Ollama:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Explanation:

-p 3000:8080: Maps port 8080 inside the container to port 3000 on your Mac.
--add-host: Maps host.docker.internal to the host machine so the container can reach Ollama running on your Mac.
-v: Mounts a named volume that persists chat history and settings.
--restart always: Restarts the container automatically whenever it exits or Docker starts.

Wait a few seconds after the command runs.
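If the page does not load, check whether the container actually started. The sketch below reads the container status from docker ps and prints recent logs when it is not running. `is_running` is a hypothetical helper, and the name filter matches substrings, so it assumes no other container has open-webui in its name.

```shell
#!/bin/sh
# Decide from a `docker ps` status string whether the container is running.
# Running containers report a status that begins with "Up".
is_running() {
  case $1 in
    Up*) return 0 ;;
    *)   return 1 ;;
  esac
}

if command -v docker >/dev/null 2>&1; then
  status=$(docker ps -a --filter "name=open-webui" --format '{{.Status}}' 2>/dev/null)
  if is_running "$status"; then
    echo "open-webui is running: $status"
  else
    echo "open-webui is not running (status: ${status:-not found})" >&2
    # Show recent log lines to help diagnose startup problems.
    docker logs --tail 20 open-webui 2>&1 || true
  fi
fi
```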

Step 4: Open the Web Interface

Go to your browser and open:

http://localhost:3000

You’ll see the Open WebUI interface. It connects automatically to your local Ollama instance. No manual API key or token setup is required.

Step 5: Confirm Model Connectivity

Click on Settings > Models. You should see all models available in your Ollama setup (e.g., llama2, llama3, mistral).

If no models appear:

Open your terminal.
Run: ollama list
Make sure at least one model is downloaded.
Restart the Open WebUI container: docker restart open-webui

You now have a functioning local AI interface powered by Ollama and Open WebUI.
Next, we’ll configure network access before moving on to downloading and managing models.

Configuring Network Access

After setting up Ollama and Open WebUI, everything runs on your Mac, and so far you have only opened it through localhost. If you want remote access (for collaboration, testing on other devices, or connecting secure internal tools), you need to configure network access.

Here’s how to do it safely and correctly on macOS.

1. Accessing Open WebUI Locally

By default, Open WebUI runs on:

http://localhost:3000

This address is only accessible from the same machine. To access it from other devices on your local network (e.g., mobile, tablet, or another computer), follow these steps.

Step 1: Get Your Local IP Address

Run the following:

ipconfig getifaddr en0

This returns your Mac’s local IP (e.g., 192.168.0.101).

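Step 2: Make Sure the Container Publishes the Port on All Interfaces

Docker’s -p 3000:8080 mapping publishes the port on all network interfaces by default, so a container started with the command from the installation section is already reachable over your LAN. If you created the container with a narrower mapping (for example, -p 127.0.0.1:3000:8080), remove it first:

docker stop open-webui
docker rm open-webui

Then start it again with the port published on all interfaces: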

docker run -d \
  -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Now Open WebUI will be available at:

http://<your-local-ip>:3000

from any device on the same Wi-Fi or LAN.
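To print the exact address to share with other devices, you could combine the two steps in a small script. This is a sketch: `lan_url` is a hypothetical helper, and the fallback to en1 is an assumption for Macs whose active interface is not en0.

```shell
#!/bin/sh
# Compose the URL other devices on the network would use.
# $1 = IP address, $2 = port (default 3000)
lan_url() {
  printf 'http://%s:%s\n' "$1" "${2:-3000}"
}

# `ipconfig getifaddr` is macOS-specific, so skip this part elsewhere.
if command -v ipconfig >/dev/null 2>&1; then
  ip=$(ipconfig getifaddr en0 2>/dev/null || ipconfig getifaddr en1 2>/dev/null) || ip=""
  if [ -n "$ip" ]; then
    lan_url "$ip" 3000
  else
    echo "No IP address found on en0 or en1" >&2
  fi
fi
```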

2. Optional: Remote Access from the Internet

Directly exposing a local service to the internet isn’t secure unless you’re careful. If you do want internet access, use a tunneling solution.

Method A: Use Ngrok

Install Ngrok:

brew install --cask ngrok

Authenticate Ngrok:

ngrok config add-authtoken <your-auth-token>

Expose your Open WebUI port:

ngrok http 3000

Ngrok will generate a public HTTPS URL like:

https://ab12-34-56-78-90.ngrok.io

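You can now reach Open WebUI over HTTPS from anywhere. Anyone who has the URL can reach it too, so protect the instance with authentication before sharing the link.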

Method B: Use Cloudflare Tunnel (Optional Alternative)

If you’re already using Cloudflare for your domain, Cloudflare Tunnel is a more stable solution. It requires additional setup via Cloudflare dashboard and cloudflared.

3. Optional: Use a Reverse Proxy with Nginx (Advanced)

To integrate SSL and custom domains, configure a reverse proxy.

Install Nginx:

brew install nginx

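Edit the Nginx config (the path below is Homebrew’s default on Apple Silicon; on Intel Macs it is /usr/local/etc/nginx/nginx.conf):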

nano /opt/homebrew/etc/nginx/nginx.conf

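Add a server block for Open WebUI inside the http block:

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;

        # Forward WebSocket upgrade requests (used by recent Open WebUI releases)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}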

Restart Nginx:

brew services restart nginx

Use Let’s Encrypt or Cloudflare to manage SSL if needed.

At this point, your local AI environment is accessible based on your configuration—whether strictly on-device, across a local network, or securely over the internet.

Next, let’s move on to downloading and managing models efficiently.

Downloading and Managing Models

Once Ollama and Open WebUI are running, you need to download models to begin generating responses. You can manage models in two ways: using the terminal (Ollama CLI) or through the browser interface (Open WebUI). Both options connect to the same local backend.

Option 1: Using the Ollama CLI

This method is direct and script-friendly. Use it when you want to install or switch models without relying on a browser.

Step 1: Find Available Models

ollama run requires a model name, so it cannot be used to list what is available for download. Popular options in the Ollama library include:

llama2
llama3
mistral
gemma
codellama

You can also explore all community-supported models here:
https://ollama.com/library

Step 2: Download and Launch a Model

To download and run a specific model (e.g., llama3):

ollama run llama3

This will:

Automatically download the model if not already present
Start the model and wait for input

You can exit the session by typing /bye or pressing Ctrl + D.

Step 3: Check Installed Models

To view which models are installed locally:

ollama list

It shows the name, ID, size, and last-modified time of each model.

Step 4: Remove a Model

If you no longer need a model:

ollama rm mistral

This frees up disk space.
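Because the CLI is script-friendly, you can fetch several models in one pass with ollama pull, which downloads a model without starting a chat session. A minimal sketch follows. `pull_models` and the DRY_RUN switch are hypothetical conveniences for previewing the commands.

```shell
#!/bin/sh
# Pull each model named in the arguments.
# With DRY_RUN=1 the commands are printed instead of executed.
pull_models() {
  for model in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ollama pull $model"
    else
      ollama pull "$model" || echo "failed to pull $model" >&2
    fi
  done
}

# Preview what would be downloaded without using any bandwidth.
DRY_RUN=1 pull_models llama3 mistral
```

Remove DRY_RUN=1 from the last line to perform the downloads.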

Option 2: Using Open WebUI

This method offers a visual way to manage models—especially useful for those who prefer not to use the terminal.

Step 1: Access the Model Settings

Visit http://localhost:3000
Click on the gear icon (⚙) in the bottom-left corner.
Navigate to the Models tab.

Step 2: Download New Models

In the Models tab, enter the model name (e.g., llama3) in the download field.
Click Download.

You’ll see real-time progress and confirmation once the model is ready.

Step 3: Switch Models During Chat

Open any chat session.
Click the model selector dropdown at the top.
Choose your preferred model (e.g., mistral, codellama).

The selection is instant—no need to restart the app or reload the page.

Where Are These Models Stored?

By default, Ollama stores models in:

~/.ollama/models

Each model can consume several GBs. Monitor your disk space and clean up unused models regularly.

With the right models downloaded, you’re now ready to create your own. In the next section, we’ll walk through how to build and run custom models with modified parameters.

Creating Custom Models

Ollama allows you to go beyond default models by building custom ones. You can extend an existing base model like llama3 or mistral and modify its parameters, prompt behavior, or system message defaults—all locally, without relying on any external service.

This section explains how to create and register a custom model on macOS using Ollama’s CLI.

Step 1: Create a New Model Definition File

Ollama uses a file format, called a Modelfile, that is similar to a Dockerfile. It starts from a base model and then overrides parameters. The file name itself is arbitrary.

Example: Create a file named `CustomLlama3.ollama`

touch CustomLlama3.ollama

Open the file in your editor (e.g., VS Code):

code CustomLlama3.ollama

Paste the following content:

FROM llama3

PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are an AI assistant built to help developers with concise, technical answers. Do not provide unnecessary explanations."

Step 2: Build the Custom Model

Use the following command to build and register the custom model:

ollama create dev-helper --file CustomLlama3.ollama

This creates a new model named dev-helper. You’ll see confirmation logs as Ollama validates and stores the custom configuration.
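If you create several variants, generating the definition file from a script can save typing. The sketch below writes the file with a heredoc and then registers it. `write_model_definition` is a hypothetical helper, and it assumes the system prompt contains no double quotes.

```shell
#!/bin/sh
# Write a model definition file.
# $1 = output path, $2 = base model, $3 = temperature, $4 = system prompt
write_model_definition() {
  cat > "$1" <<EOF
FROM $2

PARAMETER temperature $3
SYSTEM "$4"
EOF
}

write_model_definition CustomLlama3.ollama llama3 0.7 \
  "You are an AI assistant built to help developers with concise, technical answers."

# Register the model only when the Ollama CLI is available.
if command -v ollama >/dev/null 2>&1; then
  ollama create dev-helper --file CustomLlama3.ollama ||
    echo "ollama create failed; is the server running?" >&2
fi
```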

You can use any valid base model name in the FROM line—mistral, gemma, codellama, and so on.

Step 3: Verify Model Creation

List all available models:

ollama list

You should now see dev-helper in the list.

Step 4: Run the Custom Model

You can now run the custom model in the terminal:

ollama run dev-helper

Or access it via Open WebUI:

Open http://localhost:3000
Go to any chat session
Use the model selector dropdown
Choose dev-helper from the list

Step 5: Optional – Specialize with Embedded System Prompts

To give your model a specialized purpose (e.g., data analysis, security assistant, or coding bot), modify the SYSTEM message in the .ollama file accordingly. This changes the model’s default behavior; it does not retrain the weights. You can also combine several parameters:

FROM codellama

PARAMETER temperature 0.5
PARAMETER repeat_penalty 1.2
PARAMETER num_predict 256
SYSTEM "You are a coding assistant that prioritizes best practices and writes production-grade code."

After saving these changes to CustomLlama3.ollama, create the model under a new name:

ollama create prod-coder --file CustomLlama3.ollama

Custom models help you control how responses are generated. They’re especially useful in multi-user setups or when building task-specific assistants.

In the next section, we’ll explore how to extend these models further using RAG pipelines, custom workflows, and user management.

Advanced Features and Customizations

Once your local AI setup is stable, you can start layering in advanced capabilities. Ollama and Open WebUI aren’t just limited to running static models—they support multiple configurations that make your workflows faster, more efficient, and task-specific.

This section walks through features that elevate your local AI from a basic chat interface to a fully capable decision-support and automation tool.

1. Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation enhances your AI’s accuracy by injecting real-time context into the prompt. Instead of relying solely on the model’s training data, RAG workflows use external files or documents to influence the model’s response.

How to Use RAG in Open WebUI

Open the UI at http://localhost:3000
Go to “Documents” from the left navigation panel
Upload PDF, TXT, CSV, or Markdown files
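Open a chat and attach the uploaded document to your message (menu labels vary between Open WebUI releases; recent versions let you reference a document by typing # in the chat input)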

Now the model will reference the uploaded content when generating responses.

Use Case Examples:

Query product documentation to generate user support replies
Search internal policy documents during compliance checks
Summarize multi-page reports with direct citations

2. User Management

If you’re using your local AI platform across a team or shared device, you can enable user accounts inside Open WebUI.

Enabling User Authentication

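Open WebUI requires login by default (its WEBUI_AUTH setting is on unless you turn it off), and the first account created becomes the administrator. To make sure multi-user mode is active, recreate the container with authentication set explicitly.

Stop and remove the container:

docker stop open-webui
docker rm open-webui

Run the container with auth enabled:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  -e WEBUI_AUTH=true \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Each user gets their own chat history and settings.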

3. Voice and Audio Integration (Optional)

Open WebUI supports Whisper-based transcription and voice chat using your microphone.

Enable Voice Input

Connect a microphone to your Mac
Click on the mic icon in the chat input box
Speak, and it will transcribe and send the text to your model

You must allow microphone access in your browser.

Use Case Examples:

Developers giving verbal instructions to code assistants
Transcribing meetings and instantly summarizing content

4. Custom Prompt Templates

You can create reusable system prompts to speed up specialized queries.

Navigate to Settings > Prompts
Create a new template
Save it with a recognizable title (e.g., “Summarize code logic” or “Translate tech doc to Spanish”)
Apply the template from any chat

Prompt templates can be used to:

Enforce tone and formatting
Act as role-based assistants (e.g., reviewer, translator, analyst)

5. Model-Specific Parameters (On the Fly)

Even without building a new custom model, you can modify parameters at runtime via the WebUI interface.

Navigate to Settings > Models, select a model, and adjust:

Temperature (controls randomness)
Context length
Number of predicted tokens

Changes are applied instantly and affect the next interaction.
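The same kind of per-request override is available outside the browser: Ollama’s /api/generate endpoint accepts an options object. The sketch below sets temperature and num_predict for a single request. `build_payload_with_options` is a hypothetical helper, and it assumes the prompt contains no quotes or backslashes and that llama3 is already downloaded.

```shell
#!/bin/sh
# Build a /api/generate request body with per-request options.
# $1 = model, $2 = prompt (assumed to contain no quotes or backslashes),
# $3 = temperature, $4 = num_predict (maximum tokens to generate)
build_payload_with_options() {
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":%s,"num_predict":%s}}\n' \
    "$1" "$2" "$3" "$4"
}

# Send the request only when the server is reachable.
if curl -fs -o /dev/null http://localhost:11434; then
  build_payload_with_options llama3 "Name three uses of a reverse proxy." 0.2 64 |
    curl -s http://localhost:11434/api/generate -d @-
fi
```

Options passed this way apply to that request only; the stored model keeps its defaults.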

These customizations enable your local AI to function more like a toolset than just a chatbot.
In the next section, we’ll go over maintenance and how to keep everything up-to-date with minimal effort.

Maintenance and Updates

Keeping your setup up-to-date ensures stability, compatibility, and access to new features. Ollama and Open WebUI both offer straightforward update workflows on macOS.

1. Update Ollama

Use Homebrew to update Ollama:

brew upgrade ollama

After the update, restart the background service:

brew services restart ollama

Verify:

ollama --version

2. Update Open WebUI

Since it runs in Docker, pull the latest image:

docker pull ghcr.io/open-webui/open-webui:main

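Then recreate the container from the new image, using the same options as the original run command:

docker stop open-webui
docker rm open-webui
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main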

This retains your settings and chat history.
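If you update often, the steps can be wrapped in one function. The sketch below reuses the flags from the initial docker run command so the container keeps its host mapping and restart policy. `run`, `update_webui`, and the DRY_RUN switch are hypothetical names introduced for illustration.

```shell
#!/bin/sh
IMAGE=ghcr.io/open-webui/open-webui:main
NAME=open-webui

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pull the new image, then replace the container.
# The chain stops at the first failing step.
update_webui() {
  run docker pull "$IMAGE" &&
    run docker stop "$NAME" &&
    run docker rm "$NAME" &&
    run docker run -d \
      -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -v open-webui:/app/backend/data \
      --name "$NAME" \
      --restart always \
      "$IMAGE"
}

# Preview the steps without touching Docker.
DRY_RUN=1
update_webui
```

Set DRY_RUN=0 to perform the update. The named volume is untouched in either mode.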

3. Monitor Disk Usage

Run this to check how much space models are using:

du -sh ~/.ollama/models

Remove unused models if needed:

ollama rm model-name
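The storage location can differ between installs, so a script can check more than one candidate. The sketch below looks at ~/.ollama/models and ~/Library/Application Support/Ollama and reports whichever exists. `dir_size_kb` is a hypothetical helper.

```shell
#!/bin/sh
# Print the size of a directory in kilobytes, or nothing if it does not exist.
dir_size_kb() {
  [ -d "$1" ] || return 0
  du -sk "$1" | awk '{print $1}'
}

for dir in "$HOME/.ollama/models" "$HOME/Library/Application Support/Ollama"; do
  size=$(dir_size_kb "$dir")
  if [ -n "$size" ]; then
    # Integer division: sizes are rounded down to whole megabytes.
    echo "$dir: $((size / 1024)) MB"
  fi
done
```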

4. Backup Configuration (Optional)

You can back up your Open WebUI data by copying the data directory, which is backed by the open-webui Docker volume, out of the container:

docker cp open-webui:/app/backend/data ./webui-backup
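To keep more than one backup, you could add the date to the folder name. A minimal sketch follows. `backup_name` is a hypothetical helper, and the webui-backup-YYYY-MM-DD naming scheme is just one reasonable choice.

```shell
#!/bin/sh
# Build a backup directory name such as webui-backup-2024-05-01.
# $1 = optional date string; defaults to today's date.
backup_name() {
  printf 'webui-backup-%s\n' "${1:-$(date +%Y-%m-%d)}"
}

if command -v docker >/dev/null 2>&1; then
  docker cp open-webui:/app/backend/data "./$(backup_name)" ||
    echo "backup failed; is the open-webui container present?" >&2
fi
```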

With these steps, your local AI platform will stay performant and reliable with minimal effort.

Conclusion

Running AI models locally gives developers full control over performance, privacy, and customization—without relying on external APIs or cloud billing. With Ollama handling model execution and Open WebUI providing a streamlined interface, you can build a complete local AI platform within minutes.

We walked through everything you need—from installing Ollama and Open WebUI on macOS, configuring local or remote access, downloading and managing models, to creating your own custom models with precise parameters. We also explored how to extend your setup using features like RAG, user isolation, voice input, and prompt templates.

This setup isn’t just experimental—it’s production-capable. Whether you’re building internal tools, prototyping AI assistants, or simply working offline, this local-first architecture gives you the freedom to iterate faster and more securely.

What kind of use case are you planning to build with your self-hosted AI? Let us know in the comments—or try pushing the limits of Ollama with your own prompt-engineering experiments.

Ready to turn your local AI instance into something more? Start with custom models. That’s where the real flexibility begins.

Feature	Ollama	LM Studio	LMDeploy (by InternLM)
OS Support	macOS, Linux, Windows	macOS, Windows, Linux	Linux (primarily)
UI	CLI + WebUI Integration	Electron GUI	CLI + SDK
Custom Models	Yes (via `ollama create`)	Limited	Yes (manual configuration)
API Support	Yes	Limited	Yes
Model Format	GGUF	GGUF	Transformers / ONNX

Self-host Local AI platform with Ollama and Open WebUI