Scientyfic World

Deploy Local LLMs with Ollama and n8n for Private Workflows


In an era of widespread data harvesting, regulatory overhead, and privacy concerns, relying solely on API-based large language model (LLM) solutions such as OpenAI’s GPT-4 or Google’s Gemini isn’t always viable, especially for sensitive workflows. Whether you’re building custom automations, integrating internal tools, or orchestrating data flow across critical systems, data residency and control are non-negotiable for many enterprises.

This guide demonstrates how to run LLMs on-premises using Ollama, a local LLM runtime that installs as a native binary (and can also run in a container), and how to integrate them into your n8n workflow automations. n8n is a source-available automation framework that lets you wire together logic without giving up control of your infrastructure. Paired with Ollama, it becomes a privacy-first AI automation engine, with no third-party API calls, no provider-imposed usage limits, and full control of your language models.

This isn’t just a quickstart. We’ll explore:

  • How to run local LLMs with Ollama on your server or desktop
  • How to expose local models over an HTTP interface
  • How to build dynamic, human-in-the-loop AI workflows in n8n
  • How to validate, test, and secure your configuration
  • Advanced customization and performance tuning

When implemented, this architecture allows you to:

  • Keep all data processing on-premises for compliance and security
  • Get predictable infrastructure behavior: no provider rate limits and no unannounced model changes
  • Use lightweight workflows to automate repetitive knowledge tasks
  • Support air-gapped environments, sensitive intellectual property use cases, and even IoT or offline deployments

Let’s dive in.

Prerequisites

Before we begin implementing, you’ll need to ensure the following tools and infrastructure components are installed and functioning as expected. Each prerequisite serves a specific purpose, and it’s important not to skip versions or configurations that affect compatibility down the road.

1. System Requirements

  • OS: Linux/macOS/Windows (x86_64 or ARM64)
  • Memory: Minimum 16GB RAM (32GB recommended for larger models)
  • Storage: 10GB+ free space (models like LLaMA2-13B take 8–12GB)

2. Ollama Installation

Ollama allows you to run open-source LLMs like LLaMA, Mistral, Phi, or CodeLLaMA with minimal setup. You don’t need Docker or external dependencies for most setups.

# macOS (Homebrew)
brew install ollama

# Ubuntu
curl -fsSL https://ollama.com/install.sh | sh

# Windows (WSL or native)
# Follow instructions at: https://ollama.com/download

Verify installation:

ollama --version

Expected output: ollama version 0.X.X

3. n8n Installation

n8n can run locally with Docker Compose or be installed via npm. For the most reliable and scalable setup, we’ll use Docker.

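# Create a Docker Compose file (docker-compose.yml)
version: "3"
services:
  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    volumes:
      - ~/.n8n:/home/node/.n8n
    extra_hosts:
      # Lets the container reach the host on Linux (Docker 20.10+)
      - "host.docker.internal:host-gateway"
    environment:
      # Basic auth applies only to n8n releases before 1.0;
      # newer releases ask you to create an owner account instead.
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=secure-password
    restart: always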

Then run:

docker compose up -d

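Verify that n8n is accessible at http://localhost:5678. On n8n versions before 1.0, log in using the credentials you set above; newer versions ignore those variables and prompt you to create an owner account on first launch.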

4. Node.js + curl (Optional)

Some examples call Ollama endpoints directly from the command line or from scripts, so having curl and Node.js installed is helpful.

5. Model Selection

Ollama supports various open-source models. To keep memory usage low, we’ll start with llama2:7b — a solid general-purpose LLM with manageable resource requirements.

ollama pull llama2

Step-By-Step Implementation

This section will walk through every implementation detail — from loading the model and exposing an HTTP endpoint to wiring it into n8n using the native HTTP Request node. We’ll enforce robust error handling and consider expansion points.

Step 1: Run a Model with Ollama

First, confirm that the model runs locally and is ready to respond to prompts.

ollama run llama2

Once running, try pasting a prompt like:

What are some ways to improve API performance?

You should see a textual response from the model. Type /bye or press Ctrl+D to exit the session.

Step 2: Start Ollama’s REST API Server

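Ollama includes a built-in REST API. If you installed the desktop app, or the Linux install script registered a system service, the server is usually already running in the background. Otherwise, start it manually:

ollama serve

This exposes an HTTP interface on http://localhost:11434. If the command reports that the address is already in use, a server is already running and you can skip this step.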

Now test it with:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "List some n8n use cases involving AI",
  "stream": false
}' -H "Content-Type: application/json"

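You should receive a JSON response like the following (abridged here; the full response also includes fields such as model, created_at, and timing statistics):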

{
  "response": "Some n8n AI use cases include summarizing documents...",
  "done": true
}
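
The same request can be issued from a script. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and an Ollama server on the default port. The helper names buildGenerateRequest and generate are illustrative and not part of Ollama.

```javascript
// Build the fetch() arguments for Ollama's /api/generate endpoint.
// baseUrl defaults to the address used in this guide.
function buildGenerateRequest(model, prompt, baseUrl = "http://localhost:11434") {
  return {
    url: `${baseUrl}/api/generate`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify escapes quotes and newlines inside the prompt.
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Send the prompt and return the generated text.
async function generate(model, prompt) {
  const { url, options } = buildGenerateRequest(model, prompt);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response;
}
```

With the server running, generate("llama2", "List some n8n use cases involving AI").then(console.log) should print the model's answer.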

Step 3: Build the n8n Workflow

Now we’ll wire this model into n8n so it can dynamically invoke the local model during any trigger or logic branch. Here’s the basic structure:

  1. Trigger Node (Manual/HTTP/Webhook)
  2. Set Prompt (from user input, file, or database)
  3. HTTP Request to Ollama
  4. Store or Send Result

3.1: Add Trigger

Drag in a manual trigger, webhook, or other node to initiate the flow.

3.2: Add a “Set” Node to define a prompt

{
  "prompt": "Explain OAuth2 vs JWT for API authorization"
}

3.3: Add HTTP Request Node for Ollama

Configure as follows:

  • HTTP Method: POST
  • URL: http://host.docker.internal:11434/api/generate
  • Body Content Type: JSON
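  • Request Body:
{
  "model": "llama2",
  "prompt": "={{ $json[\"prompt\"] }}",
  "stream": false
}

Note: the leading = is how n8n marks an expression in exported workflow JSON. When typing into the editor directly, switch the field to expression mode and omit it. Because the prompt is spliced into hand-written JSON, prompts containing double quotes or line breaks can break the body; serializing the value with JSON.stringify inside the expression is a common way to avoid that.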

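Important Note: If n8n runs in Docker, Ollama must be reachable from inside the container. On Docker Desktop (macOS and Windows), host.docker.internal resolves to the host automatically. On Linux, map it yourself by adding "host.docker.internal:host-gateway" under extra_hosts for the n8n service, and start Ollama with OLLAMA_HOST=0.0.0.0, because by default it listens only on 127.0.0.1 and refuses connections arriving from the Docker bridge.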

3.4: Extract Result

Add a subsequent node (e.g., Set or Code, which replaced the older Function node) to read the response field from the Ollama output, for example with the expression {{ $json.response }}.
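
If you prefer to do the extraction in code, the logic could look like the sketch below. It assumes the payload is the parsed body of a non-streaming /api/generate call, and that failures arrive as an object with an error string. extractText is a hypothetical helper name.

```javascript
// Pull the generated text out of a parsed /api/generate response.
// Throws when the payload reports an error or lacks a text field.
function extractText(payload) {
  if (payload && typeof payload.error === "string") {
    throw new Error(`Ollama error: ${payload.error}`);
  }
  if (!payload || typeof payload.response !== "string") {
    throw new Error("Unexpected Ollama payload: no 'response' field");
  }
  return payload.response.trim();
}
```

Inside an n8n Code node running once per item, the function body could be pasted in and used as return { json: { answer: extractText($json) } }; the answer field name is only an example.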

Step 4: Expand the Workflow

You can now insert this functionality into broader workflows:

  • Trigger via Outlook/Gmail → summarize email content
  • Trigger via HTTP form → generate advice or a reply
  • Scheduled cron job → read RSS → summarize → post to Slack

Testing & Output

Case 1: Happy Path

Test a known input:

Prompt: "Give 3 ways Kafka improves system resiliency"

Expected Output:

Kafka improves resiliency by enabling high-throughput asynchronous messaging...

Case 2: Malformed Prompt

Prompt: "" (empty)

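Ollama typically does not reject an empty prompt: it loads the model and returns an empty response, so the HTTP Request node will not fail on its own. Add pre-validation in a Code node placed before the request (the Set node cannot run JavaScript):

// n8n Code node, mode "Run Once for Each Item"
const prompt = String($json.prompt ?? "").trim();
if (prompt.length < 5) {
  throw new Error("Prompt is too short");
}
return { json: { ...$json, prompt } };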

Case 3: Ollama Not Reachable

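n8n will report one of these connection errors:

ECONNREFUSED, ENOTFOUND

Ensure Ollama is running and reachable from the n8n container. Inside the container, localhost refers to the container itself, so point the request at host.docker.internal or the host’s IP address instead. Use docker network inspect to check the network and fix host mappings as needed.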
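
Connection errors are often transient, for example while Ollama is still starting. Scripts that call Ollama directly can wrap the call in a retry loop. This is a minimal sketch of retry with exponential backoff; withRetry and its parameters are hypothetical names, and the default values are arbitrary.

```javascript
// Retry an async operation with exponential backoff.
// attempts: total tries; baseDelayMs: wait before the second try, doubled each time.
async function withRetry(operation, attempts = 3, baseDelayMs = 500) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        const delay = baseDelayMs * 2 ** i;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the most recent error.
  throw lastError;
}
```

Within n8n itself, the node-level Retry On Fail setting covers the same need without custom code.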

Validation Tips

  • Add logging with console.log inside a Code node (the successor to the “Function” node) and review the output in the execution log
  • Test concurrency: fire multiple requests and observe response time
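
One way to run the concurrency test from a script is sketched below. It assumes Node.js 16 or later (for the global performance object). measureConcurrent is a hypothetical helper, and sendRequest stands for whatever async function issues your request to Ollama or to an n8n webhook.

```javascript
// Fire `count` calls of `sendRequest` at once and report wall-clock timings.
// sendRequest is any async function; it receives the request index.
async function measureConcurrent(sendRequest, count) {
  const startAll = performance.now();
  const durations = await Promise.all(
    Array.from({ length: count }, async (_, i) => {
      const start = performance.now();
      await sendRequest(i);
      return performance.now() - start;
    })
  );
  return {
    totalMs: performance.now() - startAll,
    slowestMs: Math.max(...durations),
    durations,
  };
}
```

If totalMs grows roughly in proportion to count, the requests are probably being processed one after another rather than in parallel.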

Advanced Configuration

Model Switching at Runtime

Set model dynamically in the n8n request body:

"model": "={{ $json[\"selectedModel\"] || 'llama2' }}"

This allows you to switch between codegemma:7b, mistral:7b, etc., based on input or workflow context.
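
When the model name comes from user input, it can help to check it against an allow-list before it reaches Ollama, so that a typo or an unexpected value falls back to a known model. This is a minimal sketch; ALLOWED_MODELS and resolveModel are illustrative names, and the list should match the models you have actually pulled.

```javascript
// Models this workflow is allowed to call; adjust to what you have pulled.
const ALLOWED_MODELS = ["llama2", "mistral:7b", "codegemma:7b"];

// Return the requested model if it is allowed, otherwise the fallback.
function resolveModel(requested, fallback = "llama2") {
  const name = typeof requested === "string" ? requested.trim() : "";
  return ALLOWED_MODELS.includes(name) ? name : fallback;
}
```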

Streaming Responses

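Streaming returns the output incrementally as newline-delimited JSON objects, which is useful for long generations. Ollama streams by default unless the request sets "stream": false. To enable it explicitly, set:

"stream": true

n8n does not natively support streaming in its response nodes. Instead, stream to a file or a callback endpoint.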
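
If a streamed body has been captured to a file or a buffer, it can be reassembled afterwards. The sketch below assumes the usual streaming shape, where each line is a JSON object carrying a response fragment and a done flag. joinStreamedResponse is a hypothetical helper name.

```javascript
// Combine a streamed /api/generate body (one JSON object per line)
// into the full text. Blank lines are skipped.
function joinStreamedResponse(ndjson) {
  let text = "";
  let done = false;
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    const chunk = JSON.parse(line);
    text += chunk.response ?? "";
    if (chunk.done) done = true;
  }
  return { text, done };
}
```

A done value of false after parsing suggests the capture was cut off before the model finished.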

Authentication for Ollama

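By default, the Ollama API has no authentication. It listens only on 127.0.0.1 out of the box, but as soon as you expose it to other hosts (for example via OLLAMA_HOST=0.0.0.0), anyone who can reach the port can use it. To secure it: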

  • Use a reverse proxy (e.g., Traefik, NGINX) with basic-auth
  • Restrict access to the port with UFW or another stateful firewall
  • Tunnel via Tailscale/VPN
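
Once a reverse proxy enforces basic auth, clients have to send an Authorization header. The sketch below shows how a Node.js script could build it; basicAuthHeader is a hypothetical helper, and the credentials are placeholders.

```javascript
// Build an HTTP Basic Authorization header value for a proxy-protected Ollama.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

// Example: headers to pass to fetch() when calling through the proxy.
const exampleHeaders = {
  "Content-Type": "application/json",
  Authorization: basicAuthHeader("admin", "secret"),
};
```

In n8n, the HTTP Request node can attach the same header through its built-in Basic Auth credential type, so no code is needed there.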

Scalability and Load

  • Run Ollama on a dedicated GPU machine or ARM board with high RAM
  • Use queueing within n8n or RabbitMQ to buffer floods
  • Don’t invoke >1 LLaMA2 stream in parallel unless it fits RAM budget
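
For scripts that call Ollama directly, the "one model call at a time" rule can be enforced with a small in-process queue. This is a minimal sketch built on a promise chain, not a substitute for RabbitMQ or n8n's own queueing. createSerialQueue is a hypothetical helper name.

```javascript
// Create a queue that runs async tasks one at a time, in submission order.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start this task only after the previous one has settled.
    const result = tail.then(() => task());
    // Keep the chain alive even if a task fails.
    tail = result.catch(() => {});
    return result;
  };
}
```

Each call to enqueue returns a promise for that task's result, so callers can still await individual answers while the model only ever handles one request at a time.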

Monitoring & Debugging

  • Run Ollama with verbose logs: OLLAMA_DEBUG=1 ollama serve
  • Check n8n logs via the container logs: docker compose logs -f n8n (or docker logs followed by the container name shown in docker ps)

Conclusion

We’ve architected a powerful hybrid: Ollama’s lightweight LLM engine + n8n’s extensible automation studio. By combining these self-hosted platforms, developers and operations teams can build data-responsible, privacy-first AI tools with full infrastructure ownership.

You now have a way to run arbitrary AI prompts on sensitive data — without exposing anything to third-party clouds. Whether you’re summarizing EHR records, generating emails, enriching tickets, or writing batch jobs, this local LLM+n8n stack adds private intelligence to any business logic.

Next steps:

  • Try other models (e.g., mistral:instruct, codellama)
  • Connect to a vector store (e.g., Qdrant or Weaviate) via n8n + Ollama
  • Deploy in a Kubernetes cluster with multiple n8n/Ollama nodes

FAQs

Can I use Ollama over HTTPS?

Not directly: the built-in server has no native TLS support. Put it behind a reverse proxy such as NGINX or Caddy to get HTTPS. We recommend terminating TLS at the proxy and restricting the Ollama port with firewall rules.

How do I run multiple concurrent requests with limited RAM?

Use a queueing system (e.g., RabbitMQ or Redis queues) and limit execution to one active LLM call at a time. If you need horizontal scaling, spread the load across multiple machines or run one container per model instance.

Which models are best for summarization or classification?

  • Mistral 7B Instruct: Strong few-shot summarizer
  • Llama 2: Versatile general model
  • Phi-2: Lightweight option for short-form answers

Use ollama pull mistral or ollama pull phi to try alternatives.

How can I integrate PDFs or files into the prompt?

Use n8n’s file ingest and text parsing nodes (e.g., Read Binary File → Convert to Text) and pass the extracted text into the prompt. If the input exceeds the model’s context window, split it into chunks using a fixed-window or sliding-window strategy and process the chunks one at a time.
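
A sliding-window splitter could be sketched as follows. It measures chunks in characters, which is only a rough proxy for tokens, and chunkText is a hypothetical helper name. With overlap set to 0 it behaves as a fixed window.

```javascript
// Split text into overlapping chunks measured in characters.
// size: chunk length; overlap: characters shared by consecutive chunks.
function chunkText(text, size, overlap = 0) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Require size > 0 and 0 <= overlap < size");
  }
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a chunk has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 2)); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of processing some text twice.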

I get garbled output or poor grammar — how can I fix it?

Check the prompt format. Many models expect instruction-completion framing like:

You are a helpful assistant.
Q: What is Kubernetes?
A: Kubernetes is a container orchestration platform...

Tune the few-shot examples or test structured prompts that trigger better behavior.

 

Snehasish Konger
Developed @scientyficworld.org | Technical writer @Nected | Content Developer