Deploy Local LLMs With Ollama And N8n For Private Workflows

In an era of widespread data harvesting, regulatory overhead, and privacy concerns, relying solely on API-based large language model (LLM) solutions such as OpenAI’s GPT-4 or Google’s Gemini isn’t always viable, especially for sensitive workflows. Whether you’re building custom automations, integrating internal tools, or orchestrating data flow across critical systems, data residency and control are non-negotiables for many enterprises.

This guide demonstrates how to run LLMs on-prem using Ollama, a local LLM runtime that needs no container or cloud service, and integrate them into your n8n workflow automations. n8n is a source-available automation framework that lets users wire together logic without giving up control of their infrastructure. Paired with Ollama, it becomes a privacy-first AI automation engine, with no third-party API calls, no usage limits, and full control of your language models.

This isn’t just a quickstart. We’ll explore:

How to run local LLMs with Ollama on your server or desktop
How to expose local models over an HTTP interface
How to build dynamic, human-in-the-loop AI workflows in n8n
How to validate, test, and secure your configuration
Advanced customization and performance tuning

When implemented, this architecture allows you to:

Keep all data processing on-premises for compliance and security
Get predictable infrastructure behavior, with no rate limits and no unannounced model changes
Use lightweight workflows to automate repetitive knowledge tasks
Support air-gapped environments, sensitive intellectual property use cases, and even IoT or offline deployments

Let’s dive in.

Prerequisites

Before we begin implementing, you’ll need to ensure the following tools and infrastructure components are installed and functioning as expected. Each prerequisite serves a specific purpose, and it’s important not to skip versions or configurations that affect compatibility down the road.

1. System Requirements

OS: Linux/macOS/Windows (x86_64 or ARM64)
Memory: Minimum 16GB RAM (32GB recommended for larger models)
Storage: 10GB+ free space (models like LLaMA2-13B take 8–12GB)

2. Ollama Installation

Ollama allows you to run open-source LLMs like LLaMA, Mistral, Phi, or CodeLLaMA with minimal setup. You don’t need Docker or external dependencies for most setups.

# macOS (Homebrew)
brew install ollama

# Ubuntu
curl -fsSL https://ollama.com/install.sh | sh

# Windows (WSL or native)
# Follow instructions at: https://ollama.com/download

Verify installation:

ollama --version

Expected output: ollama version is 0.X.X

3. n8n Installation

n8n can run locally using Docker Compose or be installed via npm. For the most reliable and scalable setup, we’ll use Docker.

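# Create a Docker Compose file (docker-compose.yml)
version: "3"
services:
  n8n:
    image: n8nio/n8n
    ports:
      - "5678:5678"
    volumes:
      - ~/.n8n:/home/node/.n8n
    environment:
      # Basic auth variables are honored by pre-1.0 n8n releases;
      # newer releases use built-in user management instead.
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=secure-password
    extra_hosts:
      # Lets the container resolve host.docker.internal on Linux
      - "host.docker.internal:host-gateway"
    restart: always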

Then run:

docker compose up -d

Verify that n8n is accessible at http://localhost:5678. Log in using the credentials you set above.

4. Node.js + curl (Optional)

Some examples call Ollama endpoints directly from the command line or from scripts, so it helps to have curl (and optionally Node.js) installed.

5. Model Selection

Ollama supports various open-source models. To keep memory usage low, we’ll start with llama2:7b — a solid general-purpose LLM with manageable resource requirements.

ollama pull llama2

Step-By-Step Implementation

This section will walk through every implementation detail — from loading the model and exposing an HTTP endpoint to wiring it into n8n using the native HTTP Request node. We’ll enforce robust error handling and consider expansion points.

Step 1: Run a Model with Ollama

First, confirm that the model runs locally and is ready to respond to prompts.

ollama run llama2

Once running, try pasting a prompt like:

What are some ways to improve API performance?

You should see a textual response from the model. Type /bye or press Ctrl+D to exit the session (Ctrl+C only interrupts the current response).

Step 2: Start Ollama’s REST API Server

Ollama includes a built-in REST API when you run it in service mode:

ollama serve

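This exposes an HTTP interface on http://localhost:11434. If Ollama is already running as a background service (the desktop app and the Linux installer typically set this up), the API is already available and ollama serve will report that the address is in use.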

Now test it with:

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "List some n8n use cases involving AI",
  "stream": false
}' -H "Content-Type: application/json"

You should receive a JSON response like this (abridged; the real response also carries metadata such as the model name and timing fields):

{
  "response": "Some n8n AI use cases include summarizing documents...",
  "done": true
}
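The same call can be made from a script. The following is a minimal sketch, assuming Node.js 18 or later (for the built-in fetch) and the default Ollama address; the names OLLAMA_URL and generate are illustrative and not part of Ollama. The fetch implementation is passed as a parameter so the function can be exercised without a running server.

```javascript
// Minimal client for Ollama's /api/generate endpoint (non-streaming).
// OLLAMA_URL and generate() are illustrative names, not part of Ollama.
const OLLAMA_URL = "http://localhost:11434/api/generate";

async function generate(prompt, model = "llama2", fetchImpl = fetch) {
  const res = await fetchImpl(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // stream: false asks for one JSON object instead of a token stream
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // the generated text
}
```

With Ollama running, calling generate("List some n8n use cases involving AI") should resolve to the generated text.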

Step 3: Build the n8n Workflow

Now we’ll wire this model into n8n so it can dynamically invoke the local model during any trigger or logic branch. Here’s the basic structure:

Trigger Node (Manual/HTTP/Webhook)
Set Prompt (from user input, file, or database)
HTTP Request to Ollama
Store or Send Result

3.1: Add Trigger

Drag in a manual trigger, webhook, or other node to initiate the flow.

3.2: Add a “Set” Node to define a prompt

{
  "prompt": "Explain OAuth2 vs JWT for API authorization"
}

3.3: Add HTTP Request Node for Ollama

Configure as follows:

HTTP Method: POST
URL: http://host.docker.internal:11434/api/generate
Body Content Type: JSON
Request Body:

{
  "model": "llama2",
  "prompt": "={{ $json[\"prompt\"] }}",
  "stream": false
}

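Important Note: If n8n runs in Docker, Ollama must run on the host or be reachable through a Docker-internal address such as host.docker.internal. Docker Desktop (macOS and Windows) resolves that name automatically; on Linux, map it explicitly with an extra_hosts entry (host.docker.internal:host-gateway) and start Ollama with OLLAMA_HOST=0.0.0.0 so it listens beyond 127.0.0.1.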
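One thing to watch with the request body: when a prompt containing double quotes or line breaks is spliced into a JSON string by an expression, the result may no longer be valid JSON. A hedged alternative is to build the body in a code step and let JSON.stringify do the escaping. The helper name buildOllamaBody is hypothetical, and the item shape { json: { prompt } } is assumed to mirror how n8n passes data between nodes.

```javascript
// Build the /api/generate request body from an n8n-style item.
// buildOllamaBody is a hypothetical helper, not an n8n or Ollama API.
function buildOllamaBody(item, model = "llama2") {
  const prompt = String(item.json.prompt ?? "");
  // JSON.stringify escapes quotes, backslashes, and newlines in the prompt
  return JSON.stringify({ model, prompt, stream: false });
}
```

The returned string can then be sent as the raw request body instead of a hand-written JSON template.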

3.4: Extract Result

Add a subsequent node (e.g., Set or Function, which newer n8n versions call Code) to read the response field from the Ollama output, for example with the expression {{ $json.response }}.
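If the extraction happens in a code step, it is worth checking the shape of the body before using it. The sketch below assumes the non-streaming response format shown earlier and that Ollama reports failures as a JSON object with an error field; extractAnswer is an illustrative name.

```javascript
// Pull the generated text out of a parsed /api/generate response.
// extractAnswer is an illustrative helper, not part of n8n or Ollama.
function extractAnswer(body) {
  if (body && typeof body.error === "string") {
    // e.g. the requested model has not been pulled yet
    throw new Error(`Ollama error: ${body.error}`);
  }
  if (!body || typeof body.response !== "string") {
    throw new Error("Unexpected Ollama response shape");
  }
  return body.response.trim();
}
```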

Step 4: Expand the Workflow

You can now insert this functionality into broader workflows:

Trigger via Outlook/Gmail → summarize email content
Trigger via HTTP form → give advice/reply generation
Scheduled cron job → read RSS → summarize → send Slack

Testing & Output

Case 1: Happy Path

Test a known input:

Prompt: "Give 3 ways Kafka improves system resiliency"

Expected Output:

Kafka improves resiliency by enabling high-throughput asynchronous messaging...

Case 2: Malformed Prompt

Prompt: "" (empty)

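Ollama does not reject an empty prompt: it typically just loads the model and returns an empty response, so the HTTP Request node will not fail on its own. Add pre-validation in a Code node (the successor to the Function node) before the request:

const prompt = $input.first().json.prompt; // prompt from the first incoming item
if (!prompt || String(prompt).trim().length < 5) {
  throw new Error("Prompt is too short");
}
return $input.all(); // pass the items through unchanged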

Case 3: Ollama Not Reachable

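n8n will report a connection error such as:

ECONNREFUSED, ENOTFOUND

Ensure Ollama is running and that the n8n container can reach it. Inside a container, localhost points at the container itself, so use host.docker.internal or the host's IP address instead. Use docker network inspect and fix host mappings as needed.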
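When the call is made from a script rather than the HTTP Request node, the two error codes can be turned into actionable hints. This is a sketch under the assumption that the error code sits either on err.code or, as with Node's built-in fetch, on err.cause.code; explainConnectionError and the hint wording are illustrative.

```javascript
// Map low-level connection errors to a troubleshooting hint.
// explainConnectionError is an illustrative helper.
function explainConnectionError(err) {
  // Node's built-in fetch wraps the socket error in err.cause;
  // other HTTP clients often put the code directly on err.code.
  const code = err?.code ?? err?.cause?.code ?? "UNKNOWN";
  switch (code) {
    case "ECONNREFUSED":
      return { code, hint: "Nothing is listening at that address. Is Ollama running and bound to a reachable interface?" };
    case "ENOTFOUND":
      return { code, hint: "The hostname did not resolve. Check host.docker.internal or the host mapping." };
    default:
      return { code, hint: "Unrecognized error. Check the n8n and Ollama logs." };
  }
}
```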

Validation Tips

Add logs using console.log inside n8n’s “Function” (Code) node
Test concurrency: fire multiple requests and observe response time

Advanced Configuration

Model Switching at Runtime

Set model dynamically in the n8n request body:

"model": "={{ $json[\"selectedModel\"] || 'llama2' }}"

This allows you to switch between codegemma:7b, mistral:7b, etc., based on input or workflow context.
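Because the model name now comes from workflow input, it may be worth restricting it to models that are actually pulled on the machine. A minimal sketch, assuming a hand-maintained allowlist (ALLOWED_MODELS and pickModel are hypothetical names; adjust the list to match the output of ollama list):

```javascript
// Only accept model names from a known list; fall back otherwise.
// ALLOWED_MODELS is an assumed, hand-maintained list for illustration.
const ALLOWED_MODELS = ["llama2", "mistral:7b", "codegemma:7b"];

function pickModel(selected, fallback = "llama2") {
  return ALLOWED_MODELS.includes(selected) ? selected : fallback;
}
```

This keeps a typo or an unexpected input value from triggering a request for a model that is not installed.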

Streaming Responses

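To enable stream mode (useful for long outputs, since tokens arrive as they are generated), set:

"stream": true

Ollama then returns a sequence of newline-delimited JSON objects, each carrying a text fragment in its response field, with done set to true on the last one. n8n does not support streaming natively in Response nodes. Instead, stream to a file or a callback endpoint.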
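If a streamed response has been saved to a file or buffered as text, the fragments still need to be stitched together. The sketch below assumes the stream arrives as newline-delimited JSON objects, each with a response fragment and a done flag; collectStream is an illustrative name.

```javascript
// Reassemble a buffered Ollama stream (newline-delimited JSON) into text.
// collectStream is an illustrative helper.
function collectStream(ndjson) {
  let text = "";
  let done = false;
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue; // skip blank lines between chunks
    const chunk = JSON.parse(line);
    text += chunk.response ?? "";
    if (chunk.done) done = true; // the final chunk marks completion
  }
  return { text, done };
}
```

A done value of false after parsing suggests the stream was cut off before the model finished.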

Authentication for Ollama

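Ollama has no built-in authentication, so /api/generate is open to anyone who can reach the port. By default it listens only on 127.0.0.1; if you expose it beyond the local machine, secure it:

Use a reverse proxy (e.g., Traefik, NGINX) with basic-auth
Restrict the port with UFW or another stateful firewall rule
Tunnel via Tailscale/VPN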
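Once a reverse proxy enforces basic-auth, every client has to send credentials with its requests. As a small sketch of what that header looks like when built by hand in Node.js (basicAuthHeader is an illustrative name; in n8n itself the HTTP Request node's credential settings can be used instead):

```javascript
// Build an HTTP Basic Authorization header for a proxied Ollama endpoint.
// basicAuthHeader is an illustrative helper; Buffer is a Node.js global.
function basicAuthHeader(user, password) {
  const token = Buffer.from(`${user}:${password}`, "utf8").toString("base64");
  return { Authorization: `Basic ${token}` };
}
```

Basic auth only encodes the credentials, so it should be combined with HTTPS on the proxy.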

Scalability and Load

Run Ollama on a dedicated GPU machine or ARM board with high RAM
Use queueing within n8n or RabbitMQ to buffer floods
Don’t run more than one LLaMA2 stream in parallel unless it fits your RAM budget
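The one-call-at-a-time rule can be enforced in a script with a small promise chain before reaching for an external broker. This is a minimal in-process sketch, not a replacement for RabbitMQ or n8n's own queueing; createSerialQueue is an illustrative name.

```javascript
// Run async tasks strictly one after another, in submission order.
// createSerialQueue is an illustrative, in-process helper.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(task) {
    // Start this task only after the previous one has settled
    const result = tail.then(() => task());
    // Keep the chain alive even if a task rejects
    tail = result.catch(() => {});
    return result;
  };
}
```

Each call to enqueue returns a promise for that task's own result, so callers still see individual failures while later tasks keep running.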

Monitoring & Debugging

Run Ollama with verbose logs: OLLAMA_DEBUG=1 ollama serve
Check n8n logs via container logs: docker logs <container-name> (find the name with docker ps), or docker compose logs n8n

Conclusion

We’ve architected a powerful hybrid: Ollama’s lightweight LLM engine + n8n’s extensible automation studio. By combining these self-hosted platforms, developers and operations teams can build data-responsible, privacy-first AI tools with full infrastructure ownership.

You now have a way to run arbitrary AI prompts on sensitive data — without exposing anything to third-party clouds. Whether you’re summarizing EHR records, generating emails, enriching tickets, or writing batch jobs, this local LLM+n8n stack adds private intelligence to any business logic.

Next steps:

Try other models (e.g., mistral:instruct, codellama)
Connect to vector store (e.g., Qdrant or Weaviate) via n8n + Ollama
Deploy in a Kubernetes cluster with multiple n8n/Ollama nodes

FAQs

Can I use Ollama over HTTPS?

The built-in server has no native TLS, so you must proxy it behind NGINX or Caddy for HTTPS encryption. We recommend terminating TLS at the proxy and restricting the Ollama port using firewall rules.

How do I run multiple concurrent requests with limited RAM?

Use a queueing system (e.g., RabbitMQ or Redis queues) and limit to one active LLM call at a time. You can fork across multiple machines or containerize per model instance if horizontal scaling is needed.

Which models are best for summarization or classification?

Mistral 7b Instruct: Strong few-shot summarizer
LLaMa2: Versatile general model
Phi-2: Lightweight option for shortform answers

Use ollama pull mistral or ollama pull phi to try alternatives.

How can I integrate PDFs or files into the prompt?

Use n8n’s file ingest + text parsing nodes (e.g., Read Binary File → Convert to Text) and pass the extracted text into the prompt. If the input exceeds the model's context (token) limit, split it into chunks with a fixed-window or sliding-window strategy and process each chunk separately.
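Both chunking strategies can be expressed with one function: a sliding window with zero overlap is a fixed window. The sketch below counts whitespace-separated words as a rough stand-in for tokens, which is an assumption, since real token counts depend on the model's tokenizer; slidingWindowChunks and its default sizes are illustrative.

```javascript
// Split text into word-based windows; overlap = 0 gives fixed windows.
// Words only approximate tokens, so leave headroom below the real limit.
function slidingWindowChunks(text, size = 200, overlap = 50) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("Require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap; // how far each window advances
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk can then be summarized in its own request, with the partial summaries combined in a final prompt.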

I get garbled output or poor grammar — how can I fix it?

Check the prompt format. Many models expect instruction-completion framing like:

You are a helpful assistant.
Q: What is Kubernetes?
A: Kubernetes is a container orchestration platform...

Tune few-shot examples or test structured prompts that trigger better behavior.
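A prompt in that framing can be assembled programmatically, which makes it easier to swap examples in and out while tuning. This is a sketch of one possible layout; the Q:/A: labels follow the example above, and buildFewShotPrompt is an illustrative name. Some models may respond better to their own chat template.

```javascript
// Assemble a system line, worked examples, and the new question
// into a single Q:/A: style prompt. buildFewShotPrompt is illustrative.
function buildFewShotPrompt(system, examples, question) {
  const shots = examples.map(({ q, a }) => `Q: ${q}\nA: ${a}`);
  // End with an open "A:" so the model completes the answer
  return [system, ...shots, `Q: ${question}\nA:`].join("\n\n");
}
```

The resulting string goes into the prompt field of the request body unchanged.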

Snehasish Konger

Developed @scientyficworld.org | Technical writer @Nected | Content Developer
