Building intelligent document-based question-answering systems typically requires extensive coding knowledge and complex deployment pipelines. N8n changes this by providing a visual workflow automation platform that lets you create sophisticated RAG (Retrieval-Augmented Generation) systems without writing code.
RAG combines information retrieval with text generation. Instead of relying only on pre-trained knowledge, RAG systems search through your documents to find relevant information, then use that context to generate accurate answers. This approach significantly reduces hallucinations and provides responses grounded in your actual data.
This guide shows you how to build a production-ready RAG knowledge assistant using N8n’s visual interface. Your system will ingest documents, create searchable embeddings, and provide accurate, source-attributed responses to questions.
The workflow processes documents into chunks, converts them to vector embeddings, stores them in a vector database, and retrieves relevant chunks when answering user questions. All of this happens through N8n’s drag-and-drop interface.
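Conceptually, every question goes through a retrieve-then-generate loop. The sketch below is plain JavaScript pseudocode rather than an n8n node; embed, splitIntoChunks, vectorStore, and llm are hypothetical helpers standing in for the OpenAI and Pinecone calls the workflow nodes will make for you:

// Ingestion: split each document, embed each chunk, store the vectors
for (const doc of documents) {
  const chunks = splitIntoChunks(doc.text);
  for (let i = 0; i < chunks.length; i++) {
    const vector = await embed(chunks[i]);
    await vectorStore.upsert({
      id: `${doc.name}_${i}`,
      values: vector,
      metadata: { content: chunks[i], source: doc.name }
    });
  }
}

// Querying: embed the question, retrieve similar chunks, generate a grounded answer
const queryVector = await embed(question);
const matches = await vectorStore.query({ vector: queryVector, topK: 5 });
const context = matches.map(m => m.metadata.content).join('\n\n');
const answer = await llm.complete(`Context:\n${context}\n\nQuestion: ${question}`);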
We’ve already created a detailed guide for building this RAG-based assistant using the hard-coded Python method. If you prefer working directly with scripts and full control over the code, you can follow that tutorial here:
👉 Building a RAG with Python
Project Setup and Requirements
Before building, ensure you have these components ready:
Required Services:
- N8n instance (assumed to already be running)
- OpenAI API key for embeddings and completions
- Pinecone account for vector storage
- Google Drive for document storage (optional)
Required N8n Credentials: Configure these in N8n Settings > Credentials:
- OpenAI Credentials:
  - API Key: Your OpenAI API key
  - Organization: Leave blank unless you have a specific org
- Pinecone Credentials:
  - API Key: Your Pinecone API key
  - Environment: Your Pinecone environment (e.g., us-east-1-aws)
Pinecone Index Setup: Create a new index in your Pinecone dashboard (or via the REST API sketched after this list):
- Index Name: knowledge-assistant
- Dimensions: 1536 (for OpenAI embeddings)
- Metric: cosine
- Pod Type: p1.x1 (starter)
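If you prefer to script this step, the same index can be created through Pinecone's environment-based (legacy) REST API, which matches the us-east-1-aws style URLs used later in this guide. This is a sketch: YOUR_PINECONE_API_KEY is a placeholder, and newer serverless Pinecone projects use a different endpoint.

curl -X POST "https://controller.us-east-1-aws.pinecone.io/databases" \
  -H "Api-Key: YOUR_PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "knowledge-assistant", "dimension": 1536, "metric": "cosine", "pod_type": "p1.x1"}'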
Document Preparation:
- Supported formats: PDF, TXT, DOCX
- Recommended file size: Under 10MB per document
- File naming: Use descriptive names (they’ll appear in citations)
Step 1: Create Document Processing Workflow
Create a new workflow named “Document Processor” that converts documents into searchable vectors.
Node 1: Webhook Trigger

Add a Webhook node to receive document uploads:
{
"httpMethod": "POST",
"path": "upload-document",
"responseMode": "responseNode"
}
Node 2: Extract File Content

Add a Code node to process uploaded files:
const items = $input.all();
const results = [];

for (const item of items) {
  // Get the file from the binary property (named "data" by default)
  const binaryData = item.binary?.data;
  if (!binaryData) {
    throw new Error('No binary file found on the incoming item');
  }
  const fileName = binaryData.fileName || 'unknown.txt';

  // Decode as UTF-8 text. This works for plain-text files; for PDF or DOCX,
  // run the upload through an Extract From File node first.
  const content = Buffer.from(binaryData.data, 'base64').toString('utf-8');

  results.push({
    json: {
      fileName: fileName,
      content: content,
      fileSize: content.length
    }
  });
}

return results;
Node 3: Text Chunking

Add another Code node to split documents into chunks:
const items = $input.all();
const chunks = [];
for (const item of items) {
const content = item.json.content;
const fileName = item.json.fileName;
const chunkSize = 1000;
const overlap = 200;
// Split text into overlapping chunks
for (let i = 0; i < content.length; i += chunkSize - overlap) {
const chunk = content.slice(i, i + chunkSize);
if (chunk.trim().length > 50) {
chunks.push({
json: {
id: `${fileName.replace(/\s+/g, '_')}_chunk_${Math.floor(i / (chunkSize - overlap))}`,
content: chunk.trim(),
source: fileName,
chunkIndex: Math.floor(i / (chunkSize - overlap))
}
});
}
}
}
return chunks;
Node 4: Generate Embeddings

Add an HTTP Request node to create embeddings:
{
"method": "POST",
"url": "https://api.openai.com/v1/embeddings",
"authentication": "predefinedCredentialType",
"nodeCredentialType": "openAiApi",
"headers": {},
"body": {
"input": "={{ $json.content }}",
"model": "text-embedding-ada-002"
}
}
Node 5: Store in Pinecone

Add another HTTP Request node for vector storage:
{
"method": "POST",
"url": "https://knowledge-assistant-XXXXXXX.svc.us-east-1-aws.pinecone.io/vectors/upsert",
"authentication": "headerAuth",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{
"name": "Api-Key",
"value": "={{ $credentials.pineconeApi.apiKey }}"
}
]
},
"body": {
"vectors": [
{
"id": "={{ $json.id }}",
"values": "={{ $json.data[0].embedding }}",
"metadata": {
"content": "={{ $json.content }}",
"source": "={{ $json.source }}",
"chunk_index": "={{ $json.chunkIndex }}"
}
}
]
}
}
Node 6: Response

Add a Respond to Webhook node:
{
  "statusCode": 200,
  "body": {
    "message": "Document processed successfully",
    "chunks_created": "={{ $('Text Chunking').all().length }}",
    "source": "={{ $('Text Chunking').first().json.source }}"
  }
}
Connect the nodes in sequence: Webhook → Extract File Content → Text Chunking → Generate Embeddings → Store in Pinecone → Response.
Step 2: Create Query Processing Workflow
Create a second workflow named “Question Answerer” for handling user queries.
Node 1: Webhook Trigger

Add a Webhook node for questions:
{
"httpMethod": "POST",
"path": "ask-question",
"responseMode": "responseNode"
}
Node 2: Input Validation

Add a Code node to validate input:
// The Webhook node nests the POSTed JSON payload under "body"
const question = $input.first().json.body?.question || $input.first().json.question;
if (!question || question.trim().length === 0) {
throw new Error('Question is required');
}
if (question.length > 500) {
throw new Error('Question too long (max 500 characters)');
}
return [{
json: {
question: question.trim(),
timestamp: new Date().toISOString()
}
}];
Node 3: Generate Query Embedding

Add an HTTP Request node for query embedding:
{
"method": "POST",
"url": "https://api.openai.com/v1/embeddings",
"authentication": "predefinedCredentialType",
"nodeCredentialType": "openAiApi",
"body": {
"input": "={{ $json.question }}",
"model": "text-embedding-ada-002"
}
}
Node 4: Search Similar Chunks

Add an HTTP Request node to query Pinecone:
{
"method": "POST",
"url": "https://knowledge-assistant-XXXXXXX.svc.us-east-1-aws.pinecone.io/query",
"authentication": "headerAuth",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{
"name": "Api-Key",
"value": "={{ $credentials.pineconeApi.apiKey }}"
}
]
},
"body": {
"vector": "={{ $json.data[0].embedding }}",
"topK": 5,
"includeMetadata": true,
"includeValues": false
}
}
Node 5: Build Context

Add a Code node to format retrieved chunks:
const results = $input.first().json.matches;
const question = $('Input Validation').first().json.question;
const relevanceThreshold = 0.7;
// Filter relevant documents
const relevantDocs = results
.filter(match => match.score > relevanceThreshold)
.map(match => ({
content: match.metadata.content,
source: match.metadata.source,
score: Math.round(match.score * 100)
}));
if (relevantDocs.length === 0) {
return [{
json: {
context: "No relevant documents found for this question.",
question: question,
sources: [],
hasContext: false
}
}];
}
// Build context string
const context = relevantDocs
.map((doc, index) => `Document ${index + 1} (${doc.source}):\n${doc.content}`)
.join('\n\n---\n\n');
return [{
json: {
context: context,
question: question,
sources: relevantDocs.map(doc => ({
source: doc.source,
relevance: doc.score
})),
hasContext: true
}
}];
Node 6: Generate Answer

Add an HTTP Request node for LLM completion:
{
"method": "POST",
"url": "https://api.openai.com/v1/chat/completions",
"authentication": "predefinedCredentialType",
"nodeCredentialType": "openAiApi",
"body": {
"model": "gpt-4",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant that answers questions based on provided context. Always cite sources and be specific about which documents support your answer. If the context doesn't contain enough information, clearly state this limitation."
},
{
"role": "user",
"content": "Context:\n{{ $json.context }}\n\nQuestion: {{ $json.question }}\n\nAnswer:"
}
],
"temperature": 0.3,
"max_tokens": 1000
}
}
Node 7: Format Response

Add a Code node to format the final response:
const chatResponse = $input.first().json.choices[0].message.content;
const contextData = $('Build Context').first().json;
return [{
json: {
answer: chatResponse,
sources: contextData.sources,
hasRelevantContext: contextData.hasContext,
timestamp: new Date().toISOString(),
questionId: Math.random().toString(36).slice(2, 11) // simple random ID
}
}];
Node 8: Response

Add a Respond to Webhook node:
{
"statusCode": 200,
"body": "={{ $json }}"
}
Connect the nodes: Webhook → Input Validation → Generate Query Embedding → Search Similar Chunks → Build Context → Generate Answer → Format Response → Response.
Step 3: Add Error Handling
Both workflows need error handling to manage API failures and invalid inputs.
Error Trigger Workflow: Create a third workflow named “Error Handler”:
Node 1: Error Trigger

Add an Error Trigger node. It fires whenever a linked workflow fails. To link it, open the Document Processor and Question Answerer workflows, go to their Settings, and set Error Workflow to "Error Handler".
Node 2: Log Error

Add a Code node to log errors:
const errorData = $input.first().json;

// The Error Trigger provides "workflow" and "execution" objects
console.log('Workflow Error:', {
  workflow: errorData.workflow?.name,
  error: errorData.execution?.error?.message,
  timestamp: new Date().toISOString(),
  executionId: errorData.execution?.id
});

return [{
  json: {
    errorLogged: true,
    timestamp: new Date().toISOString(),
    errorMessage: errorData.execution?.error?.message
  }
}];
Rate Limiting: Add rate limiting to your question workflow by inserting a Code node after the webhook:
const rateLimit = {
  maxRequests: 20,
  timeWindow: 60000 // 1 minute
};

const clientId = $input.first().json.headers['x-forwarded-for'] || 'default';
const now = Date.now();

// Simple rate limiting kept in workflow static data, which persists between
// production executions (for real production use, back this with Redis)
const staticData = $getWorkflowStaticData('global');
if (!staticData.rateLimits) staticData.rateLimits = {};
if (!staticData.rateLimits[clientId]) staticData.rateLimits[clientId] = [];

// Drop requests that fall outside the time window
staticData.rateLimits[clientId] = staticData.rateLimits[clientId]
  .filter(timestamp => now - timestamp < rateLimit.timeWindow);

if (staticData.rateLimits[clientId].length >= rateLimit.maxRequests) {
  throw new Error('Rate limit exceeded. Please try again later.');
}

staticData.rateLimits[clientId].push(now);
return $input.all();
Testing the System
Test your RAG system with these verification steps:
Test Document Upload:
curl -X POST "http://localhost:5678/webhook/upload-document" \
  -F "data=@test-document.txt"
Expected response:
{
"message": "Document processed successfully",
"chunks_created": 15,
"source": "test-document.txt"
}
Test Question Answering:
curl -X POST "http://localhost:5678/webhook/ask-question" \
-H "Content-Type: application/json" \
-d '{"question": "What is the main topic discussed in the uploaded documents?"}'
Expected response:
{
"answer": "Based on the provided context from test-document.txt...",
"sources": [
{
"source": "test-document.txt",
"relevance": 89
}
],
"hasRelevantContext": true,
"timestamp": "2024-01-15T10:30:00Z",
"questionId": "abc123def"
}
Performance Benchmarks:
- Document processing: 2-5 seconds per page
- Query response time: 3-7 seconds
- Embedding generation: ~1 second per chunk
- Vector search: <500ms
Verify Vector Storage: Check your Pinecone dashboard (or query the index stats endpoint shown after this list) to confirm:
- Vectors are being created and stored
- Vector count matches expected chunks
- Metadata is properly attached
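You can also verify the counts from the command line. Assuming the legacy data-plane API and reusing the placeholder index host from the workflow, an index stats request looks like this; the totalVectorCount field in the response should match the number of chunks the Document Processor reported:

curl -X POST "https://knowledge-assistant-XXXXXXX.svc.us-east-1-aws.pinecone.io/describe_index_stats" \
  -H "Api-Key: YOUR_PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{}'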
Conclusion
You now have a fully functional RAG knowledge base assistant built entirely with N8n’s visual workflow system. This implementation provides several key advantages:
- No Code Required: The entire system runs through visual workflows without writing traditional application code.
- Production Ready: Built-in error handling, rate limiting, and monitoring make this suitable for real-world use.
- Easily Extensible: Add new features like conversation memory, multiple knowledge bases, or different embedding models by adding new nodes.
- Cost Effective: Pay only for actual API usage with no infrastructure overhead.
Your RAG assistant can now process documents, understand questions in natural language, and provide accurate answers with source citations. The system scales with your document volume and integrates seamlessly with N8n’s extensive connector ecosystem.
The combination of OpenAI’s embeddings, Pinecone’s vector search, and N8n’s workflow orchestration gives you enterprise-grade RAG capabilities with significantly reduced complexity compared to traditional implementations.
FAQ
How many documents can this system handle?
Pinecone’s starter tier supports up to 100,000 vectors, which typically handles 200-500 documents depending on size. For larger knowledge bases, upgrade to Pinecone’s paid plans.
Can I use different LLM providers?
Yes, replace the OpenAI HTTP Request nodes with calls to other providers like Anthropic, Google, or local models. Adjust the request format accordingly.
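As a rough sketch, the Generate Answer node pointed at Anthropic's Messages API could look like the configuration below. The model name is only an example, and the response text is returned under content[0].text rather than choices[0].message.content, so the Format Response node would need a matching change:

{
  "method": "POST",
  "url": "https://api.anthropic.com/v1/messages",
  "sendHeaders": true,
  "headerParameters": {
    "parameters": [
      { "name": "x-api-key", "value": "YOUR_ANTHROPIC_API_KEY" },
      { "name": "anthropic-version", "value": "2023-06-01" }
    ]
  },
  "body": {
    "model": "claude-3-5-sonnet-latest",
    "max_tokens": 1000,
    "system": "You are a helpful assistant that answers questions based on provided context. Always cite sources.",
    "messages": [
      {
        "role": "user",
        "content": "=Context:\n{{ $json.context }}\n\nQuestion: {{ $json.question }}\n\nAnswer:"
      }
    ]
  }
}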
How do I handle document updates?
Implement a versioning system by including timestamps in metadata. Re-upload modified documents to overwrite existing chunks, or delete old vectors before uploading new ones.
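For the delete-then-reupload approach, the legacy Pinecone API accepts a metadata filter on its delete endpoint, so you can remove every chunk from a given source before re-ingesting it. Note that metadata-filtered deletes are not available on the free starter environment, and old-document.pdf is a placeholder file name:

curl -X POST "https://knowledge-assistant-XXXXXXX.svc.us-east-1-aws.pinecone.io/vectors/delete" \
  -H "Api-Key: YOUR_PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filter": {"source": "old-document.pdf"}}'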
What about conversation history?
Add conversation memory by storing previous Q&A pairs and including them in your context. Use N8n’s database nodes or external storage for persistence.
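As a minimal sketch, a Code node placed between Build Context and Generate Answer could prepend recent exchanges stored in workflow static data (static data only persists for production executions, so treat this as illustrative rather than a drop-in solution):

// Workflow static data survives between production executions
const staticData = $getWorkflowStaticData('global');
if (!staticData.history) staticData.history = [];

const current = $input.first().json;

// Turn the last few Q&A pairs into extra context for the LLM
const historyText = staticData.history
  .slice(-3)
  .map(h => `Previous Q: ${h.question}\nPrevious A: ${h.answer}`)
  .join('\n\n');

return [{
  json: {
    ...current,
    context: historyText ? `${historyText}\n\n---\n\n${current.context}` : current.context
  }
}];

A second Code node after Generate Answer would then push the new { question, answer } pair onto staticData.history.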
Can I customize chunk sizes?
Modify the chunking logic in the Text Chunking node. Smaller chunks (500-700 characters) work better for specific questions, while larger chunks (1,000-1,500 characters) suit comprehensive answers.
How do I improve answer accuracy?
Experiment with the relevance threshold (currently 0.7), increase topK for more context, or implement query expansion to find more relevant chunks.
What are the typical costs?
For moderate usage: OpenAI embeddings ~$0.10/1M tokens, GPT-4 completions ~$30/1M tokens, Pinecone starter ~$70/month. Total typically $50-200/month.