# Configuration

Environment variables and settings for the RAG system.
The RAG system uses Pydantic-based settings loaded from environment variables. All configuration is managed through a `.env` file.
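The project itself uses Pydantic settings classes; as a rough illustration of the same env-loading pattern (field and helper names here are hypothetical, using only the standard library), one group of settings could be modeled like this:

```python
import os
from dataclasses import dataclass

# Hypothetical stdlib-only sketch of env-driven settings; the real project
# uses Pydantic models whose exact field names may differ.
@dataclass(frozen=True)
class EmbeddingSettings:
    model: str
    base_url: str
    timeout: int
    normalize: bool

def load_embedding_settings(env=None):
    # Fall back to the documented defaults when a variable is unset.
    env = os.environ if env is None else env
    return EmbeddingSettings(
        model=env.get("EMBEDDING_MODEL", "nextfire/paraphrase-multilingual-minilm:l12-v2"),
        base_url=env.get("EMBEDDING_BASE_URL", "http://localhost:11434"),
        timeout=int(env.get("EMBEDDING_TIMEOUT", "60")),
        normalize=env.get("EMBEDDING_NORMALIZE", "true").lower() == "true",
    )
```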
**Embedding (Ollama API)**

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `nextfire/paraphrase-multilingual-minilm:l12-v2` | Ollama embedding model name |
| `EMBEDDING_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `EMBEDDING_TIMEOUT` | `60` | Request timeout in seconds |
| `EMBEDDING_NORMALIZE` | `true` | Normalize embedding vectors |
| `LOCAL_EMBED_MODEL` | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | Local sentence-transformers model (optional fallback) |
| `LOCAL_EMBED_DEVICE` | (auto-detect) | Device for local model (`cpu`, `cuda`, `mps`) |
| `LOCAL_EMBED_NORMALIZE` | `true` | Normalize local embeddings |
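With the defaults above, embeddings come from Ollama's HTTP API. A sketch of the request shape, following Ollama's public `/api/embeddings` contract rather than this project's client code (`build_embed_request` is a hypothetical helper):

```python
import json
import urllib.request

def build_embed_request(text, model="nextfire/paraphrase-multilingual-minilm:l12-v2",
                        base_url="http://localhost:11434"):
    # Ollama's embeddings endpoint takes a model name and a prompt and
    # responds with {"embedding": [...]}.
    payload = {"model": model, "prompt": text}
    return f"{base_url}/api/embeddings", json.dumps(payload).encode()

if __name__ == "__main__":
    url, body = build_embed_request("hello world")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        vec = json.loads(resp.read())["embedding"]
        print(len(vec))  # dimensionality of the embedding vector
```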
**LLM (Ollama API)**

| Variable | Default | Description |
|---|---|---|
| `LLM_MODEL` | `gemma3:latest` | Ollama LLM model name |
| `LLM_BASE_URL` | `https://ollama.dragonteam.dev` | Ollama API base URL |
| `LLM_TEMPERATURE` | `0.7` | Generation temperature (0.0–2.0) |
| `LLM_TIMEOUT` | `120` | Request timeout in seconds |
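The temperature setting is passed to the model as a sampling option. A sketch of how such a request body is typically assembled for Ollama's `/api/generate` endpoint (the helper name is hypothetical; the project's actual client may differ):

```python
def build_generate_request(prompt, model="gemma3:latest", temperature=0.7):
    # Ollama's /api/generate takes sampling parameters in an "options"
    # dict; stream=False returns a single JSON object with a "response".
    return {
        "model": model,
        "prompt": prompt,
        "options": {"temperature": temperature},
        "stream": False,
    }
```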
**Reranker (HuggingFace TEI)**

| Variable | Default | Description |
|---|---|---|
| `RERANKER_ENABLED` | `true` | Enable/disable cross-encoder reranking |
| `RERANKER_BASE_URL` | `http://localhost:8090` | HuggingFace TEI server URL |
| `RERANKER_MODEL` | `BAAI/bge-reranker-v2-m3` | Reranker model name |
| `RERANKER_TIMEOUT` | `30` | Request timeout in seconds |
| `RERANKER_INITIAL_K` | `20` | Number of candidates to retrieve before reranking |
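The reranking flow implied by these settings: retrieve `RERANKER_INITIAL_K` candidates, score them against the query, and keep the best few. A sketch under the assumption that the TEI server's `/rerank` endpoint is posted `{"query", "texts"}` and returns `[{"index", "score"}, ...]` (helper names are hypothetical):

```python
def build_rerank_request(query, texts):
    # Payload for TEI's /rerank endpoint: one query, many candidate texts.
    return {"query": query, "texts": texts}

def apply_rerank(texts, scored, top_k=5):
    # `scored` is the server's response; keep the top_k candidates by
    # cross-encoder score, highest first.
    order = sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]
    return [texts[r["index"]] for r in order]
```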
**Vector Store**

| Variable | Default | Description |
|---|---|---|
| `VECTOR_BACKEND` | `chroma` | Vector store backend |
| `VECTOR_PERSIST_PATH` | `./data/chroma_db` | Local ChromaDB storage path |
| `VECTOR_COLLECTION_NAME` | `rag_documents` | ChromaDB collection name |
**RAG**

| Variable | Default | Description |
|---|---|---|
| `RAG_TOP_K` | `5` | Number of documents to use as context |
| `RAG_INCLUDE_TIMING` | `true` | Include timing breakdown in responses |
| `RAG_USE_HYBRID` | `true` | Enable hybrid search (vector + BM25) |
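Hybrid search merges two ranked result lists, one from the vector index and one from BM25. The document doesn't specify how this project fuses them; one common technique is reciprocal rank fusion, sketched here for illustration:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document id scores sum(1 / (k + rank)) across the ranked lists
    # it appears in; documents ranked highly by either retriever rise.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```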
**API**

| Variable | Default | Description |
|---|---|---|
| `API_KEYS` | (empty) | Comma-separated API keys. Empty = auth disabled |
| `API_HOST` | `0.0.0.0` | Host to bind to |
| `API_PORT` | `9000` | Port to listen on |
| `API_CORS_ORIGINS` | `*` | Allowed CORS origins (comma-separated) |
| `API_DEBUG` | `false` | Enable debug mode with auto-reload |
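The `API_KEYS` description implies the following auth behavior; this is a sketch of that logic only (helper names are hypothetical, and the header the server reads the key from is not documented here):

```python
def parse_api_keys(raw):
    # API_KEYS is comma-separated; whitespace around keys is ignored and
    # an empty value yields an empty set.
    return {k.strip() for k in raw.split(",") if k.strip()}

def is_authorized(presented, keys):
    # With no configured keys, auth is disabled and every request passes.
    return not keys or presented in keys
```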
**Text Chunking**

| Variable | Default | Description |
|---|---|---|
| `CHUNK_TARGET_WORDS` | `500` | Target words per chunk |
| `CHUNK_MIN_WORDS` | `100` | Minimum words per chunk |
| `CHUNK_MAX_WORDS` | `600` | Maximum words per chunk |
| `CHUNK_OVERLAP_WORDS` | `50` | Overlap words between chunks |
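How these four knobs can interact, as a simplified sketch: emit windows of roughly `CHUNK_TARGET_WORDS`, step back by the overlap, and fold a too-short tail into the previous chunk up to the maximum. The actual chunker may differ; this only illustrates the parameters:

```python
def chunk_words(words, target=500, max_words=600, min_words=100, overlap=50):
    # Greedy word-window chunker: each chunk starts `overlap` words before
    # the previous chunk ended, so adjacent chunks share context.
    chunks, start = [], 0
    while start < len(words):
        end = min(start + target, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start = end - overlap
    # A trailing fragment below min_words is merged into the previous
    # chunk (dropping the duplicated overlap), capped at max_words.
    if len(chunks) > 1 and len(chunks[-1]) < min_words:
        tail = chunks.pop()
        chunks[-1] = (chunks[-1] + tail[overlap:])[:max_words]
    return chunks
```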
**PostgreSQL**

| Variable | Default | Description |
|---|---|---|
| `DB_URL` | (required if enabled) | PostgreSQL connection string |
| `DB_ENABLED` | `true` | Enable/disable PostgreSQL metadata storage |
| `DB_LOG_DUPLICATES` | `false` | Log duplicate detection events |
**VLM (Vision Language Model)**

| Variable | Default | Description |
|---|---|---|
| `VLM_ENABLED` | `true` | Enable image/chart description |
| `VLM_MODEL` | `gemma3:4b` | Vision model name |
| `VLM_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `VLM_TIMEOUT` | `120` | Request timeout in seconds |
| `VLM_PROMPT` | (built-in) | Prompt for image description |
**Redis (for Celery, optional)**

| Variable | Default | Description |
|---|---|---|
| `REDIS_HOST` | `localhost` | Redis server host |
| `REDIS_PORT` | `6379` | Redis server port |
| `REDIS_BROKER_DB` | `0` | Redis DB for Celery broker |
| `REDIS_BACKEND_DB` | `1` | Redis DB for Celery backend |
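These four values combine into the two Redis URLs Celery needs; the `redis://` scheme carries the DB number as the path. A sketch (the helper name is hypothetical):

```python
def redis_url(host="localhost", port=6379, db=0):
    # Celery broker/backend URLs: redis://<host>:<port>/<db>
    return f"redis://{host}:{port}/{db}"

# Broker on DB 0 and result backend on DB 1, matching the defaults above.
broker_url = redis_url(db=0)
result_backend = redis_url(db=1)
```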
**Example `.env`**

```bash
# Embedding (Ollama API)
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_BASE_URL=https://your-ollama-server.example.com
EMBEDDING_TIMEOUT=60
EMBEDDING_NORMALIZE=true

# Reranker (HuggingFace TEI)
RERANKER_ENABLED=true
RERANKER_BASE_URL=http://your-reranker-server:8787
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RERANKER_TIMEOUT=30
RERANKER_INITIAL_K=20

# LLM (Ollama API)
LLM_MODEL=gemma3:latest
LLM_BASE_URL=https://your-ollama-server.example.com
LLM_TEMPERATURE=0.7
LLM_TIMEOUT=120

# Vector Store
VECTOR_BACKEND=chroma
VECTOR_PERSIST_PATH=./data/chroma_db
VECTOR_COLLECTION_NAME=rag_documents

# RAG
RAG_TOP_K=5
RAG_INCLUDE_TIMING=true
RAG_USE_HYBRID=true

# API
API_KEYS=your-secret-key-1,your-secret-key-2
API_HOST=0.0.0.0
API_PORT=9000
API_CORS_ORIGINS=*

# Text Chunking
CHUNK_TARGET_WORDS=500
CHUNK_MIN_WORDS=100
CHUNK_MAX_WORDS=600
CHUNK_OVERLAP_WORDS=50

# PostgreSQL
DB_URL=postgresql://user:password@host:port/database
DB_ENABLED=true

# VLM (Vision Language Model)
VLM_ENABLED=true
VLM_MODEL=gemma3:4b
VLM_BASE_URL=https://your-ollama-server.example.com

# Redis (for Celery - optional)
REDIS_HOST=localhost
REDIS_PORT=6379
```
**Usage in code**

```python
from config import get_settings

settings = get_settings()

# Access nested settings
print(settings.embedding.model)
print(settings.reranker.model)
print(settings.llm.model)
print(settings.database.url)
```
The `get_settings()` function returns a cached singleton, so it can be called repeatedly without performance overhead.
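The cached-singleton pattern is commonly implemented with `functools.lru_cache`; a sketch of that pattern (the `Settings` body here is a stand-in, not the project's real class):

```python
from functools import lru_cache

class Settings:
    # Stand-in for the project's Pydantic settings model; imagine the
    # env parsing happening in here.
    def __init__(self):
        self.rag_top_k = 5

@lru_cache(maxsize=1)
def get_settings():
    # The first call constructs Settings; every later call returns the
    # exact same cached instance.
    return Settings()
```

Calling `get_settings()` twice returns the identical object, so settings are parsed from the environment only once per process.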