# Configuration

Environment variables and settings for the RAG system.
The RAG system uses Pydantic-based settings loaded from environment variables. All configuration is managed through a `.env` file.
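The project itself uses Pydantic settings classes; as a rough illustration of the same env-loading pattern (field and helper names here are hypothetical, using only the standard library), one group of settings could be modeled like this:

```python
import os
from dataclasses import dataclass

# Hypothetical stdlib-only sketch of env-driven settings; the real project
# uses Pydantic models whose exact field names may differ.
@dataclass(frozen=True)
class EmbeddingSettings:
    model: str
    base_url: str
    timeout: int
    normalize: bool

def load_embedding_settings(env=None):
    # Fall back to the documented defaults when a variable is unset.
    env = os.environ if env is None else env
    return EmbeddingSettings(
        model=env.get("EMBEDDING_MODEL", "nextfire/paraphrase-multilingual-minilm:l12-v2"),
        base_url=env.get("EMBEDDING_BASE_URL", "http://localhost:11434"),
        timeout=int(env.get("EMBEDDING_TIMEOUT", "60")),
        normalize=env.get("EMBEDDING_NORMALIZE", "true").lower() == "true",
    )
```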
**Embedding (Ollama API)**

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `nextfire/paraphrase-multilingual-minilm:l12-v2` | Ollama embedding model name |
| `EMBEDDING_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `EMBEDDING_TIMEOUT` | `60` | Request timeout in seconds |
| `EMBEDDING_NORMALIZE` | `true` | Normalize embedding vectors |
| `LOCAL_EMBED_MODEL` | `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2` | Local sentence-transformers model (optional fallback) |
| `LOCAL_EMBED_DEVICE` | (auto-detect) | Device for local model (`cpu`, `cuda`, `mps`) |
| `LOCAL_EMBED_NORMALIZE` | `true` | Normalize local embeddings |
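With the defaults above, embeddings come from Ollama's HTTP API. A sketch of the request shape, following Ollama's public `/api/embeddings` contract rather than this project's client code (`build_embed_request` is a hypothetical helper):

```python
import json
import urllib.request

def build_embed_request(text, model="nextfire/paraphrase-multilingual-minilm:l12-v2",
                        base_url="http://localhost:11434"):
    # Ollama's embeddings endpoint takes a model name and a prompt and
    # responds with {"embedding": [...]}.
    payload = {"model": model, "prompt": text}
    return f"{base_url}/api/embeddings", json.dumps(payload).encode()

if __name__ == "__main__":
    url, body = build_embed_request("hello world")
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        vec = json.loads(resp.read())["embedding"]
        print(len(vec))  # dimensionality of the embedding vector
```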
**LLM (Ollama API)**

| Variable | Default | Description |
|---|---|---|
| `LLM_MODEL` | `gemma3:latest` | Ollama LLM model name |
| `LLM_BASE_URL` | `https://ollama.dragonteam.dev` | Ollama API base URL |
| `LLM_TEMPERATURE` | `0.7` | Generation temperature (0.0–2.0) |
| `LLM_TIMEOUT` | `120` | Request timeout in seconds |
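The temperature setting is passed to the model as a sampling option. A sketch of how such a request body is typically assembled for Ollama's `/api/generate` endpoint (the helper name is hypothetical; the project's actual client may differ):

```python
def build_generate_request(prompt, model="gemma3:latest", temperature=0.7):
    # Ollama's /api/generate takes sampling parameters in an "options"
    # dict; stream=False returns a single JSON object with a "response".
    return {
        "model": model,
        "prompt": prompt,
        "options": {"temperature": temperature},
        "stream": False,
    }
```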
**Reranker (HuggingFace TEI)**

| Variable | Default | Description |
|---|---|---|
| `RERANKER_ENABLED` | `true` | Enable/disable cross-encoder reranking |
| `RERANKER_BASE_URL` | `http://localhost:8090` | HuggingFace TEI server URL |
| `RERANKER_MODEL` | `BAAI/bge-reranker-v2-m3` | Reranker model name |
| `RERANKER_TIMEOUT` | `30` | Request timeout in seconds |
| `RERANKER_INITIAL_K` | `20` | Number of candidates to retrieve before reranking |
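The reranking flow implied by these settings: retrieve `RERANKER_INITIAL_K` candidates, score them against the query, and keep the best few. A sketch under the assumption that the TEI server's `/rerank` endpoint is posted `{"query", "texts"}` and returns `[{"index", "score"}, ...]` (helper names are hypothetical):

```python
def build_rerank_request(query, texts):
    # Payload for TEI's /rerank endpoint: one query, many candidate texts.
    return {"query": query, "texts": texts}

def apply_rerank(texts, scored, top_k=5):
    # `scored` is the server's response; keep the top_k candidates by
    # cross-encoder score, highest first.
    order = sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]
    return [texts[r["index"]] for r in order]
```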
**Vector Store**

| Variable | Default | Description |
|---|---|---|
| `VECTOR_BACKEND` | `chroma` | Vector store backend |
| `VECTOR_PERSIST_PATH` | `./data/chroma_db` | Local ChromaDB storage path |
| `VECTOR_COLLECTION_NAME` | `rag_documents` | ChromaDB collection name |
**RAG**

| Variable | Default | Description |
|---|---|---|
| `RAG_TOP_K` | `5` | Number of documents to use as context |
| `RAG_INCLUDE_TIMING` | `true` | Include timing breakdown in responses |
| `RAG_USE_HYBRID` | `true` | Enable hybrid search (vector + BM25) |
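Hybrid search merges two ranked result lists, one from the vector index and one from BM25. The document doesn't specify how this project fuses them; one common technique is reciprocal rank fusion, sketched here for illustration:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document id scores sum(1 / (k + rank)) across the ranked lists
    # it appears in; documents ranked highly by either retriever rise.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```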
**API**

| Variable | Default | Description |
|---|---|---|
| `API_KEYS` | (empty) | Comma-separated API keys. Empty = auth disabled |
| `API_HOST` | `0.0.0.0` | Host to bind to |
| `API_PORT` | `9000` | Port to listen on |
| `API_CORS_ORIGINS` | `*` | Allowed CORS origins (comma-separated) |
| `API_DEBUG` | `false` | Enable debug mode with auto-reload |
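The `API_KEYS` description implies the following auth behavior; this is a sketch of that logic only (helper names are hypothetical, and the header the server reads the key from is not documented here):

```python
def parse_api_keys(raw):
    # API_KEYS is comma-separated; whitespace around keys is ignored and
    # an empty value yields an empty set.
    return {k.strip() for k in raw.split(",") if k.strip()}

def is_authorized(presented, keys):
    # With no configured keys, auth is disabled and every request passes.
    return not keys or presented in keys
```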
**Text Chunking**

| Variable | Default | Description |
|---|---|---|
| `CHUNK_TARGET_WORDS` | `500` | Target words per chunk |
| `CHUNK_MIN_WORDS` | `100` | Minimum words per chunk |
| `CHUNK_MAX_WORDS` | `600` | Maximum words per chunk |
| `CHUNK_OVERLAP_WORDS` | `50` | Overlap words between chunks |
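How these four knobs can interact, as a simplified sketch: emit windows of roughly `CHUNK_TARGET_WORDS`, step back by the overlap, and fold a too-short tail into the previous chunk up to the maximum. The actual chunker may differ; this only illustrates the parameters:

```python
def chunk_words(words, target=500, max_words=600, min_words=100, overlap=50):
    # Greedy word-window chunker: each chunk starts `overlap` words before
    # the previous chunk ended, so adjacent chunks share context.
    chunks, start = [], 0
    while start < len(words):
        end = min(start + target, len(words))
        chunks.append(words[start:end])
        if end == len(words):
            break
        start = end - overlap
    # A trailing fragment below min_words is merged into the previous
    # chunk (dropping the duplicated overlap), capped at max_words.
    if len(chunks) > 1 and len(chunks[-1]) < min_words:
        tail = chunks.pop()
        chunks[-1] = (chunks[-1] + tail[overlap:])[:max_words]
    return chunks
```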
**PostgreSQL**

| Variable | Default | Description |
|---|---|---|
| `DB_URL` | (required if enabled) | PostgreSQL connection string |
| `DB_ENABLED` | `true` | Enable/disable PostgreSQL metadata storage |
| `DB_LOG_DUPLICATES` | `false` | Log duplicate detection events |
**VLM (Vision Language Model)**

| Variable | Default | Description |
|---|---|---|
| `VLM_ENABLED` | `true` | Enable image/chart description |
| `VLM_MODEL` | `gemma3:4b` | Vision model name |
| `VLM_BASE_URL` | `http://localhost:11434` | Ollama API base URL |
| `VLM_TIMEOUT` | `120` | Request timeout in seconds |
| `VLM_PROMPT` | (built-in) | Prompt for image description |
**Redis (for Celery, optional)**

| Variable | Default | Description |
|---|---|---|
| `REDIS_HOST` | `localhost` | Redis server host |
| `REDIS_PORT` | `6379` | Redis server port |
| `REDIS_BROKER_DB` | `0` | Redis DB for Celery broker |
| `REDIS_BACKEND_DB` | `1` | Redis DB for Celery backend |
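These four values combine into the two Redis URLs Celery needs; the `redis://` scheme carries the DB number as the path. A sketch (the helper name is hypothetical):

```python
def redis_url(host="localhost", port=6379, db=0):
    # Celery broker/backend URLs: redis://<host>:<port>/<db>
    return f"redis://{host}:{port}/{db}"

# Broker on DB 0 and result backend on DB 1, matching the defaults above.
broker_url = redis_url(db=0)
result_backend = redis_url(db=1)
```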
**Example `.env`**

```bash
# Embedding (Ollama API)
EMBEDDING_MODEL=bge-m3:latest
EMBEDDING_BASE_URL=https://your-ollama-server.example.com
EMBEDDING_TIMEOUT=60
EMBEDDING_NORMALIZE=true

# Reranker (HuggingFace TEI)
RERANKER_ENABLED=true
RERANKER_BASE_URL=http://your-reranker-server:8787
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RERANKER_TIMEOUT=30
RERANKER_INITIAL_K=20

# LLM (Ollama API)
LLM_MODEL=gemma3:latest
LLM_BASE_URL=https://your-ollama-server.example.com
LLM_TEMPERATURE=0.7
LLM_TIMEOUT=120

# Vector Store
VECTOR_BACKEND=chroma
VECTOR_PERSIST_PATH=./data/chroma_db
VECTOR_COLLECTION_NAME=rag_documents

# RAG
RAG_TOP_K=5
RAG_INCLUDE_TIMING=true
RAG_USE_HYBRID=true

# API
API_KEYS=your-secret-key-1,your-secret-key-2
API_HOST=0.0.0.0
API_PORT=9000
API_CORS_ORIGINS=*

# Text Chunking
CHUNK_TARGET_WORDS=500
CHUNK_MIN_WORDS=100
CHUNK_MAX_WORDS=600
CHUNK_OVERLAP_WORDS=50

# PostgreSQL
DB_URL=postgresql://user:password@host:port/database
DB_ENABLED=true

# VLM (Vision Language Model)
VLM_ENABLED=true
VLM_MODEL=gemma3:4b
VLM_BASE_URL=https://your-ollama-server.example.com

# Redis (for Celery - optional)
REDIS_HOST=localhost
REDIS_PORT=6379
```
**Usage in code**

```python
from config import get_settings

settings = get_settings()

# Access nested settings
print(settings.embedding.model)
print(settings.reranker.model)
print(settings.llm.model)
print(settings.database.url)
```
The `get_settings()` function returns a cached singleton, so it can be called repeatedly without performance overhead.
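The cached-singleton pattern is commonly implemented with `functools.lru_cache`; a sketch of that pattern (the `Settings` body here is a stand-in, not the project's real class):

```python
from functools import lru_cache

class Settings:
    # Stand-in for the project's Pydantic settings model; imagine the
    # env parsing happening in here.
    def __init__(self):
        self.rag_top_k = 5

@lru_cache(maxsize=1)
def get_settings():
    # The first call constructs Settings; every later call returns the
    # exact same cached instance.
    return Settings()
```

Calling `get_settings()` twice returns the identical object, so settings are parsed from the environment only once per process.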