Deployment Overview
Deployment options and production setup for the RAG system
The RAG system supports three deployment configurations: local development, Docker Compose, and Kubernetes.
Deployment Architecture
Deployment Options
Option 1 — Local Development
Run the API server directly with Python:
# Activate virtual environment
source .venv/bin/activate
# Start the server
uvicorn api.main:app --reload --host 0.0.0.0 --port 9000

Requirements:
- Python 3.10+ with dependencies installed
- Ollama running locally or remotely
- PostgreSQL accessible
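
The requirements above can be checked before starting the server. The following is a minimal sketch, not part of the project: it assumes Ollama listens on its default port 11434 and PostgreSQL on 5432, both on `localhost`. The helper names `python_ok` and `check_tcp` are hypothetical; adjust hosts and ports to match your setup.

```python
import socket
import sys


def python_ok(minimum=(3, 10)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def check_tcp(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    # Assumed endpoints; these are illustrative defaults, not project settings.
    services = {
        "Ollama": ("localhost", 11434),
        "PostgreSQL": ("localhost", 5432),
    }
    print(f"Python 3.10+: {'ok' if python_ok() else 'too old'}")
    for name, (host, port) in services.items():
        state = "reachable" if check_tcp(host, port) else "unreachable"
        print(f"{name} at {host}:{port}: {state}")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm credentials or that the required models are loaded.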
Option 2 — Docker (Recommended for Production)
Deploy with Docker Compose:
docker compose up -d api

The container connects to external services (Ollama, TEI, PostgreSQL) running on the host via host.docker.internal.
See Docker Deployment for full configuration.
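
Inside the container, `localhost` refers to the container itself, so service URLs that work in local development need to point at `host.docker.internal` instead. The sketch below illustrates that rewrite with a hypothetical helper, `to_container_url`. It is not taken from the project, and the example URLs are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames that resolve to the container itself when used inside Docker.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def to_container_url(url: str, docker_host: str = "host.docker.internal") -> str:
    """Rewrite a loopback URL so it reaches the Docker host from a container.

    URLs that already point at a non-loopback host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url

    # Rebuild the network location, keeping port and credentials if present.
    netloc = docker_host
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    if parts.username:
        credentials = parts.username
        if parts.password:
            credentials = f"{credentials}:{parts.password}"
        netloc = f"{credentials}@{netloc}"

    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Illustrative URLs only; the real values live in your .env file.
    for example in ("http://localhost:11434", "postgresql://rag:secret@127.0.0.1:5432/rag"):
        print(example, "->", to_container_url(example))
```

In practice the same effect is usually achieved by setting the service URLs directly in the environment passed to the container; the helper only makes the mapping explicit.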
Option 3 — Kubernetes
Kubernetes manifests are available in .kubernates/ for orchestrated deployment.
Production Checklist
| Item | Description |
|---|---|
| API Keys | Set strong API_KEYS in .env |
| CORS | Restrict API_CORS_ORIGINS to your domain |
| Database | Use a managed PostgreSQL/Supabase instance |
| Ollama | Deploy on a GPU server for best performance |
| Monitoring | Enable health checks at /api/v1/health/ready |
| Logging | Logs are written to the logs/ directory |
| Backups | Back up the ChromaDB directory (data/chroma_db/) and the PostgreSQL database |
| Workers | Run multiple Uvicorn workers (the Docker setup defaults to --workers 2) |
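
The readiness endpoint from the checklist can be polled after a deployment to confirm the API is up. The following is a minimal sketch using only the standard library. It assumes the endpoint answers with HTTP 200 when the service is ready and that the API is reachable at `http://localhost:9000` (the port used in the local development command). The function names `is_ready` and `wait_until_ready` are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request

READY_PATH = "/api/v1/health/ready"


def is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint responds with HTTP 200."""
    url = base_url.rstrip("/") + READY_PATH
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Non-2xx responses, refused connections, and timeouts all count as not ready.
        return False


def wait_until_ready(base_url: str, attempts: int = 10,
                     delay: float = 1.0, timeout: float = 2.0) -> bool:
    """Poll the readiness endpoint up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_ready(base_url, timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed base URL; change it to wherever the API is exposed.
    ok = wait_until_ready("http://localhost:9000", attempts=5)
    print("API ready" if ok else "API not ready")
```

A check like this could run as a post-deploy step or from an external monitor; the retry count and delay are arbitrary defaults.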