Getting Started
Installation & Setup
Get the RAG system up and running on your local machine
Prerequisites
- Python 3.10 or higher
- Ollama — for embeddings and LLM generation
- PostgreSQL — for metadata storage (or Supabase)
- Redis — optional, for Celery distributed task processing
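You can check the prerequisites above from Python before installing anything else. This is a minimal sketch and is not part of the project. It compares the interpreter version with the 3.10 minimum and looks for command-line tools on PATH. The executable names ollama, psql and redis-server are assumptions about a typical local install. psql only matters for a local PostgreSQL (not Supabase), and Redis is optional, so the sketch reports those two but does not treat them as required.

```python
"""Hypothetical prerequisite check for the RAG setup (not shipped with the project)."""
import shutil
import sys

MIN_PYTHON = (3, 10)

# Assumed executable names; True marks a tool treated as required.
TOOLS = {
    "ollama": True,         # embeddings and LLM generation
    "psql": False,          # only relevant for a local PostgreSQL (not Supabase)
    "redis-server": False,  # optional, for Celery
}


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return {check name: (ok, required)} for the interpreter and each tool."""
    results = {"python>=3.10": (tuple(version_info[:2]) >= MIN_PYTHON, True)}
    for tool, required in TOOLS.items():
        results[tool] = (which(tool) is not None, required)
    return results


def report(results):
    """Print one line per check; return True if every required check passed."""
    all_required_ok = True
    for name, (ok, required) in results.items():
        status = "ok" if ok else ("MISSING" if required else "not found (optional)")
        print(f"{name:<14} {status}")
        if required and not ok:
            all_required_ok = False
    return all_required_ok
```

Call report(check_prerequisites()). A False return means a required check failed.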
System Dependencies
For document parsing (PDF, images, OCR), install these system packages:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y \
tesseract-ocr tesseract-ocr-ara tesseract-ocr-eng \
poppler-utils libmagic1 \
libgl1 libglib2.0-0 libsm6 libxext6 libxrender1
Installation
# Clone or navigate to the project
cd rag
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# or
.venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
For CPU-only systems, install the CPU build of PyTorch first, before running pip install -r requirements.txt:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Quick Start
1. Configure Environment
Copy the example environment file and configure your settings:
cp .env.example .env
Edit .env with your specific values (see Configuration for details).
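To find settings you have not filled in yet, you can compare the two files. The following is a minimal sketch that assumes both files use plain KEY=VALUE lines, optionally prefixed with export, with # comment lines and no multi-line values or trailing inline comments. It does not know which keys this project requires. It only reports keys that appear in .env.example but are missing or empty in .env. The helper names parse_env, unset_keys and check_env_files are hypothetical.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines into a dict; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Trim whitespace and one layer of matching quotes, so KEY="" counts as empty.
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def unset_keys(example_text, env_text):
    """Keys declared in the example file that are absent or empty in the real one."""
    example = parse_env(example_text)
    actual = parse_env(env_text)
    return sorted(k for k in example if not actual.get(k))


def check_env_files(example_path=".env.example", env_path=".env"):
    """Read both files from disk and return the unset keys."""
    return unset_keys(Path(example_path).read_text(), Path(env_path).read_text())
```

Running print(check_env_files()) from the project root lists the keys that still need values.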
2. Start the API Server
# Development (with auto-reload)
uvicorn api.main:app --reload --host 0.0.0.0 --port 9000
# Production
uvicorn api.main:app --host 0.0.0.0 --port 9000 --workers 4
3. Verify the Setup
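The curl health checks in this step work when you run them by hand. If you start the server from a script (for example in CI), you may want to wait until the readiness endpoint answers before sending requests. The following is a minimal standard-library sketch. It assumes the endpoint returns HTTP 200 once all components are ready and an error status (or no connection) otherwise. It does not inspect the response body because that format is not documented here. wait_until_ready is a hypothetical helper name.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:9000/api/v1/health/ready",
                     timeout=60.0, interval=1.0, request_timeout=5.0):
    """Poll url until it returns HTTP 200 (True) or about timeout seconds pass (False).

    Each attempt may take up to request_timeout, so the total wait can slightly
    exceed timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeout and HTTPError (non-2xx status)
            # are all OSError subclasses: treat them as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```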
# Health check
curl http://localhost:9000/api/v1/health
# Readiness check (verifies all components)
curl http://localhost:9000/api/v1/health/ready
4. Ingest Your First Document
# Scrape a URL
curl -X POST http://localhost:9000/api/v1/scrape \
-H "X-API-Key: your-secret-key-1" \
-H "Content-Type: application/json" \
-d '{"url": "https://docs.example.com/page"}'
# Or upload a file
curl -X POST http://localhost:9000/api/v1/upload \
-H "X-API-Key: your-secret-key-1" \
-F "file=@document.pdf" \
-F "section=manuals"
5. Query the System
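You can send queries with curl, as in the command that follows, or from Python without extra dependencies. The following is a minimal sketch that mirrors that curl request. It sets the X-API-Key header and posts {"question": ...} as JSON. This page does not document the response schema, so the sketch returns the decoded JSON unchanged. build_query_request and ask are hypothetical helper names, and the key is the same placeholder used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000"
API_KEY = "your-secret-key-1"  # placeholder, as in the curl examples


def build_query_request(question, base_url=BASE_URL, api_key=API_KEY):
    """Build the POST /api/v1/query request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/query",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=120.0):
    """Send the query and return the decoded JSON response as-is."""
    req = build_query_request(question)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, print(ask("How do I create an invoice?")) shows the raw response.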
curl -X POST http://localhost:9000/api/v1/query \
-H "X-API-Key: your-secret-key-1" \
-H "Content-Type: application/json" \
-d '{"question": "How do I create an invoice?"}'
Using the Python SDK
You can also use the RAG system directly from Python:
from src import RAGPreprocessor, RAGSystem
# 1. Preprocess documents
preprocessor = RAGPreprocessor()
chunks = preprocessor.process_documents([
{"text": "Your document content...", "source": "manual", "id": "doc-001"},
])
# 2. Create RAG system and add documents
rag = RAGSystem(vector_store_path="./data/chroma_db")
result = rag.add_chunks(chunks)
print(f"Added: {result.added}, Skipped: {result.skipped}")
# 3. Query
response = rag.query("What is this document about?")
print(response.answer)
API Documentation UI
Once the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:9000/docs
- ReDoc: http://localhost:9000/redoc
- OpenAPI JSON: http://localhost:9000/openapi.json
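The OpenAPI JSON shows every endpoint in one place, which helps you confirm exact paths before you write scripts against them. The following is a minimal sketch that fetches the spec and lists each path with its HTTP methods. It assumes the document follows the standard OpenAPI layout, with a top-level paths object keyed by path and then by lowercase method. It also assumes /openapi.json does not require the X-API-Key header. fetch_openapi and list_operations are hypothetical helper names.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(url="http://localhost:9000/openapi.json", timeout=10.0):
    """Download and decode the OpenAPI document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_operations(spec):
    """Return (METHOD, path, summary) tuples sorted by path, then method."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            # Path items can also carry non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                ops.append((method.upper(), path, operation.get("summary", "")))
    return sorted(ops, key=lambda op: (op[1], op[0]))


# Usage, with the server running:
# for method, path, summary in list_operations(fetch_openapi()):
#     print(f"{method:7} {path}  {summary}")
```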