RAG System Overview

A production-grade Retrieval-Augmented Generation (RAG) system built with Python and FastAPI. It ingests documents from multiple sources, processes and embeds them into a vector store, and provides semantic search with LLM-powered question answering.

What It Does

The RAG system enables you to:

  • Ingest documents from URLs (web scraping) or file uploads (PDF, DOCX, PPTX, HTML, images, CSV, XLSX)
  • Process and chunk text with intelligent splitting, deduplication, and language detection
  • Store embeddings in ChromaDB with metadata in PostgreSQL
  • Search semantically using hybrid vector + keyword search with cross-encoder reranking
  • Generate answers using an LLM grounded in your document knowledge base
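The ingest-embed-search flow above can be sketched in a few lines of plain Python. Everything here is illustrative: the "embedding" is a toy bag-of-words vector standing in for the real Ollama embeddings, and the function names are not the project's actual API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; the real system calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Ingest: index each chunk alongside its embedding.
docs = ["ChromaDB stores vectors", "PostgreSQL stores metadata", "FastAPI serves the REST API"]
index = [(d, embed(d)) for d in docs]

# Retrieve: rank chunks by similarity to the query. A real system would then
# pass the top chunks to the LLM as grounding context for answer generation.
query = embed("where are vectors stored")
best = max(index, key=lambda item: cosine(query, item[1]))
print(best[0])  # → ChromaDB stores vectors
```

In the real pipeline the same shape holds, with chunking and deduplication before indexing and an LLM call after retrieval.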

Key Capabilities

| Capability | Details |
| --- | --- |
| Document Formats | PDF, DOCX, PPTX, HTML, Markdown, CSV, XLSX, PNG, JPG, TIFF |
| OCR Support | Tesseract + RapidOCR for scanned documents and images |
| Languages | 50+ languages with specialized Arabic/RTL support |
| Search | Hybrid search (vector + BM25) with RRF fusion |
| Reranking | Cross-encoder reranking via HuggingFace TEI |
| LLM | Ollama integration (Gemma3, LLaMA, etc.) |
| Storage | ChromaDB (vectors) + PostgreSQL (metadata) |
| API | FastAPI REST API with OpenAPI/Swagger docs |
| Async | Full async pipeline with background task support |
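Reciprocal Rank Fusion (RRF), used to merge the vector and BM25 result lists, is simple enough to sketch: each document scores the sum of 1/(k + rank) across the lists it appears in. The `k = 60` constant is the commonly cited default, not necessarily what this system configures.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists; docs ranked highly in several lists win."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # from ChromaDB similarity search
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # from the keyword (BM25) index
print(rrf_fuse([vector_hits, bm25_hits]))  # → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because `doc_b` ranks well in both lists, it beats `doc_a`, which tops only the vector list; fused results are then passed to the cross-encoder reranker.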

Technology Stack

  • Language/Framework — Python, FastAPI
  • Vector store — ChromaDB
  • Metadata database — PostgreSQL
  • Embeddings + LLM — Ollama (Gemma3, LLaMA, etc.)
  • Reranking — HuggingFace TEI (cross-encoder)
  • Keyword search — BM25 index
  • OCR — Tesseract, RapidOCR

Architecture at a Glance

The system follows a layered architecture:

  1. API Layer — FastAPI server with authentication, routing, and request validation
  2. Core Layer — RAGSystem orchestrator, DocumentManager, and SemanticRetriever
  3. Service Layer — Embedder, Reranker, LLM client, and BM25 index
  4. Storage Layer — ChromaDB vector store and PostgreSQL metadata database
  5. External Services — Ollama (embeddings + LLM) and HuggingFace TEI (reranking)
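The layering can be pictured as a dependency chain: the core orchestrator wires the service layer to storage, and the API layer only ever talks to the core. The class and method names below are hypothetical stand-ins for the real components, with trivial bodies in place of actual ChromaDB, Ollama, and TEI calls.

```python
from dataclasses import dataclass

# Storage layer (stand-in for the ChromaDB client).
class VectorStore:
    def query(self, embedding: list[float], top_k: int) -> list[str]:
        return [f"chunk_{i}" for i in range(top_k)]

# Service layer (stand-ins for the Ollama embedder and TEI reranker).
class Embedder:
    def embed(self, text: str) -> list[float]:
        return [float(len(text))]

class Reranker:
    def rerank(self, query: str, chunks: list[str]) -> list[str]:
        return chunks  # a real cross-encoder would reorder these

# Core layer: composes services and storage behind one retrieval call.
@dataclass
class SemanticRetriever:
    embedder: Embedder
    store: VectorStore
    reranker: Reranker

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        hits = self.store.query(self.embedder.embed(query), top_k)
        return self.reranker.rerank(query, hits)

retriever = SemanticRetriever(Embedder(), VectorStore(), Reranker())
print(retriever.retrieve("where are vectors stored?"))  # → ['chunk_0', 'chunk_1', 'chunk_2']
```

Keeping the external services behind the service-layer interfaces is what lets the system swap, say, the reranker without touching the API or core layers.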
