
Data Flow

Document ingestion and query processing data flows

This page illustrates how data flows through the RAG system during the two primary operations: document ingestion and query processing.

Document Ingestion Flow

When a document is ingested (via URL scraping or file upload), it passes through the following pipeline:

Step-by-Step Breakdown

| Step | Component | What Happens |
|------|-----------|--------------|
| 1. Input | WebScraper / FileUploader | Content is fetched from a URL or received as a file upload |
| 2. Parse | DocumentParser | Docling extracts text, tables, and images from the document |
| 3. Dedup Check | Deduplicator | MD5 hash of the content is checked against existing documents |
| 4. Clean | TextCleaner | HTML/JS/CSS removed, encoding fixed, whitespace normalized |
| 5. Arabic Fix | ArabicTextFixer | NFKC normalization for Arabic presentation forms |
| 6. Chunk | TextChunker | Text split into ~500-word chunks with 50-word overlap |
| 7. Deduplicate | Deduplicator | Near-duplicate chunks removed (Jaccard similarity) |
| 8. Language | LanguageDetector | Language detected and added to chunk metadata |
| 9. Embed | Embedder | Vector embedding generated via Ollama bge-m3 |
| 10. Store | DocumentManager | Stored in ChromaDB (vectors) and PostgreSQL (metadata) |
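The chunking and exact-duplicate steps above can be sketched as follows. This is a minimal illustration, not the actual TextChunker/Deduplicator API; the function names and signatures are assumptions:

```python
import hashlib

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into ~chunk_size-word chunks, each sharing `overlap`
    words with its predecessor (step 6)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance 450 words per chunk
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def content_hash(text: str) -> str:
    """MD5 fingerprint used for the exact-duplicate check (step 3)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# A 1200-word document yields three overlapping chunks:
# words 0-499, 450-949, and 900-1199.
doc = " ".join(f"w{i}" for i in range(1200))
print(len(chunk_text(doc)))  # 3
```

Note the overlap: the last 50 words of each chunk repeat as the first 50 words of the next, so sentences that straddle a chunk boundary are still retrievable as a unit.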

Query Processing Flow

When a user submits a query, the system retrieves relevant context and generates an answer:

Step-by-Step Breakdown

| Step | Component | What Happens | Typical Time |
|------|-----------|--------------|--------------|
| 1. Embed | Embedder | Query text → vector embedding via Ollama | ~50ms |
| 2. Vector Search | ChromaDB | Cosine similarity search, top 20 candidates | ~120ms |
| 3. BM25 Search | BM25 Index | TF-IDF keyword scoring, top 20 candidates | ~10ms |
| 4. RRF Fusion | SemanticRetriever | Merge vector + BM25 results with RRF | ~1ms |
| 5. Rerank | TEI Reranker | Cross-encoder scores all candidates | ~2.5s |
| 6. Format | RAGSystem | Top 5 documents formatted as LLM context | ~1ms |
| 7. Generate | Ollama LLM | Answer generated from context + question | ~10s |
| **Total** | | | **~12.7s** |
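Step 4 merges the two ranked candidate lists with Reciprocal Rank Fusion, which scores each document as the sum of 1/(k + rank) over every list it appears in. A minimal sketch, assuming the conventional default of k = 60 (the SemanticRetriever's actual parameters are not shown on this page):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["d3", "d1", "d7"]   # top candidates from ChromaDB
bm25_hits   = ["d1", "d9", "d3"]   # top candidates from the BM25 index
print(rrf_fuse([vector_hits, bm25_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

Documents that appear near the top of both lists (here `d1` and `d3`) outrank documents that score well in only one retriever, which is why RRF needs no score normalization between the cosine and BM25 scales.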

Query Time Distribution

Async Execution Model

The query pipeline uses async execution to run independent steps concurrently instead of sequentially:

Key performance optimizations:

  • Parallel embedding + BM25 — asyncio.gather() runs these concurrently
  • Thread pool offloading — CPU-bound ChromaDB and BM25 operations use thread pool
  • Async HTTP — all external calls (Ollama, TEI) use httpx.AsyncClient
  • Batch embedding — semaphore-controlled concurrency for batch operations
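The optimizations above can be sketched as follows. The `embed`, `bm25_search`, and `vector_search` stubs are placeholders for the real Ollama, BM25, and ChromaDB calls; only the concurrency structure reflects the pipeline described here:

```python
import asyncio

async def embed(text: str) -> list[float]:
    await asyncio.sleep(0.01)       # stands in for an httpx.AsyncClient call to Ollama
    return [float(len(text))]

def bm25_search(query: str) -> list[str]:
    return ["d1", "d2"]             # CPU-bound TF-IDF scoring

def vector_search(embedding: list[float]) -> list[str]:
    return ["d2", "d3"]             # CPU-bound ChromaDB query

async def answer_query(query: str) -> list[str]:
    # Query embedding and BM25 search are independent, so they run
    # concurrently; asyncio.to_thread() offloads the CPU-bound work
    # to the thread pool so the event loop stays free.
    embedding, bm25_hits = await asyncio.gather(
        embed(query),
        asyncio.to_thread(bm25_search, query),
    )
    # Vector search depends on the embedding, so it runs afterwards.
    vector_hits = await asyncio.to_thread(vector_search, embedding)
    # Naive de-duplicating merge, standing in for RRF fusion.
    return list(dict.fromkeys(vector_hits + bm25_hits))

print(asyncio.run(answer_query("what is RRF?")))  # ['d2', 'd3', 'd1']
```

Only the BM25 search can overlap with embedding; the vector search must wait for the query embedding, which is why steps 1 and 3 in the table can share wall-clock time while step 2 cannot.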
