# Data Flow
This page illustrates how data flows through the RAG system during the two primary operations: document ingestion and query processing.
## Document Ingestion Flow
When a document is ingested (via URL scraping or file upload), it passes through the following pipeline:
### Step-by-Step Breakdown
| Step | Component | What Happens |
|---|---|---|
| 1. Input | WebScraper / FileUploader | Content is fetched from a URL or received as a file upload |
| 2. Parse | DocumentParser | Docling extracts text, tables, and images from the document |
| 3. Dedup Check | Deduplicator | MD5 hash of content checked against existing documents |
| 4. Clean | TextCleaner | HTML/JS/CSS removed, encoding fixed, whitespace normalized |
| 5. Arabic Fix | ArabicTextFixer | NFKC normalization for Arabic presentation forms |
| 6. Chunk | TextChunker | Text split into ~500-word chunks with 50-word overlap |
| 7. Deduplicate | Deduplicator | Near-duplicate chunks removed (Jaccard similarity) |
| 8. Language | LanguageDetector | Language detected and added to chunk metadata |
| 9. Embed | Embedder | Vector embedding generated via Ollama bge-m3 |
| 10. Store | DocumentManager | Stored in ChromaDB (vectors) and PostgreSQL (metadata) |
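The hashing, chunking, and near-duplicate steps (3, 6, and 7) can be sketched roughly as follows. This is a minimal standalone version, not the actual `TextChunker`/`Deduplicator` code: the function names and the 0.85 Jaccard threshold are illustrative assumptions.

```python
import hashlib


def content_hash(text: str) -> str:
    """MD5 hash used for exact-duplicate detection at the document level (step 3)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into ~`size`-word chunks with `overlap` words shared
    between consecutive chunks (step 6)."""
    words = text.split()
    chunks: list[str] = []
    step = size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last chunk reached the end
            break
    return chunks


def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity used for near-duplicate detection."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dedup_chunks(chunks: list[str], threshold: float = 0.85) -> list[str]:
    """Drop chunks too similar to an already-kept chunk (step 7).
    The 0.85 threshold is an assumed value for illustration."""
    kept: list[str] = []
    for chunk in chunks:
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept
```

With `size=500` and `overlap=50` the window advances 450 words per chunk, so a 1,200-word document yields three chunks, the last one shorter.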
## Query Processing Flow
When a user submits a query, the system retrieves relevant context and generates an answer:
### Step-by-Step Breakdown
| Step | Component | What Happens | Typical Time |
|---|---|---|---|
| 1. Embed | Embedder | Query text → vector embedding via Ollama | ~50ms |
| 2. Vector Search | ChromaDB | Cosine similarity search, top 20 candidates | ~120ms |
| 3. BM25 Search | BM25 Index | TF-IDF keyword scoring, top 20 candidates | ~10ms |
| 4. RRF Fusion | SemanticRetriever | Merge vector + BM25 results with RRF | ~1ms |
| 5. Rerank | TEI Reranker | Cross-encoder scores all candidates | ~2.5s |
| 6. Format | RAGSystem | Top 5 documents formatted as LLM context | ~1ms |
| 7. Generate | Ollama LLM | Answer generated from context + question | ~10s |
| **Total** | | | **~12.7s** |
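Step 4's Reciprocal Rank Fusion can be sketched as a short standalone function. Each candidate scores `1 / (k + rank)` summed over every ranked list it appears in, so documents ranked well by both vector and BM25 search rise to the top. The `k=60` constant is the conventional RRF default; the actual `SemanticRetriever` value is an assumption here.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge multiple ranked candidate lists with Reciprocal Rank Fusion.

    Score(doc) = sum over lists of 1 / (k + rank), rank starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing vector results `["a", "b", "c"]` with BM25 results `["b", "c", "d"]` ranks `b` first, since it appears near the top of both lists.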
## Query Time Distribution
## Async Execution Model
The query pipeline uses async execution to overlap I/O-bound and CPU-bound work:

Key performance optimizations:

- **Parallel embedding + BM25**: `asyncio.gather()` runs the query embedding and the BM25 search concurrently
- **Thread pool offloading**: CPU-bound ChromaDB and BM25 operations run in a thread pool
- **Async HTTP**: all external calls (Ollama, TEI) use `httpx.AsyncClient`
- **Batch embedding**: semaphore-controlled concurrency for batch operations
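The concurrency pattern above can be sketched as follows. The three inner functions are stand-ins for the real Ollama, BM25, and ChromaDB calls (their names and signatures are illustrative, not the system's actual API):

```python
import asyncio


async def embed(query: str) -> list[float]:
    # Stand-in for the async HTTP call to Ollama (httpx.AsyncClient in the real system).
    await asyncio.sleep(0.05)
    return [0.0] * 4


def bm25_search(query: str, top_k: int = 20) -> list[str]:
    # Stand-in for CPU-bound BM25 keyword scoring.
    return [f"doc{i}" for i in range(top_k)]


def vector_search(query_vec: list[float], top_k: int = 20) -> list[str]:
    # Stand-in for the CPU-bound ChromaDB cosine-similarity query.
    return [f"doc{i}" for i in range(top_k)]


async def retrieve(query: str) -> tuple[list[str], list[str]]:
    # Embedding (I/O-bound) and BM25 (CPU-bound, offloaded to the default
    # thread pool) run concurrently via asyncio.gather().
    query_vec, bm25_hits = await asyncio.gather(
        embed(query),
        asyncio.to_thread(bm25_search, query),
    )
    # Vector search depends on the embedding, so it runs afterwards,
    # also offloaded so it does not block the event loop.
    vector_hits = await asyncio.to_thread(vector_search, query_vec)
    return vector_hits, bm25_hits
```

Because BM25 runs while the embedding request is in flight, its ~10ms cost is hidden inside the ~50ms embedding latency rather than added to it.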