Query Endpoints

Base path: /api/v1

POST `/api/v1/query`

Ask the RAG system a question. Returns an AI-generated answer grounded in the most relevant documents, along with the sources it used.

Request Body:

```json
{
  "question": "How do I create an invoice?",
  "top_k": 5,
  "initial_k": 20,
  "include_timing": true,
  "include_context": false,
  "temperature": 0.7
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `question` | string | required | The question to ask |
| `top_k` | int | 5 | Number of documents to use as context (1-20) |
| `initial_k` | int | 20 | Documents to retrieve before reranking (1-100) |
| `include_timing` | bool | true | Include the timing breakdown in the response |
| `include_context` | bool | false | Include the raw context text in the response (`context_used`) |
| `temperature` | float | null | LLM temperature override (0.0-2.0); null keeps the server's default |
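A minimal client-side sketch that applies the documented defaults and ranges before a request is sent. `build_query_payload` is a hypothetical helper, not part of the API; the server remains the authority on validation, and leaving out `temperature` is assumed to behave the same as sending `null`.

```python
from typing import Any, Optional

# Documented inclusive ranges for POST /api/v1/query
TOP_K_RANGE = (1, 20)
INITIAL_K_RANGE = (1, 100)
TEMPERATURE_RANGE = (0.0, 2.0)


def _check_range(name: str, value: float, bounds: tuple) -> None:
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the allowed range {low}-{high}")


def build_query_payload(
    question: str,
    top_k: int = 5,
    initial_k: int = 20,
    include_timing: bool = True,
    include_context: bool = False,
    temperature: Optional[float] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a /query request body and check documented ranges."""
    _check_range("top_k", top_k, TOP_K_RANGE)
    _check_range("initial_k", initial_k, INITIAL_K_RANGE)
    payload: dict[str, Any] = {
        "question": question,
        "top_k": top_k,
        "initial_k": initial_k,
        "include_timing": include_timing,
        "include_context": include_context,
    }
    # Assumption: omitting temperature is equivalent to null (no override)
    if temperature is not None:
        _check_range("temperature", temperature, TEMPERATURE_RANGE)
        payload["temperature"] = temperature
    return payload
```

For example, `build_query_payload("How do I create an invoice?", temperature=0.7)` produces the request body shown above, while `top_k=21` raises `ValueError` without a round trip to the server.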

Response:

```json
{
  "question": "How do I create an invoice?",
  "answer": "To create an invoice, follow these steps...",
  "sources": [
    {
      "title": "Invoice Guide",
      "source": "https://docs.example.com/invoices",
      "section": "documentation",
      "score": 0.95,
      "doc_id": "abc123"
    }
  ],
  "timing": {
    "embedding": 0.05,
    "vector_search": 0.12,
    "reranking": 2.5,
    "retrieval_total": 2.67,
    "context_formatting": 0.001,
    "llm_generation": 10.0,
    "total": 12.67
  },
  "context_used": null
}
```
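A short sketch of rendering this response for display, listing each entry in `sources` after the answer. `format_answer` is a hypothetical helper; it treats a missing or null `timing` as "not requested", which is an assumption about what the server returns when `include_timing` is false.

```python
from typing import Any


def format_answer(response: dict[str, Any]) -> str:
    """Render a /query response as the answer followed by a numbered source list."""
    lines = [response["answer"], "", "Sources:"]
    for i, src in enumerate(response.get("sources", []), start=1):
        lines.append(f"[{i}] {src['title']} ({src['source']}), score {src['score']:.2f}")
    # Assumption: timing is null or absent when include_timing is false
    timing = response.get("timing")
    if timing:
        lines.append(f"Answered in {timing['total']:.2f} s")
    return "\n".join(lines)
```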

Timing Fields

| Field | Description |
|---|---|
| `embedding` | Time to generate the query embedding (seconds) |
| `vector_search` | ChromaDB vector similarity search time (seconds) |
| `reranking` | Cross-encoder reranking time (seconds); 0 if reranking is disabled |
| `retrieval_total` | Total retrieval time: `embedding` + `vector_search` + `reranking` (seconds) |
| `context_formatting` | Time to format the retrieved documents for the LLM (seconds) |
| `llm_generation` | LLM answer generation time (seconds) |
| `total` | Total end-to-end time (seconds) |
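Because `retrieval_total` is defined as the sum of three components, a client can sanity-check a timing block and see which stage dominates. A minimal sketch; `timing_report` and its tolerance are hypothetical, and the tolerance assumes the reported values may be rounded.

```python
import math
from typing import Any

# Components that retrieval_total is documented to cover
RETRIEVAL_PARTS = ("embedding", "vector_search", "reranking")
# Top-level stages compared against the end-to-end total
STAGES = ("retrieval_total", "context_formatting", "llm_generation")


def timing_report(timing: dict[str, float], tol: float = 0.01) -> dict[str, Any]:
    """Check retrieval_total against its parts and compute each stage's share of total."""
    parts_sum = sum(timing[k] for k in RETRIEVAL_PARTS)
    # Reported values may be rounded, so compare with an absolute tolerance
    retrieval_ok = math.isclose(parts_sum, timing["retrieval_total"], abs_tol=tol)
    total = timing["total"]
    shares = {k: (timing[k] / total if total > 0 else 0.0) for k in STAGES}
    return {
        "retrieval_consistent": retrieval_ok,
        "shares": shares,
        "slowest_stage": max(shares, key=lambda k: shares[k]),
    }
```

Applied to the example response, the retrieval parts add up to 2.67 and `llm_generation` accounts for roughly 79% of the total, so generation rather than retrieval is the first place to look when queries feel slow.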

Example:

```bash
curl -X POST http://localhost:9000/api/v1/query \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create an invoice?"}'
```
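The same request from Python, as a sketch using only the standard library (`urllib.request`). It reuses the host, port, and headers from the curl example; `make_query_request` and `send_query` are hypothetical helper names, and extra keyword arguments are passed through as body fields such as `top_k`.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:9000/api/v1"  # host and port from the curl example


def make_query_request(
    api_key: str, question: str, base_url: str = BASE_URL, **options: Any
) -> urllib.request.Request:
    """Build (but do not send) a POST /query request with a JSON body."""
    body = {"question": question, **options}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_query(req: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send the request and decode the JSON response body."""
    # Generation can take several seconds, so the timeout is deliberately generous
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send_query(make_query_request("your-key", "How do I create an invoice?", top_k=3))["answer"]` returns the answer text. `urlopen` raises `urllib.error.HTTPError` for any non-2xx status, so callers that need custom error handling should catch it.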

POST `/api/v1/search`

Semantic search without LLM generation. Returns relevant documents ranked by similarity.

Request Body:

```json
{
  "query": "invoice creation process",
  "top_k": 10,
  "initial_k": 50
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `query` | string | required | Search query |
| `top_k` | int | 5 | Number of results to return (1-50) |
| `initial_k` | int | 20 | Results to retrieve before reranking (1-100) |
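`initial_k` and `top_k` describe a two-stage retrieve-then-rerank pipeline: a fast similarity search gathers `initial_k` candidates, and a reranker (the timing fields name a cross-encoder) rescores only those candidates and keeps the best `top_k`. The sketch below illustrates how the two parameters interact, using toy word-overlap scorers as stand-ins for the real embedding similarity and cross-encoder; it is not the server's implementation.

```python
from typing import Callable, Sequence


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    fast_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    top_k: int = 5,
    initial_k: int = 20,
) -> list[tuple[str, float]]:
    # Stage 1: cheap scoring over the collection, keep initial_k candidates
    # (a real system would query a vector index instead of sorting everything)
    candidates = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:initial_k]
    # Stage 2: expensive rescoring of the candidates only, keep top_k.
    # If initial_k < top_k, at most initial_k results can come back.
    reranked = sorted(
        ((d, rerank_score(query, d)) for d in candidates), key=lambda p: p[1], reverse=True
    )
    return reranked[:top_k]


def word_overlap(query: str, doc: str) -> float:
    """Toy stand-in for vector similarity: fraction of query words found in the doc."""
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q) if q else 0.0


def phrase_bonus(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: word overlap plus a bonus for the exact phrase."""
    return word_overlap(query, doc) + (1.0 if query.lower() in doc.lower() else 0.0)
```

With `initial_k` set too low, a document the reranker would have ranked first may never reach stage 2, which is why `initial_k` is typically set well above `top_k`, as in the defaults of 20 and 5.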

Response:

```json
{
  "results": [
    {
      "text": "Document content here...",
      "source": "https://docs.example.com/invoices",
      "metadata": {
        "title": "Invoice Guide",
        "section": "documentation",
        "language": "en"
      },
      "score": 0.95,
      "doc_id": "abc123"
    }
  ],
  "count": 10
}
```
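Because each result carries a `source` URL and a `score`, a client can post-process the list. A minimal sketch, assuming several results may come from the same page (for example, different chunks of one document) and that a higher `score` means more relevant, as in the example above; `unique_sources` and `min_score` are hypothetical client-side names, not API parameters.

```python
from typing import Any


def unique_sources(
    response: dict[str, Any], min_score: float = 0.0
) -> list[tuple[str, str, float]]:
    """Collapse /search results that share a source URL, keeping the best-scoring one."""
    best: dict[str, dict[str, Any]] = {}
    for result in response.get("results", []):
        # Assumption: higher score means more relevant
        if result["score"] < min_score:
            continue
        current = best.get(result["source"])
        if current is None or result["score"] > current["score"]:
            best[result["source"]] = result
    ranked = sorted(best.values(), key=lambda r: r["score"], reverse=True)
    # Fall back to the URL when a result has no title in its metadata
    return [(r.get("metadata", {}).get("title", r["source"]), r["source"], r["score"]) for r in ranked]
```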
