Solution 03 of 06 · Generative AI

LLMs grounded in your data, not the open web.

Retrieval-augmented chatbots, document intelligence, and domain-tuned LLM applications. Seven projects shipped, from the PharmaBot drug-information assistant to the AiTutor adaptive learning system to enterprise knowledge co-pilots, all grounded in proprietary content and backed by citation and evaluation harnesses.

See GenAI case studies

GenAI systems shipped

12k+ documents indexed (AiTutor)

LLM providers integrated
What this is

Generative AI that doesn't hallucinate.

Most enterprise GenAI projects fail on hallucination, evaluation, and the gap between demo and production. AiSPRY's GenAI practice has been building grounded, evaluated LLM systems since 2023, centred on retrieval, citation, and continuous evaluation rather than raw model capability.

RAG before fine-tuning

Retrieval-augmented generation solves roughly 80% of the enterprise problems we see and is far easier to debug than fine-tuning. Fine-tuning is reserved for cases where retrieval can't carry the load - domain vocabulary, output format, or behavioural alignment.

Citations as a first-class output

Every answer points to source chunks. Users can verify, auditors can trace, and regulators can review. PharmaBot wouldn't ship without it; neither would any of our healthcare or compliance work.
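As a rough illustration of what citation enforcement can look like in code, the sketch below rejects any answer that cites nothing, or that cites a chunk the retriever never returned. The `Answer` type, the chunk-ID format, and the `enforce_citations` helper are hypothetical names chosen for this example, not the production schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    citations: tuple[str, ...]  # hypothetical chunk IDs cited by the model


def enforce_citations(answer: Answer, retrieved_ids: set[str]) -> Answer:
    """Reject answers that cite nothing or cite chunks that were never retrieved."""
    if not answer.citations:
        raise ValueError("answer has no citations")
    unknown = [c for c in answer.citations if c not in retrieved_ids]
    if unknown:
        raise ValueError(f"answer cites chunks that were not retrieved: {unknown}")
    return answer


# Illustrative data only: two retrieved chunks, one of them cited.
retrieved = {"formulary#12", "dosing#3"}
checked = enforce_citations(Answer("Take with food.", ("dosing#3",)), retrieved)
```

Failing loudly at this boundary means an uncited answer never reaches the user; the caller can retry, fall back, or return an explicit "no grounded answer" response.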

Evaluation harnesses, not vibes

Custom evaluation sets per project - golden questions, edge cases, adversarial prompts - run in CI before any model or prompt change ships. We use Ragas, custom LLM-as-judge harnesses, and human review where it matters.

Provider-agnostic by design

Anthropic, OpenAI, Llama 3, and others all sit behind a routing layer. Cost, latency, and quality are tracked per provider. Migration between providers takes a config change, not a rewrite.

How we do it

Six GenAI capabilities.

Each block is a discipline we've shipped to production. AiTutor uses retrieval, evaluation, and routing; PharmaBot adds citation enforcement and structured outputs.

Capability 01

Retrieval-Augmented Generation

Production RAG pipelines with hybrid retrieval (BM25 + dense), reranking, and citation enforcement. The default architecture for roughly 80% of the enterprise GenAI problems we tackle.

LangChain · LlamaIndex · pgvector · Pinecone · BGE rerankers
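One common way to combine a BM25 ranking with a dense-embedding ranking is reciprocal rank fusion (RRF). The sketch below assumes RRF purely for illustration; the page does not say which fusion method these pipelines use, and the document IDs and the constant `k = 60` are placeholder choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs; earlier ranks contribute larger scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical keyword results
dense_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top, and the fused list is what a reranker would then reorder.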

Capability 02

Document Intelligence

PDF, image, and scanned document parsing with layout-aware models. OCR, table extraction, form understanding - feeding into RAG pipelines or structured output extraction.

LayoutLMv3 · Donut · PaddleOCR · Unstructured.io

Capability 03

Conversational Agents

Multi-turn chatbots with persistent memory, tool calling, and human handoff. PharmaBot and AiTutor's tutor mode both run on this stack.

OpenAI Assistants · LangGraph · function calling · session memory

Capability 04

Domain Fine-Tuning

LoRA and QLoRA fine-tunes for output format, domain vocabulary, or behavioural alignment when retrieval alone isn't sufficient. Reserved for high-volume, well-defined tasks.

Llama 3 · Mistral · Axolotl · Unsloth · LoRA/QLoRA
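To give a sense of why LoRA is cheap to train: it freezes a weight matrix of shape d_out x d_in and learns two low-rank factors of shapes d_out x r and r x d_in instead. The arithmetic below uses illustrative dimensions (a 4096 x 4096 projection at rank 16); these are assumptions for the example, not settings from any specific project.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors: (d_out x rank) plus (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 16          # illustrative sizes only
full = d_out * d_in                          # parameters in the frozen matrix
lora = lora_trainable_params(d_out, d_in, rank)
fraction = lora / full

print(full, lora, f"{fraction:.4%}")  # 16777216 131072 0.7812%
```

For this one matrix the adapter trains under 1% of the parameters a full fine-tune would update.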

Capability 05

Evaluation & Observability

Per-project golden sets, LLM-as-judge harnesses, and continuous evaluation in CI. No prompt or model change ships without passing the evaluation gate.

Ragas · Phoenix · LangSmith · custom evaluation suites
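A minimal sketch of an evaluation gate, assuming a golden set of question and expected-answer pairs and a judge function that returns pass or fail. The 0.9 threshold, the toy questions, and the exact-match judge are all placeholders; a real harness would call the RAG pipeline and an LLM-as-judge or Ragas metric instead.

```python
from typing import Callable, Sequence

GoldenCase = tuple[str, str]  # (question, expected answer)


def pass_rate(
    cases: Sequence[GoldenCase],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str], bool],
) -> float:
    """Fraction of golden cases whose generated answer the judge accepts."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for question, expected in cases if judge_fn(expected, answer_fn(question))
    )
    return passed / len(cases)


def gate_exit_code(rate: float, threshold: float = 0.9) -> int:
    """Return 0 when the gate passes and 1 when it should block the deploy."""
    return 0 if rate >= threshold else 1


# Toy stand-ins for the pipeline under test and the judge.
golden = [("q1", "alpha"), ("q2", "beta"), ("q3", "gamma")]
canned = {"q1": "Alpha", "q2": "beta", "q3": "delta"}
rate = pass_rate(golden, canned.__getitem__, lambda exp, got: exp.lower() == got.lower())
status = gate_exit_code(rate)  # a CI script would pass this to sys.exit()
```

Here two of three cases pass, so the gate returns a non-zero status and the change would not ship.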

Capability 06

Multi-Provider Routing

Provider-agnostic abstraction so cost, latency, and quality can be tuned per workflow. Migration between Anthropic, OpenAI, and Llama 3 is a config change, not a rewrite.

LiteLLM · OpenRouter · custom routing layers
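The "config change, not a rewrite" idea can be sketched as a routing table that maps each workflow to a provider adapter. Everything below is hypothetical: the adapter functions are stubs that make no network calls, and the workflow names are invented for the example. It does not show LiteLLM's or OpenRouter's actual API.

```python
from typing import Callable, Mapping


# Stub adapters: real ones would wrap each provider's SDK behind this signature.
def call_anthropic(prompt: str) -> str:
    return f"[anthropic] {prompt}"


def call_openai(prompt: str) -> str:
    return f"[openai] {prompt}"


def call_llama3(prompt: str) -> str:
    return f"[llama3] {prompt}"


PROVIDERS: dict[str, Callable[[str], str]] = {
    "anthropic": call_anthropic,
    "openai": call_openai,
    "llama3": call_llama3,
}

# The only thing that changes when a workflow migrates between providers.
DEFAULT_ROUTES = {
    "long_context_qa": "anthropic",
    "tool_calling": "openai",
    "on_prem": "llama3",
}


def complete(workflow: str, prompt: str, routes: Mapping[str, str] = DEFAULT_ROUTES) -> str:
    """Look up the provider configured for this workflow and delegate to it."""
    return PROVIDERS[routes[workflow]](prompt)


before = complete("tool_calling", "hi")
after = complete("tool_calling", "hi", {**DEFAULT_ROUTES, "tool_calling": "llama3"})
```

Application code only ever calls `complete`, so swapping the provider for a workflow touches the routing table and nothing else.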

Use cases

Where generative AI moves the needle.

A representative selection - not exhaustive. Most engagements involve adapting one of these patterns to a specific knowledge base.

EDTECH

Adaptive learning & AI tutoring

RAG-powered tutoring across 12,000+ course modules with personalised learning paths, automated assessment, and human-in-the-loop content review. AiTutor - HYSEA & Global award-winner.

360DigiTMG · 1 flagship system

PHARMA

Drug information & clinical assistants

Retrieval-augmented chatbots over drug formularies, dosing guidelines, interaction databases, and clinical literature with strict citation and uncertainty handling.

PharmaBot · 1 production system

BFSI

Document intelligence & knowledge co-pilots

Layout-aware extraction from contracts, policies, and statements, feeding into knowledge co-pilots for analysts. Strict access controls and audit logs.

BFSI clients · 2 projects shipped

INTERNAL TOOLING

Enterprise knowledge assistants

Cross-source RAG over Confluence, SharePoint, Slack, and ticketing systems. Used as a first-line internal helpdesk and onboarding companion.

Mid-size enterprises · 2 deployments

CONTENT

Structured content generation

Marketing collateral, product descriptions, and report generation with brand-voice constraints, factuality grounding, and human review gates.

Cross-industry · 1 project

CUSTOMER SERVICE

Tier-1 support automation

Conversational agents handling repetitive customer queries with seamless handoff to human agents on confidence drop. Integrated with ticketing systems.

Service desks · Pilot stage
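Handoff on confidence drop, as described for the support-automation pattern, reduces to a threshold check on each turn. The sketch below assumes the agent emits a confidence score between 0 and 1; the `Turn` type, the score itself, and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    reply: str
    confidence: float  # hypothetical score in [0, 1] produced by the agent


def route_turn(turn: Turn, threshold: float = 0.7) -> str:
    """Keep the conversation with the bot, or escalate to a human agent."""
    return "bot" if turn.confidence >= threshold else "human"


confident = route_turn(Turn("Your ticket is open.", 0.92))
uncertain = route_turn(Turn("I think it might be...", 0.41))
```

In practice the "human" branch would open or update a ticket with the conversation so far, so the agent picking it up has context.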

Tech stack

The moving parts.

Models & Providers

Anthropic Claude - Default for long-context reasoning
OpenAI GPT-4o family - Multi-modal & tool calling
Llama 3 & Mistral - Self-hosted, fine-tuneable
Cohere & Voyage embeddings - Domain-tuned retrieval
BGE rerankers - Hybrid retrieval reranking

Retrieval & Storage

pgvector & PostgreSQL - Default vector store, transactional
Pinecone - Managed vector DB at scale
Elasticsearch & OpenSearch - Hybrid BM25 + dense retrieval
LlamaIndex & LangChain - Indexing & query orchestration
Unstructured.io - Document parsing & chunking

Production & Ops

LangSmith & Phoenix - Tracing & observability
Ragas - RAG evaluation framework
LiteLLM - Multi-provider routing
FastAPI & Streamlit - Application layer
Guardrails & NeMo Guardrails - Output validation

Selected work

Three systems, all in production.

Explore all case studies

AiTutor

EdTech · 360DigiTMG

Adaptive AI learning at scale

RAG-powered tutoring over 12,000+ course modules with personalised learning paths, automated assessment, and human-in-the-loop content review. HYSEA & Global award-winner 2024-25.

PharmaBot

Pharma · Drug intelligence

Drug-information assistant with RAG

Retrieval-augmented chatbot over drug formularies, NPPA pricing, dosing guidelines, and interaction databases. Strict citation enforcement and uncertainty handling for clinical contexts.

Enterprise Knowledge Co-pilot

Internal · Mid-size enterprise

Cross-source RAG for internal teams

Cross-source RAG over Confluence, SharePoint, Slack, and ticketing systems. First-line internal helpdesk handling onboarding and policy queries with citation-first answers.

Frequently asked

Common questions about this capability.

Quick answers to what teams ask before bringing us in. Don't see your question? Talk to us directly.

Anthropic Claude, OpenAI GPT-4o family, and self-hosted Llama 3 / Mistral are the main three. Choice depends on the workload: long-context reasoning often goes to Claude, multi-modal and tool calling to GPT-4o, and self-hosted Llama 3 when data residency or cost demands it. Every project has a multi-provider routing layer so migration is a config change.

RAG and prompt engineering solve about 80% of enterprise problems we see, and they're far more debuggable than fine-tuning. We fine-tune (LoRA/QLoRA on Llama 3 or Mistral) when retrieval can't carry the load - typically for output format, domain vocabulary, or behavioural alignment. PharmaBot is RAG-only; AiTutor uses prompt engineering with structured outputs.

Three layers. First, retrieval: every answer is grounded in source chunks, and citations are enforced as part of the output schema. Second, evaluation: per-project golden sets and LLM-as-judge harnesses run in CI before any change ships. Third, monitoring: production traces are sampled and reviewed continuously. We don't claim zero hallucination - we claim measurable, monitored, and bounded hallucination rates.

Self-hosted Llama 3 / Mistral on your infrastructure when data can't leave the perimeter. We've done air-gapped deployments for healthcare and BFSI clients. For lighter constraints, Anthropic and OpenAI both offer enterprise options with no-training guarantees and regional data residency, which we configure as part of the engagement.

Custom golden sets per project - typically 200-500 question-answer pairs covering normal use, edge cases, and adversarial prompts. Ragas for retrieval-quality metrics, LLM-as-judge for answer quality, and human review on a sample of production traffic. Evaluation runs in CI; failing the gate blocks the deploy.

Yes - most production GenAI engagements involve cross-source retrieval. We've integrated with Confluence, SharePoint, Slack, Jira, ServiceNow, and custom intranets. The integration layer is unstructured-first (we parse and index), with hooks for delta sync and access control inheritance.

Got a GenAI problem? Let's talk grounding.

30-minute discussion with our GenAI architects. We'll review your knowledge base shape, hallucination tolerance, and evaluation strategy before recommending an approach.

See related case studies

No sales pitch. Architects, not BDRs.
