Retrieval-augmented chatbots, document intelligence, and domain-tuned LLM applications. 7 projects shipped - from PharmaBot's drug-information assistant to AiTutor's adaptive learning system to enterprise knowledge co-pilots, all grounded in proprietary content with citation and evaluation harnesses.

What this is
Most enterprise GenAI projects fail on hallucination, evaluation, and the gap between demo and production. AiSPRY's GenAI practice has been building grounded, evaluated, retrievable LLM systems since 2023 - built around retrieval, citation, and continuous evaluation rather than raw model capability.
Retrieval-augmented generation solves 80% of enterprise problems and is debuggable. Fine-tuning is reserved for cases where retrieval can't carry the load - domain vocabulary, output format, or behavioural alignment.
Every answer points to source chunks. Users can verify, auditors can trace, and regulators can review. PharmaBot wouldn't ship without it; neither would any of our healthcare or compliance work.
Custom evaluation sets per project - golden questions, edge cases, adversarial prompts - run in CI before any model or prompt change ships. We use Ragas, custom LLM-as-judge, and human review where it matters.
Anthropic, OpenAI, Llama 3, and others all sit behind a routing layer. Cost, latency, and quality are tracked per provider. Migration between providers takes a config change, not a rewrite.
How we do it
Each block is a discipline we've shipped to production. AiTutor uses retrieval, evaluation, and routing; PharmaBot adds citation enforcement and structured outputs.
Production RAG pipelines with hybrid retrieval (BM25 + dense), reranking, and citation enforcement. The default architecture for 80% of enterprise GenAI problems we tackle.
PDF, image, and scanned document parsing with layout-aware models. OCR, table extraction, form understanding - feeding into RAG pipelines or structured output extraction.
Multi-turn chatbots with persistent memory, tool calling, and human handoff. PharmaBot and AiTutor's tutor mode both run on this stack.
LoRA and QLoRA fine-tunes for output format, domain vocabulary, or behavioural alignment when retrieval alone isn't sufficient. Reserved for high-volume, well-defined tasks.
Per-project golden sets, LLM-as-judge harnesses, and continuous evaluation in CI. No prompt or model change ships without passing the evaluation gate.
Provider-agnostic abstraction so cost, latency, and quality can be tuned per workflow. Migration between Anthropic, OpenAI, and Llama 3 is a config change, not a rewrite.
Use cases
A representative selection - not exhaustive. Most engagements involve adapting one of these patterns to a specific knowledge base.
RAG-powered tutoring across 12,000+ course modules with personalised learning paths, automated assessment, and human-in-the-loop content review. AiTutor - HYSEA & Global award-winner.
Retrieval-augmented chatbots over drug formularies, dosing guidelines, interaction databases, and clinical literature with strict citation and uncertainty handling.
Layout-aware extraction from contracts, policies, and statements, feeding into knowledge co-pilots for analysts. Strict access controls and audit logs.
Cross-source RAG over Confluence, SharePoint, Slack, and ticketing systems. Used as a first-line internal helpdesk and onboarding companion.
Marketing collateral, product descriptions, and report generation with brand-voice constraints, factuality grounding, and human review gates.
Conversational agents handling repetitive customer queries with seamless handoff to human agents on confidence drop. Integrated with ticketing systems.
Tech stack
Selected work

RAG-powered tutoring over 12,000+ course modules with personalised learning paths, automated assessment, and human-in-the-loop content review. HYSEA & Global award-winner 2024-25.

Retrieval-augmented chatbot over drug formularies, NPPA pricing, dosing guidelines, and interaction databases. Strict citation enforcement and uncertainty handling for clinical contexts.

Cross-source RAG over Confluence, SharePoint, Slack, and ticketing systems. First-line internal helpdesk handling onboarding and policy queries with citation-first answers.
Frequently asked
Quick answers to what teams ask before bringing us in. Don't see your question? Talk to us directly.
Anthropic Claude, OpenAI GPT-4o family, and self-hosted Llama 3 / Mistral are the main three. Choice depends on the workload: long-context reasoning often goes to Claude, multi-modal and tool calling to GPT-4o, and self-hosted Llama 3 when data residency or cost demands it. Every project has a multi-provider routing layer so migration is a config change.
RAG and prompt engineering solve about 80% of enterprise problems we see, and they're far more debuggable than fine-tuning. We fine-tune (LoRA/QLoRA on Llama 3 or Mistral) when retrieval can't carry the load - typically for output format, domain vocabulary, or behavioural alignment. PharmaBot is RAG-only; AiTutor uses prompt engineering with structured outputs.
Three layers. First, retrieval: every answer is grounded in source chunks, and citations are enforced as part of the output schema. Second, evaluation: per-project golden sets and LLM-as-judge harnesses run in CI before any change ships. Third, monitoring: production traces are sampled and reviewed continuously. We don't claim zero hallucination - we claim measurable, monitored, and bounded hallucination rates.
Self-hosted Llama 3 / Mistral on your infrastructure when data can't leave the perimeter. We've done air-gapped deployments for healthcare and BFSI clients. For lighter constraints, Anthropic and OpenAI both offer enterprise options with no-training guarantees and regional data residency, which we configure as part of the engagement.
Custom golden sets per project - typically 200-500 question-answer pairs covering normal use, edge cases, and adversarial prompts. Ragas for retrieval-quality metrics, LLM-as-judge for answer quality, and human review on a sample of production traffic. Evaluation runs in CI; failing the gate blocks the deploy.
Yes - most production GenAI engagements involve cross-source retrieval. We've integrated with Confluence, SharePoint, Slack, Jira, ServiceNow, and custom intranets. The integration layer is unstructured-first (we parse and index), with hooks for delta sync and access control inheritance.
30-minute discussion with our GenAI architects. We'll review your knowledge base shape, hallucination tolerance, and evaluation strategy before recommending an approach.