Project facts & technologies
This block is designed to give analysts, journalists, and AI search systems a discrete, citation-friendly summary of the project. Each row is a clean entity-attribute pair.
- Project name: AI Tutor — Student Engagement Monitoring and Adaptive Learning Platform
- Industry: EdTech, Online Learning, K-12 and Higher Education
- Use case: Real-time engagement monitoring and topic-level disengagement detection
- Core technology: Computer Vision, Facial Expression Recognition (FER), Gaze Tracking, Head Pose Estimation
- Models: CNN-based attention and emotion classifiers, multi-signal fusion
- Inputs: Webcam frames, video playback events, interaction telemetry, quiz data, LMS metadata
- Privacy posture: On-device feature extraction, no raw video upload, explicit consent
- Deployment: Cloud-native backend, edge-side feature extraction, LMS-integrated
- Stakeholder users: Students, educators, instructional designers, EdTech operators
- Integration: REST APIs, LTI for LMS, webhooks for adaptive engines
- Business outcome: ≥5% targeted reduction in disengagement, improved comprehension and retention
- ML outcome: ≥90% engagement classification accuracy (target)
Why is student engagement so hard to measure in online learning?
Online learning has unlocked access to education at a scale never before possible — but it has also surfaced a problem that classroom teachers have always handled intuitively: knowing when a student is actually paying attention. In a physical classroom, an experienced teacher can read the room — body language, facial expressions, side conversations, glassy eyes — and adjust on the fly. In an online learning video, that signal is gone. The platform sees a play event, a pause event, maybe a quiz answer at the end. It cannot see the student tuning out three minutes into a difficult topic.
The result is a structural blind spot. Educational platforms can tell you which videos were watched and which quizzes were attempted, but they cannot tell you which moments inside a video lost the learner — or why. AI-powered student engagement monitoring closes that gap by analyzing the same signals a teacher uses — facial expression, gaze, posture, interaction — and turning them into a structured, real-time engagement layer that any educational platform can build on.
What problem does AI Tutor solve?
AiSPRY's EdTech client needed to convert opaque video-watching behavior into a real-time, topic-level signal of student engagement — one that could power personalized feedback and adaptive learning without disrupting the existing platform or compromising student privacy. Several structural challenges had to be addressed:
Key challenges
- No engagement signal during video playback — the platform could not capture whether students were actually attending to the content beyond coarse play and pause events.
- No insight into where students disengage — without a topic-level signal, it was impossible to identify which concepts caused students to lose focus.
- No mechanism for personalized feedback — the platform could not tailor interventions because it did not know which areas individual students were struggling with.
- Privacy and consent constraints — any solution that uses webcam data must operate within strict privacy and consent boundaries, especially for younger learners.
- Existing platform integration — the engagement layer must fit into the educational platform that students already use, not require a separate app or workflow.
- Minimal manual effort constraint — the system must operate with little to no manual effort from teachers, instructional designers, or platform operators.
How does the AI Tutor engagement monitoring system work?
AI Tutor is a real-time engagement monitoring layer that integrates with existing educational platforms. It captures privacy-respecting signals from the learner — facial expression, gaze direction, head pose, and interaction telemetry — fuses them into a per-second engagement score, identifies topic-level disengagement hotspots, and emits actionable insights to power personalized feedback and adaptive learning strategies.
Signals analyzed
- Facial expression recognition for attention, confusion, frustration, and interest signals
- Gaze tracking to measure whether the learner is looking at the relevant region of the screen
- Head pose estimation (yaw, pitch, roll) for posture and attention cues
- Interaction telemetry — clicks, scrolls, tab focus, video controls
- Video playback events — play, pause, seek, skip, replay, speed change
- Quiz and assessment responses linked to engagement state at the time of viewing
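As a rough illustration of how the vision and interaction signals listed above might be combined into a single attention cue, the sketch below defines a per-sample record and two simple checks. The field names, the angle thresholds, and the screen-region convention are all assumptions made for this example; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalFrame:
    """One sampled observation of a learner. All field names are hypothetical."""
    t: float            # seconds since the start of the video
    yaw_deg: float      # head rotation left/right
    pitch_deg: float    # head rotation up/down
    roll_deg: float     # head tilt
    gaze_x: float       # normalized gaze point, 0.0 = left edge, 1.0 = right edge
    gaze_y: float       # normalized gaze point, 0.0 = top edge, 1.0 = bottom edge
    tab_focused: bool   # interaction telemetry: is the player tab in focus?


# Assumed thresholds, chosen only for illustration.
MAX_YAW_DEG = 30.0
MAX_PITCH_DEG = 20.0


def is_facing_screen(f):
    """Head pose cue: yaw and pitch both within the assumed limits."""
    return abs(f.yaw_deg) <= MAX_YAW_DEG and abs(f.pitch_deg) <= MAX_PITCH_DEG


def gaze_on_region(f, left, top, right, bottom):
    """Gaze cue: the gaze point falls inside a normalized screen rectangle."""
    return left <= f.gaze_x <= right and top <= f.gaze_y <= bottom


def attention_cue(f, region=(0.0, 0.0, 1.0, 1.0)):
    """True only when tab focus, head pose, and gaze all agree."""
    return f.tab_focused and is_facing_screen(f) and gaze_on_region(f, *region)
```

A real deployment would presumably feed such cues into a trained classifier rather than a hard AND rule; the rule is used here only to keep the example small.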
Engagement models
- CNN-based facial expression and attention classifiers
- Eye landmark and gaze-vector models for screen-region tracking
- Head pose estimation models for posture analysis
- Multi-signal fusion model that combines vision and interaction features
- Per-second engagement scoring with confidence intervals
- Continuous retraining from outcome-linked feedback to refine accuracy
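To make the idea of per-second scoring with a confidence interval concrete, here is a minimal sketch. It assumes each signal has already been reduced to a score in [0, 1], and it substitutes a fixed weighted average for the fusion model, which the source does not describe in detail. The weights, the function names, and the normal-approximation interval are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical weights; a trained fusion model would learn these instead.
WEIGHTS = {"fer": 0.4, "gaze": 0.3, "head_pose": 0.2, "interaction": 0.1}


def fuse(signals):
    """Weighted average of per-signal scores, each expected in [0, 1]."""
    total = sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


def score_second(frames):
    """Return (score, ci_low, ci_high) for the frames sampled in one second.

    The interval is a 95% normal approximation over the fused frame scores,
    clipped to [0, 1]. With fewer than two frames no spread can be estimated,
    so the widest possible interval is returned.
    """
    fused = [fuse(f) for f in frames]
    score = mean(fused)
    if len(fused) < 2:
        return score, 0.0, 1.0
    half_width = 1.96 * stdev(fused) / sqrt(len(fused))
    return score, max(0.0, score - half_width), min(1.0, score + half_width)
```

Reporting the interval alongside the score lets downstream logic ignore seconds where the signals disagree strongly, for example when the face is partly out of frame.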
Insights and adaptive learning
- Per-second engagement timeline for every video viewed
- Topic-level disengagement hotspots aggregated across cohorts
- Individual learner attention profiles and trends
- Early-warning signals for students at risk of dropping a course
- Educator-grade insights for cohort and class-level intervention
- Adaptive recommendations — pace changes, format shifts, targeted reviews
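One plausible way to derive topic-level hotspots from per-second timelines is to count, for each topic segment, the share of learner-seconds that fall below a disengagement cut-off. The sketch below does exactly that; the cut-off value, the input shapes, and the function name are assumptions for illustration, not the platform's actual aggregation.

```python
from collections import defaultdict

DISENGAGED_BELOW = 0.4  # assumed cut-off on a 0-1 engagement score


def topic_hotspots(timelines, topics):
    """Rank topics by the share of learner-seconds spent disengaged.

    timelines: {learner_id: [score at second 0, score at second 1, ...]}
    topics:    [(name, start_second, end_second)], end exclusive
    Returns [(topic_name, disengaged_share)] sorted worst first.
    """
    low = defaultdict(int)
    seen = defaultdict(int)
    for scores in timelines.values():
        for name, start, end in topics:
            for s in scores[start:end]:
                seen[name] += 1
                if s < DISENGAGED_BELOW:
                    low[name] += 1
    shares = [(name, low[name] / seen[name]) for name, _, _ in topics if seen[name]]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# Two learners, a four-second video split into two topics.
cohort = {"a": [0.9, 0.8, 0.2, 0.1], "b": [0.7, 0.9, 0.3, 0.8]}
segments = [("intro", 0, 2), ("recursion", 2, 4)]
ranking = topic_hotspots(cohort, segments)
```

In this toy cohort the second topic ranks first because three of its four learner-seconds fall below the cut-off, while the first topic has none.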
Privacy-first design
- On-device feature extraction — facial landmarks and gaze vectors computed locally
- No raw video upload — only derived numerical features leave the device
- Explicit student consent before webcam capture begins
- Audit logs and student-side opt-out at any point
- Data minimization — only signals needed for engagement classification are kept
- Configurable retention policies aligned with platform and regional privacy law
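The consent and data-minimization points above can be pictured as a single gate in front of any upload. The following sketch assumes a hypothetical allow-list of derived feature names and a hypothetical `raw_frame` key standing in for webcam pixels; neither name comes from the source.

```python
# Assumed allow-list of derived, numeric features that may leave the device.
ALLOWED_FEATURES = {"landmarks", "gaze_vector", "head_pose"}


class ConsentRequired(Exception):
    """Raised when an upload is attempted without recorded consent."""


def build_upload_payload(consent_given, extracted):
    """Return only allow-listed features, and only if consent was given.

    Anything not on the allow-list (for example a raw frame buffer) is
    dropped before the payload is built, so it can never be serialized.
    """
    if not consent_given:
        raise ConsentRequired("no webcam-derived data may be sent without consent")
    return {k: v for k, v in extracted.items() if k in ALLOWED_FEATURES}
```

Using an allow-list rather than a block-list means a newly added field stays on the device until someone explicitly decides it is needed for engagement classification.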
See AI Tutor in action
A walkthrough of the AI Tutor engagement layer — on-device facial expression, gaze, and head pose feature extraction, multi-signal fusion classification, topic-level disengagement heatmaps, and the educator and learner dashboards.
- Per-second engagement scoring — multi-signal fusion of FER, gaze, head pose, and interaction telemetry
- Topic-level disengagement heatmaps — show exactly which concepts cause learners to lose focus
- Privacy-first capture — on-device feature extraction with no raw video upload
- LMS-integrated rollout — JavaScript SDK, REST APIs, and LTI for direct platform embedding
What is the architecture of the AI Tutor platform?
The platform is built as a five-stage pipeline — from learner data sources, through privacy-preserving signal processing, into the AI/ML core for engagement classification, layered with insight and adaptation logic, and surfaced through stakeholder applications. The architecture is privacy-first by design, with on-device feature extraction and no raw video upload.
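The five stages can be read as a chain of transformations, each consuming the previous stage's output. The sketch below mirrors that ordering with toy functions; the stage bodies, the `attention` and `frame` keys, and the 0.5 rule are placeholders invented for this example and stand in for components the source does not detail.

```python
def collect(raw_events):
    """Stage 1, learner data sources: gather events for one session."""
    return list(raw_events)


def extract_features(events):
    """Stage 2, privacy-preserving signal processing.

    The hypothetical "frame" key stands in for raw video and is dropped here.
    """
    return [{k: v for k, v in e.items() if k != "frame"} for e in events]


def classify(features):
    """Stage 3, AI/ML core: a placeholder rule, not a trained model."""
    return [{"t": f["t"], "engaged": f["attention"] >= 0.5} for f in features]


def derive_insights(labels):
    """Stage 4, insight and adaptation logic: list the disengaged seconds."""
    return {"disengaged_seconds": [x["t"] for x in labels if not x["engaged"]]}


def present(insights):
    """Stage 5, stakeholder applications: render a one-line summary."""
    return f"{len(insights['disengaged_seconds'])} disengaged second(s)"


STAGES = [collect, extract_features, classify, derive_insights, present]


def run_pipeline(raw_events):
    """Pass the data through the five stages in order."""
    data = raw_events
    for stage in STAGES:
        data = stage(data)
    return data
```

Note that raw frames stop at stage 2 in this sketch, which matches the stated design goal of keeping raw video on the device.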

How does AI Tutor handle privacy, integration, and minimal-manual-effort constraints?
Three constraints shaped the design — privacy and consent for webcam-based monitoring, integration with existing educational platforms, and the minimal manual effort requirement called out in the brief.
Privacy-first design
- On-device feature extraction — facial landmarks and gaze vectors computed locally, never uploaded as raw video
- Only derived, anonymized features are sent to the engagement classifier
- Explicit, age-appropriate consent flow before webcam access begins
- Student-side opt-out at any point with full data deletion
- Audit logs of consent, capture, and processing events
- Configurable retention aligned with FERPA, GDPR, and regional school-data laws
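Configurable retention and opt-out deletion could be enforced by a periodic purge over stored feature records. The sketch below is one simple way to express that rule; the record shape (`student_id`, `captured_at`) and the function name are assumptions, and actual retention periods would depend on the applicable regulation and platform policy.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(records, retention_days, now, opted_out=frozenset()):
    """Keep only records that are inside the retention window
    and belong to students who have not opted out.

    records: iterable of dicts with hypothetical keys
             "student_id" (str) and "captured_at" (timezone-aware datetime)
    """
    cutoff = now - timedelta(days=retention_days)
    return [
        r for r in records
        if r["captured_at"] >= cutoff and r["student_id"] not in opted_out
    ]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
stored = [
    {"student_id": "s1", "captured_at": datetime(2024, 5, 25, tzinfo=timezone.utc)},
    {"student_id": "s2", "captured_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"student_id": "s3", "captured_at": datetime(2024, 5, 30, tzinfo=timezone.utc)},
]
kept = apply_retention(stored, retention_days=30, now=now, opted_out={"s3"})
```

Here only the record for `s1` survives: `s2` is older than the 30-day window and `s3` has opted out.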
Existing-platform integration
- REST API and webhook integration for video-platform events
- LTI (Learning Tools Interoperability) support for LMS embedding
- Schema-flexible adapters for common LMS and EdTech platforms
- Drop-in JavaScript SDK for browser-based platforms
- No separate app required — engagement layer runs alongside existing video player
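For the webhook path, a receiving system usually needs a way to check that an event really came from the engagement layer. The source does not say how events are authenticated, so the sketch below assumes a common pattern: a shared secret and an HMAC-SHA256 signature over the JSON body. The event fields shown are hypothetical.

```python
import hashlib
import hmac
import json


def sign_event(event, secret):
    """Serialize an event deterministically and sign it with HMAC-SHA256.

    Returns (body_bytes, hex_signature). The signature would typically be
    sent in a request header alongside the body.
    """
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_event(body, signature, secret):
    """Recompute the signature on the receiver side and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# Hypothetical event emitted to an adaptive engine.
secret = b"shared-secret-for-illustration-only"
event = {"type": "disengagement_hotspot", "video_id": "v42", "topic": "recursion", "share": 0.75}
body, sig = sign_event(event, secret)
```

The receiver should verify the signature against the exact bytes it received, before parsing the JSON, so that any change to the body invalidates the signature.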
Minimal manual effort
- Auto-instrumented capture — no teacher or operator setup per session
- Automated topic-level disengagement detection — no manual tagging required
- Pre-built dashboards for educators, instructional designers, and platform operators
- Automated alerts and adaptive recommendations — no manual review needed for routine cases
- Continuous learning from outcome data — no manual retraining cycles
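An automated alert of the kind listed above could be as simple as flagging sustained dips in the per-second score, so that brief glances away do not trigger anything. The sketch below shows one such rule; the threshold and minimum duration are assumed values, not figures from the source.

```python
def sustained_dips(scores, threshold=0.4, min_seconds=10):
    """Find runs where the score stays below the threshold long enough to matter.

    scores: per-second engagement scores for one learner and one video
    Returns a list of (start_second, end_second) pairs, end exclusive.
    """
    dips = []
    start = None
    for t, s in enumerate(scores):
        if s < threshold:
            if start is None:
                start = t  # a new low run begins
        else:
            if start is not None and t - start >= min_seconds:
                dips.append((start, t))
            start = None
    # Close a run that lasts until the end of the timeline.
    if start is not None and len(scores) - start >= min_seconds:
        dips.append((start, len(scores)))
    return dips


timeline = [0.9] * 5 + [0.2] * 3 + [0.8] * 2 + [0.1] * 4
alerts = sustained_dips(timeline, min_seconds=3)
```

Raising `min_seconds` trades sensitivity for fewer false alarms; with `min_seconds=4` only the final run in this timeline would be reported.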
What measurable results does AI Tutor deliver?
AI Tutor was designed to improve three things at once: raise student engagement, increase educator visibility, and lower the cost of producing personalized feedback.
Student engagement and learning outcomes
- Targeted ≥5% reduction in student disengagement levels
- Per-second engagement visibility across every learning video
- Topic-level identification of where students struggle
- Personalized feedback and nudges grounded in real attention signals
- Targeted review recommendations linked to disengagement hotspots
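The source does not define how the 5% reduction is measured. One reasonable reading, assumed here, is a relative drop in the share of viewing time spent disengaged, compared with a baseline period. The threshold and function names below are hypothetical.

```python
def disengaged_share(scores, threshold=0.4):
    """Fraction of per-second scores that fall below the assumed threshold."""
    return sum(1 for s in scores if s < threshold) / len(scores)


def relative_reduction(baseline_share, current_share):
    """Relative drop versus baseline, e.g. 0.20 -> 0.18 is a 10% reduction."""
    return (baseline_share - current_share) / baseline_share


def meets_target(baseline_share, current_share, target=0.05):
    """True when the relative reduction reaches the assumed 5% target."""
    return relative_reduction(baseline_share, current_share) >= target
```

If the target were instead meant as an absolute drop in percentage points, `relative_reduction` would be replaced by a plain difference; the two readings can give very different verdicts when the baseline share is small.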
ML accuracy and rigor
- ≥90% engagement classification accuracy targeted across cohorts
- Multi-signal fusion (FER + gaze + head pose + interaction) for robustness
- Confidence-scored predictions with feature-level explainability
- Continuous retraining from outcome-linked feedback for ongoing accuracy uplift
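Checking the accuracy target "across cohorts" implies computing accuracy per cohort rather than only in aggregate, since a strong overall number can hide a weak subgroup. The sketch below shows that check under assumed inputs: labeled (predicted, actual) pairs grouped by a hypothetical cohort key.

```python
TARGET_ACCURACY = 0.90  # the stated classification target


def accuracy(pairs):
    """Share of (predicted, actual) pairs where the labels match."""
    pairs = list(pairs)
    return sum(1 for predicted, actual in pairs if predicted == actual) / len(pairs)


def cohorts_below_target(results, target=TARGET_ACCURACY):
    """Return the names of cohorts whose accuracy falls short of the target.

    results: {cohort_name: [(predicted_label, actual_label), ...]}
    """
    return sorted(name for name, pairs in results.items() if accuracy(pairs) < target)


# Toy evaluation set: cohort A gets 9 of 10 right, cohort B gets 8 of 10.
evaluation = {
    "A": [("engaged", "engaged")] * 9 + [("engaged", "disengaged")],
    "B": [("engaged", "engaged")] * 8 + [("disengaged", "engaged")] * 2,
}
shortfalls = cohorts_below_target(evaluation)
```

In this toy data only cohort B is flagged. Plain accuracy can also mislead when disengaged seconds are rare, so per-class metrics would be a sensible companion check.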
Educator productivity and privacy
- Lower educator effort to identify struggling students
- Cohort-level insight without manual class observation
- Earlier intervention via real-time at-risk signals
- On-device feature extraction with no raw video upload
- Configurable retention aligned with FERPA, GDPR, and regional school-data laws
AI Tutor — frequently asked questions
This section answers the questions most often asked about AI Tutor, AiSPRY's AI-powered student engagement monitoring and adaptive learning platform. Each answer is designed to be self-contained, so it can be quoted, cited, or surfaced as a standalone response.