Overview
Ditanyain is an AI-driven platform built to solve a major pain point in digital learning: passive consumption. While online courses offer flexibility, many students merely skim through modules to claim certificates without truly grasping the material. Ditanyain changes this by automatically generating quizzes directly from course content, encouraging active recall and providing the feedback learners need to master a subject.
The MVP Phase
The initial Proof of Concept (PoC) was built to validate whether AI-generated assessments could be reliably integrated into an LMS workflow. At this stage, the architecture followed a simple synchronous request–response model, where the backend acted as a proxy between the client and the LLM service.
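To make the flow concrete, here is a minimal sketch of that synchronous endpoint, assuming a Python/FastAPI backend (the framework and both helper functions are illustrative, not the actual implementation):

```python
from fastapi import FastAPI

app = FastAPI()

async def fetch_tutorial_content(tutorial_id: str) -> str:
    """Hypothetical helper: load the module text from the LMS."""
    ...

async def call_llm(content: str) -> list[dict]:
    """Hypothetical helper: a single, long round trip to the LLM API."""
    ...

@app.post("/tutorials/{tutorial_id}/quiz")
async def generate_quiz(tutorial_id: str):
    # The handler holds the HTTP connection open for the whole
    # generation: fetch the content, call the LLM, return the quiz.
    content = await fetch_tutorial_content(tutorial_id)
    questions = await call_llm(content)  # the client waits here
    return {"tutorial_id": tutorial_id, "questions": questions}
```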
MVP Sequence Diagram
The following diagram shows the original interaction flow used during the MVP stage:

The MVP Bottlenecks
As testing progressed, the synchronous approach revealed several critical engineering challenges:
- High Latency: The backend was blocked for the entire LLM generation, tying up server connections and resources for the duration of every request.
- Operational Inefficiency: Identical modules triggered redundant AI generations, wasting tokens and increasing costs unnecessarily.
- Poor UX: The reliance on a long-running HTTP request meant users were often stuck on a loading screen for 10–20 seconds.
- Security Concerns: Because there was no persistent storage layer, quiz validation was often pushed to the client side, leaving the system vulnerable to manipulation.
Design Direction
To address these constraints, the system was restructured around asynchronous processing and persistence. The following design principles guided the changes:
Asynchronous Task Processing
Time-intensive AI generation tasks were moved off the request path and onto a message queue. The API now responds immediately, while generation is handled in the background.
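A sketch of the new write path, assuming Redis as the broker (the post does not name the specific queue technology):

```python
import json
import uuid

import redis  # assumed broker for illustration
from fastapi import FastAPI

app = FastAPI()
queue = redis.Redis()

@app.post("/tutorials/{tutorial_id}/quiz", status_code=202)
async def request_quiz(tutorial_id: str):
    # Publish the generation job and return immediately; a background
    # worker consumes it, so no HTTP request is ever blocked on the LLM.
    job_id = str(uuid.uuid4())
    payload = json.dumps({"job_id": job_id, "tutorial_id": tutorial_id})
    queue.rpush("quiz_jobs", payload)
    return {"job_id": job_id, "status": "queued"}
```

Returning 202 Accepted with a job identifier lets the client poll for the result instead of holding a connection open for the full generation time.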
Database-Level Locking
Because the system runs across multiple instances, in-memory locking was insufficient. Database-level locks were introduced to ensure that only one worker processes a given tutorial_id at a time, preventing duplicate work.
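One common way to implement this, sketched here under the assumption of a PostgreSQL database (the post does not name the engine), is an advisory lock keyed on the tutorial_id:

```python
import psycopg2

conn = psycopg2.connect("dbname=ditanyain")  # connection details are illustrative

def try_claim_tutorial(tutorial_id: str) -> bool:
    # pg_try_advisory_lock is non-blocking: it returns false instead of
    # waiting when another worker, on any instance, already holds the lock.
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(hashtext(%s))", (tutorial_id,))
        return cur.fetchone()[0]

def release_tutorial(tutorial_id: str) -> None:
    with conn.cursor() as cur:
        cur.execute("SELECT pg_advisory_unlock(hashtext(%s))", (tutorial_id,))
```

Because the lock lives in the database rather than in process memory, it is visible to every worker instance at once.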
Batched Generation with Context
Instead of generating all questions in a single prompt, content is produced in smaller batches. Each batch includes context from previous results to reduce duplication and maintain consistency.
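A sketch of the batching loop; call_llm again stands in for the real LLM client:

```python
def call_llm(prompt: str) -> list[str]:
    """Hypothetical LLM client wrapper; returns generated question strings."""
    ...

def generate_quiz_in_batches(content: str, total: int = 10, batch_size: int = 3) -> list[str]:
    # Ask for a few questions at a time, feeding the questions generated
    # so far back into each prompt so the model avoids duplicates and
    # keeps a consistent style across batches.
    questions: list[str] = []
    while len(questions) < total:
        n = min(batch_size, total - len(questions))
        existing = "\n".join(f"- {q}" for q in questions) or "(none yet)"
        prompt = (
            f"Write {n} quiz questions about the material below.\n"
            f"Do not repeat any of these existing questions:\n{existing}\n\n"
            f"Material:\n{content}"
        )
        questions.extend(call_llm(prompt))
    return questions
```

Smaller batches also keep each response within the model's output limits and make a failed batch cheap to retry.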
Revised Architecture
The updated design introduces a persistent database layer that functions as both storage and a basic cache. Existing assessments are served directly, while missing ones trigger background generation jobs.
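On the read path this turns into a simple cache check. The sketch below reuses the illustrative queue from earlier and assumes a hypothetical fetch_quiz persistence helper:

```python
@app.get("/tutorials/{tutorial_id}/quiz")
async def get_quiz(tutorial_id: str):
    # Cache hit: the quiz already exists, serve it straight from the
    # database. Miss: enqueue a background job and let the client poll.
    quiz = fetch_quiz(tutorial_id)  # hypothetical persistence helper
    if quiz is not None:
        return {"status": "ready", "questions": quiz}
    queue.rpush("quiz_jobs", json.dumps({"tutorial_id": tutorial_id}))
    return {"status": "generating"}
```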

Queue Consumer Workflow
The queue consumer is responsible for coordinating locking, batching, and persistence outside of the main request lifecycle:
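A sketch of that loop, reusing synchronous variants of the illustrative helpers from the previous snippets:

```python
def run_consumer():
    while True:
        _, raw = queue.blpop("quiz_jobs")  # block until a job arrives
        tutorial_id = json.loads(raw)["tutorial_id"]
        if not try_claim_tutorial(tutorial_id):
            continue  # another worker already owns this tutorial
        try:
            # Re-check the database: the quiz may have been generated
            # between enqueue time and now.
            if fetch_quiz(tutorial_id) is None:
                content = fetch_tutorial_content(tutorial_id)
                questions = generate_quiz_in_batches(content)
                save_quiz(tutorial_id, questions)  # hypothetical persistence helper
        finally:
            release_tutorial(tutorial_id)
```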

Infrastructure

Conclusion
Transitioning to an asynchronous, queue-based architecture improved system responsiveness and reduced unnecessary AI workloads. More importantly, it established a foundation that can scale horizontally without significantly increasing operational complexity.
You can find the full implementation and infrastructure details on our GitHub.
