Developing an Interactive RAG-Powered Learning Tool

In this project, I set out to create an engaging and personalized learning experience using Retrieval-Augmented Generation (RAG) technology and AI-driven tools. Here’s a detailed look at the journey and outcome.

The Problem Statement

The aim was to address two key challenges:

Implementing a RAG system and developing tools to leverage its capabilities.
Adding text-to-speech functionality to make learning more accessible and interactive.

To determine the tools to develop, I started with some fundamental questions:

How can we make studying more interactive and engaging?
Can we provide instant, accurate answers to students' questions?
Is it possible to generate custom study materials tailored to individual needs?

With these objectives in mind, I ventured into the world of RAG and AI agents.

Developing the RAG System

The core of the project is the Retrieval-Augmented Generation (RAG) system, which I built through the following steps:

Document Ingestion:
Created an ingest.py script to process the NCERT Sound chapter PDF. Used PyPDFLoader and PDFPlumberLoader for robust PDF parsing. Split the text into manageable chunks using RecursiveCharacterTextSplitter.
Vector Database:
Implemented vector_db.py to create a Chroma vector store for efficient similarity searches. Used the sentence-transformers/all-MiniLM-L6-v2 model for generating embeddings: it is small and fast, produces 384-dimensional vectors, and gives up some accuracy compared with larger embedding models.
RAG System Implementation:
Developed the RAG system in rag_system.py using Google’s Gemini 1.5 Flash model for generating responses. Integrated context retrieval from the vector store to inform responses.
API Development:
Built a FastAPI backend (app.py) to expose endpoints for generating responses, creating quizzes, and more.
Frontend Design:
Designed a Streamlit frontend (frontend.py) to provide an intuitive interface for students to interact with the system.
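
As a rough illustration of the retrieve-then-generate flow described in the steps above, here is a minimal sketch that uses only the Python standard library. It is not the project's code: fixed-size overlapping chunks stand in for RecursiveCharacterTextSplitter, a toy bag-of-words cosine similarity stands in for the all-MiniLM-L6-v2 embeddings and the Chroma store, and the assembled prompt is the kind of text that would be passed to the Gemini model. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Cut text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query = embed(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        "Sound is produced by vibrating objects.",
        "Sound needs a medium such as air, water, or steel to travel.",
        "The frequency of a wave is measured in hertz.",
    ]
    question = "What medium does sound need to travel?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

Keeping retrieval and prompt assembly as separate functions means the similarity backend can be swapped (for example, for a real vector store) without touching the prompt format.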

Specialized Agents and Tools

To enhance the learning experience, I developed several specialized agents and tools:

Quiz Agent (quiz_agent.py):
Generates dynamic quizzes based on chapter content. Supports multiple question types (e.g., multiple-choice, true/false). Provides instant feedback on user responses.
Diagram Agent (diagram_agent.py):
Creates ASCII flowcharts to visualize complex concepts. Summarizes key chapter topics in an engaging format.
Exam Guide Agent (exam_guide_agent.py):
Generates custom exam guides with important questions and detailed solutions. Tailored for effective test preparation.
Summary Tool (summary_tool.py):
Produces concise summaries for quick revision. Lists important topics in a structured format.
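
To make the quiz flow concrete, the sketch below shows one way generated questions could be validated and graded with instant feedback. It assumes the model is asked to return a JSON list of question objects; the field names (prompt, options, answer_index, explanation) and the function names are hypothetical and are not taken from quiz_agent.py.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    prompt: str
    options: tuple[str, ...]  # ("True", "False") for a true/false item
    answer_index: int         # position of the correct option
    explanation: str = ""


def parse_quiz(raw: str) -> list[Question]:
    """Validate model output expected to be a JSON list of question objects."""
    questions = []
    for item in json.loads(raw):
        options = tuple(item["options"])
        answer_index = int(item["answer_index"])
        if len(options) < 2 or not 0 <= answer_index < len(options):
            raise ValueError(f"malformed question: {item!r}")
        questions.append(
            Question(item["prompt"], options, answer_index, item.get("explanation", ""))
        )
    return questions


def grade(question: Question, chosen_index: int) -> tuple[bool, str]:
    """Return (is_correct, feedback) for a single answer."""
    if not 0 <= chosen_index < len(question.options):
        raise ValueError("chosen_index is out of range")
    if chosen_index == question.answer_index:
        return True, f"Correct. {question.explanation}".strip()
    right = question.options[question.answer_index]
    return False, f"Incorrect. The answer is '{right}'. {question.explanation}".strip()


def score(questions: list[Question], answers: list[int]) -> int:
    """Count correct answers across a whole quiz."""
    return sum(grade(q, a)[0] for q, a in zip(questions, answers))
```

Validating the parsed questions before showing them guards against the occasional malformed model response, such as an answer index that points past the end of the options.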

Text-to-Speech Integration

To make learning accessible and interactive, I integrated the Sarvam.ai API for text-to-speech functionality. This feature allows students to:

Listen to answers, summaries, and other content.
Learn through more than one mode, which is especially helpful for auditory learners.

Integration Process:

Set up authentication with the Sarvam.ai API key.
Added a dedicated FastAPI endpoint for text-to-speech requests.
Implemented frontend controls for text-to-speech conversion and playback.
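
The authentication step can be sketched as building an HTTP request that carries the API key in a header. This is a minimal sketch under stated assumptions, not the project's implementation: the header name api-subscription-key, the JSON body fields, and the placeholder URL are assumptions that should be checked against the Sarvam AI documentation, and build_tts_request is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: the API key is sent in this header. Verify against the provider docs.
API_KEY_HEADER = "api-subscription-key"


def build_tts_request(
    text: str,
    api_key: str,
    tts_url: str,
    language_code: str = "en-IN",
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text-to-speech call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: hypothetical body field names, shown for illustration only.
    payload = {"text": text, "target_language_code": language_code}
    return urllib.request.Request(
        tts_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Sending would then be: urllib.request.urlopen(request, timeout=30)
```

Building the request in one function and sending it in another keeps the key handling and payload shape easy to unit test without any network access.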

Key Features

Question & Answer System: Ask detailed questions about the Sound chapter and get accurate answers.
Text-to-Speech: Convert text responses to speech for auditory learning.
Chapter Summary: Generate concise summaries of chapter content.
Interactive Quiz: Take quizzes with dynamically generated questions.
Summary Flowchart: Visualize key concepts as flowcharts.
Exam Guide: Create tailored guides with practice questions.

Technology Stack

Backend: FastAPI
Frontend: Streamlit
AI Model: Google’s Gemini 1.5 Flash
Vector Database: Chroma
Embeddings: Hugging Face (sentence-transformers/all-MiniLM-L6-v2)
PDF Processing: PyPDFLoader, PDFPlumberLoader
Text-to-Speech: Sarvam AI API

Setup and Installation

Clone the repository:

git clone <repository_url>
cd <repository_directory>

Install dependencies:

pip install fastapi streamlit langchain langchain-community google-generativeai requests chromadb sentence-transformers pypdf pdfplumber pydantic uvicorn

Set up environment variables:

GOOGLE_API_KEY: Your Google API key for Gemini.
SARVAM_API_KEY: Your Sarvam AI API key for text-to-speech.
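
A missing key is easier to diagnose at startup than on the first request. The following is a small sketch of a startup check for the two variables listed above; load_api_keys is a hypothetical helper, not a function from the repository.

```python
import os
from typing import Mapping

REQUIRED_KEYS = ("GOOGLE_API_KEY", "SARVAM_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required API keys, or fail with a message naming what is missing."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling load_api_keys() once when the backend starts would surface a configuration problem immediately; passing a plain dictionary instead of os.environ makes the check easy to test.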

Run the application:

Start the FastAPI backend:

uvicorn app:app --reload

Run the Streamlit frontend:

streamlit run frontend.py

Results and Applications

This interactive learning tool demonstrates the potential of AI in education by offering:

Engaging Study Sessions: Interactive Q&A, quizzes, and flowcharts.
Accessibility: Multi-modal learning with text-to-speech capabilities.
Customization: Personalized study materials and exam guides.

Potential applications include:

Student Learning: Enhancing comprehension of complex topics.
Test Preparation: Efficiently revising and practicing key concepts.
Educational Platforms: Integrating advanced tools for interactive learning.

Conclusion

Developing this RAG-powered interactive learning tool has been a rewarding project. By combining retrieval-augmented generation, specialized AI agents, and text-to-speech, I’ve created a platform where students can ask questions about the Sound chapter, test themselves with quizzes, revise from summaries and flowcharts, and listen to the responses.

This open-source project is available on GitHub. Feel free to explore, contribute, or adapt it to your needs. Let’s make learning more accessible, engaging, and effective for everyone!

Happy learning!
