
Kaggle Gen Ai Capstone: AP Test Study Buddy (Code Included!)

7 min read · Apr 20, 2025

Authors: Kebish Pius (LinkedIn), Abish Pius (LinkedIn), and Gemini

🎯 The Problem

Every spring, thousands of students prepare for the AP Statistics exam. But the best preparation resources — official practice questions and feedback from expert teachers — are limited and hard to access.

  • 📚 Traditional materials like textbooks and Khan Academy are static.
  • 🧪 Practicing FRQs (Free Response Questions) is especially challenging because:
      • It’s hard to know if your reasoning is on the right track.
      • Grading them requires deep knowledge of the College Board rubric.
      • Students get stuck and have no one to ask, “Am I doing this right?”

As a student myself, I wanted something better.

💡 The AI-Powered Solution

What if you could create your own personal AP Stats teacher — one that:

  • ✅ Generates official-style questions.
  • ✅ Breaks them down step-by-step when you’re confused.
  • ✅ Coaches you through FRQs like a real exam grader.

So I built an AP Statistics Test Agent, powered by Generative AI, that does exactly that. Check it out here!

🛠️ The Tech Stack

It is built with:

  • Google’s Generative AI SDK for high-quality LLM responses
  • ReAct prompting to simulate expert tutoring behavior
  • Google Search as a tool for grounding responses
  • Mesop for UI development

🔍 Core Features

1. Question Generator

Creates MCQs or FRQs that closely align with the AP Statistics course framework.

PROMPT_1 = '''
You are an AP Statistics Exam Creator.
You will generate one AP Statistics {prompt1_type} test question covering the topic of {prompt1_specific_topic}.
Ensure the questions are similar in style, complexity, and phrasing to those found on the official College Board AP Statistics exam.
Only provide the question itself and a suggested time to complete it in.
'''

➡️ It can instantly generate questions on any of the 9 official AP units, like “Sampling Distributions” or “Inference for Slopes”.
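The following is a minimal sketch of how PROMPT_1 might be filled in and sent to Gemini. It is not the notebook's actual code. It assumes the google-genai Python SDK and a GEMINI_API_KEY environment variable. The unit names follow the nine units of the College Board's AP Statistics course framework. The helper names (`AP_STATS_UNITS`, `QUESTION_TYPES`, `build_question_prompt`, `generate_question`) and the model choice are hypothetical. PROMPT_1 is repeated so the block runs on its own.

```python
import os

# The nine units of the AP Statistics course framework.
AP_STATS_UNITS = [
    "Exploring One-Variable Data",
    "Exploring Two-Variable Data",
    "Collecting Data",
    "Probability, Random Variables, and Probability Distributions",
    "Sampling Distributions",
    "Inference for Categorical Data: Proportions",
    "Inference for Quantitative Data: Means",
    "Inference for Categorical Data: Chi-Square",
    "Inference for Quantitative Data: Slopes",
]

# Hypothetical labels for the two question formats.
QUESTION_TYPES = ("multiple-choice", "free-response")

PROMPT_1 = '''
You are an AP Statistics Exam Creator.
You will generate one AP Statistics {prompt1_type} test question covering the topic of {prompt1_specific_topic}.
Ensure the questions are similar in style, complexity, and phrasing to those found on the official College Board AP Statistics exam.
Only provide the question itself and a suggested time to complete it in.
'''


def build_question_prompt(question_type: str, topic: str) -> str:
    """Fill PROMPT_1, rejecting formats and units outside the framework."""
    if question_type not in QUESTION_TYPES:
        raise ValueError(f"question_type must be one of {QUESTION_TYPES}")
    if topic not in AP_STATS_UNITS:
        raise ValueError(f"unknown AP Statistics unit: {topic!r}")
    return PROMPT_1.format(prompt1_type=question_type, prompt1_specific_topic=topic)


def generate_question(question_type: str, topic: str, model: str = "gemini-2.0-flash") -> str:
    """Send the filled prompt to Gemini (assumes `pip install google-genai`)."""
    from google import genai  # imported lazily so the prompt helper works offline

    client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    response = client.models.generate_content(
        model=model,
        contents=build_question_prompt(question_type, topic),
    )
    return response.text
```

Usage: `print(generate_question("free-response", "Sampling Distributions"))`. Restricting `topic` to the unit names keeps requests inside the framework. Drop that check if you want finer-grained topics.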

2. Question Explainer

Breaks down any AP Stats question — even from PDFs — with fundamental explanations, formulas, and strategy tips.

PROMPT_2 = '''
You are an AP Statistics Exam Expert.
Please explain the following AP Statistics test question from scratch...
'''

➡️ Helpful for last-minute cramming or really understanding why you got a question wrong.
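For the PDF case, one option is to upload the file through the SDK's Files API and send it alongside the explainer prompt. The sketch below assumes google-genai 1.x, where `client.files.upload` takes a `file=` argument (older releases used `path=`). `explain_pdf_question` is a hypothetical helper. PROMPT_2 is abbreviated in this post, so the sketch takes the full prompt as a parameter instead of reproducing it.

```python
from pathlib import Path


def explain_pdf_question(client, pdf_path: str, prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Upload a PDF containing an AP Stats question and ask Gemini to explain it.

    `client` is assumed to be a google.genai.Client; `prompt` is the full PROMPT_2 text.
    """
    path = Path(pdf_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got {path.name!r}")
    uploaded = client.files.upload(file=str(path))  # Files API upload
    response = client.models.generate_content(
        model=model,
        contents=[uploaded, prompt],  # the document first, then the instructions
    )
    return response.text
```

With a real client, a call might look like `explain_pdf_question(genai.Client(api_key=os.environ["GEMINI_API_KEY"]), "practice_frq.pdf", PROMPT_2)`. Passing the client in also makes the helper easy to test with a stand-in object.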

3. FRQ Coaching Assistant

Guides students through their free-response work, just like a real AP teacher would.

It uses a ReAct structure:
🧠 Thought → 💬 Action → 👀 Observation

PROMPT_3 = '''
You are an AP Statistics Exam Expert Tutor.
Your goal is to guide a student through completing an FRQ using the Thought → Action → Observation structure...
'''

It also ties its feedback to the College Board’s 0–4 scoring rubric, so the feedback isn’t generic: it’s personalized, rubric-aware coaching.
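Coaching is naturally multi-turn, because each student reply has to be read in the context of earlier steps. A minimal sketch follows. It assumes the google-genai SDK's chat interface (`client.chats.create` and `chat.send_message`), with PROMPT_3 supplied as the system instruction. `start_frq_session` and `coach_turn` are hypothetical names. The full PROMPT_3 text, which is abbreviated above, is passed in rather than reproduced.

```python
def start_frq_session(client, system_prompt: str, model: str = "gemini-2.0-flash"):
    """Open a chat in which every turn is governed by the tutor prompt (PROMPT_3).

    A plain dict is accepted in place of types.GenerateContentConfig.
    """
    return client.chats.create(
        model=model,
        config={"system_instruction": system_prompt},
    )


def coach_turn(chat, frq_text: str, student_work: str) -> str:
    """Send the FRQ and the student's current work; return the tutor's reply."""
    message = f"FRQ:\n{frq_text}\n\nStudent work so far:\n{student_work}"
    return chat.send_message(message).text  # the chat keeps earlier turns as history
```

Consider a student who states H0 and Ha but skips the conditions. Each call to `coach_turn` on the same chat adds to the history, so the tutor can refer back to what was already written.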

🎓 A Quick Demo

Our AP Study Buddy Mesop App

Let’s say a student submits an FRQ where they correctly state the null and alternative hypotheses but miss checking conditions. The AI might respond:

You can try our UI for free here on Google Colab!

🚧 Limitations, Lessons & What’s Next!

Even with well-engineered prompts and powerful models, this AI tutor isn’t perfect. Some things to keep in mind:

  • Nuanced grading: The College Board rubric can be subjective. An AI coach can approximate it, but it cannot replace expert human judgment. We also found both Gemini 2.0 and Gemini 2.5 (thinking) models to be a little ‘too nice’: they were reluctant to score downright wrong answers as a 1 or 0.
  • Visual inputs: It currently can’t “see” handwritten work (yet). Early on we planned to get a live version working, but it doesn’t work well in Google Colab or Kaggle environments, so we are working on running it on our local machine.
  • Personalization limits: It doesn’t know your learning history. In the next iteration, we would like to store that history in some type of RAG database. I also think it would be cool to integrate speech-to-text so it becomes a truly personalized tutor.
  • Generalizability: Currently, we have tailored the agent to the AP Statistics test (the one Kebish is preparing for). As a next step, we would like to make it more generic or create a series of agents for the other AP tests.

Deep Dive Educational Section

1. Prompt Engineering Best Practices

Prompt engineering is both an art and a science.

Great article on Prompt Engineering: Prompt Engineering | Kaggle

ReAct Prompting: Reason + Act for LLMs

ReAct prompting (short for Reason and Act) is a powerful paradigm that enables Large Language Models (LLMs) to solve complex tasks by combining natural language reasoning with external tool use — such as web search or code execution. This approach is particularly useful in scenarios where knowledge needs to be retrieved dynamically rather than relying solely on the model’s training data.

How It Works

ReAct mimics human cognitive behavior:

  • First, the LLM reasons about the task and generates a plan (i.e., what needs to be done).
  • Then, it acts — by calling tools like a search API — to gather new information.
  • The result of that action is observed and used to update the reasoning.
  • This loop of Thought → Action → Observation → Updated Thought continues until the task is solved.

This interaction forms a thought-action loop, enabling iterative refinement and dynamic decision-making.
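The loop can be made concrete with a dependency-free sketch of the Thought → Action → Observation cycle. Two parts are stand-ins so the control flow runs without API keys. The "model" is a hypothetical scripted function, `scripted_llm`. The only tool is a hypothetical `lookup` over a small hard-coded dictionary. In a real agent, both would be replaced by an LLM call and real tools such as web search.

```python
import re
from typing import Callable, Dict

# Hypothetical knowledge source standing in for a search tool.
FACTS = {
    "ap statistics exam sections": "Section I is multiple choice; Section II is free response.",
}


def lookup(query: str) -> str:
    """Toy tool: case-insensitive exact-match lookup in FACTS."""
    return FACTS.get(query.strip().lower(), "No result found.")


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup}
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
ANSWER_RE = re.compile(r"^Final Answer:\s*(.*)$", re.MULTILINE)


def scripted_llm(transcript: str) -> str:
    """Stand-in for an LLM: act first, then answer once an observation exists."""
    if "Observation:" not in transcript:
        return ("Thought: I should look up the exam structure.\n"
                "Action: lookup[AP Statistics exam sections]")
    return ("Thought: The observation lists two sections.\n"
            "Final Answer: There are 2 sections on the AP Statistics Exam.")


def react(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Run Thought -> Action -> Observation until the model gives a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # Thought, plus either an Action or a Final Answer
        transcript += step + "\n"
        answer = ANSWER_RE.search(step)
        if answer:
            return answer.group(1).strip()
        action = ACTION_RE.search(step)
        if action is None:
            raise ValueError(f"model produced neither an action nor an answer:\n{step}")
        tool, arg = action.groups()
        observation = TOOLS[tool](arg) if tool in TOOLS else f"Unknown tool: {tool}"
        transcript += f"Observation: {observation}\n"  # fed into the next Thought
    raise RuntimeError("no final answer within max_steps")
```

Calling `react("How many sections are on the AP Stats Exam?", scripted_llm)` performs one lookup and then returns the final answer. The `max_steps` cap keeps a model that never concludes from looping forever.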

Example: How many sections are on the AP Stats Exam?

This example uses LangChain + VertexAI + SerpAPI to execute a ReAct-style prompt.

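# Uses LangChain's classic agent API (initialize_agent is deprecated in newer releases).
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import VertexAI

# Requires SERPAPI_API_KEY in the environment and Google Cloud credentials for Vertex AI.
prompt = "How many sections are on the AP Stats Exam?"
llm = VertexAI(temperature=0.2)  # low temperature for more consistent answers
tools = load_tools(["serpapi"], llm=llm)  # web-search tool the agent may call
# ZERO_SHOT_REACT_DESCRIPTION drives the Thought -> Action -> Observation loop.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
agent.run(prompt)  # verbose=True prints each intermediate step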

⏳ The agent:

  1. Searches online for the structure of the AP Statistics exam.
  2. Identifies the number of sections (e.g., multiple choice and free response).
  3. May provide additional context about timing and scoring.
  4. Outputs the final answer:
     “There are 2 sections on the AP Statistics Exam: a multiple-choice section and a free-response section.”

Why ReAct Works

  • Flexibility: It’s not limited to pre-trained knowledge. ReAct can pull in real-time information via tools.
  • Transparency: The reasoning and decisions are visible in each step.
  • Modularity: Tools (like web search, calculators, code runners) can be easily swapped in/out.

Things to Keep in Mind

  • ReAct prompting requires managing prompt history, which includes past actions and observations.
  • Trimming or summarizing previous steps is crucial to stay within token limits.
  • Tool-based actions (like using SerpAPI) require API keys and setup.
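One simple way to handle trimming is to keep the original question plus the most recent steps that fit within a budget. This sketch makes two simplifying assumptions:

- It measures size in characters rather than tokens. A real tokenizer, or for Gemini the API's `count_tokens` method, would be more accurate.
- It replaces dropped steps with a one-line marker instead of an LLM-written summary.

`trim_history` is a hypothetical helper name.

```python
from typing import List


def trim_history(question: str, steps: List[str], max_chars: int) -> str:
    """Build a prompt from the question and the newest steps that fit in max_chars.

    Older steps are dropped first and replaced by a short marker line.
    Sizes are counted in characters as a rough proxy for tokens.
    """
    header = f"Question: {question}\n"
    # Try keeping all steps, then progressively fewer of the oldest ones.
    for n_keep in range(len(steps), -1, -1):
        kept = steps[len(steps) - n_keep:]
        dropped = len(steps) - n_keep
        marker = f"[{dropped} earlier step(s) omitted]\n" if dropped else ""
        prompt = header + marker + "".join(step + "\n" for step in kept)
        if len(prompt) <= max_chars:
            return prompt
    raise ValueError("max_chars is too small for the question alone")
```

Keeping the newest steps intact preserves the observation the model is about to reason over. The marker still tells the model that earlier work happened.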

2. Gemini 2.5 LLM

Announcement: Gemini 2.5: Our newest Gemini model with thinking

Usage Docs: Text generation | Gemini API | Google AI for Developers

Gemini is Google’s state-of-the-art family of multimodal language models, designed to process and reason over diverse data types, including text, images, audio, and video. Built on enhanced Transformer decoder architectures, Gemini models are optimized for efficiency and scale on Google’s TPUv4 and TPUv5e hardware. Two core techniques are sparse Mixture of Experts (MoE) layers (introduced with Gemini 1.5) and efficient attention such as multi-query attention. Together they help the models handle very long contexts with high-throughput inference: up to about 1 million tokens for Gemini 2.0 Flash and 2 million tokens for Gemini 1.5 Pro.

Gemini is natively multimodal, which means it accepts interleaved inputs across modalities. For example, a user can provide an image and a question in text, and the model will generate a relevant response; some Gemini 2.0 Flash variants can also output images. The Gemini 2.0 generation further expands on these foundations with Gemini 2.0 Flash Thinking (Experimental), which targets advanced scientific and mathematical reasoning, exposes its reasoning steps, and supports code execution.

Example: Using the Gemini API in Python

The Gemini API allows developers to integrate multimodal reasoning capabilities into their applications. Below is an example of sending text plus an image to Gemini 2.0 Flash.

Multimodal Input: Text + Image

import os

from PIL import Image
from google import genai

# Read the key from the environment instead of hard-coding it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
image = Image.open("/path/to/organ.png")
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[image, "Tell me about this instrument"],  # image and text in one request
)
print(response.text)
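Text-only and streaming calls follow the same pattern. The sketch below assumes the google-genai SDK, where `generate_content_stream` yields partial responses whose `.text` can be printed as it arrives. `ask` and `ask_streaming` are hypothetical helper names, and the client is passed in so the functions can be exercised with a stand-in object.

```python
def ask(client, prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Text-only: one request, one complete response."""
    return client.models.generate_content(model=model, contents=prompt).text


def ask_streaming(client, prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Streaming: print chunks as they arrive and return the assembled text."""
    parts = []
    for chunk in client.models.generate_content_stream(model=model, contents=prompt):
        text = chunk.text or ""  # some chunks may carry no text
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

With a real client (`genai.Client(api_key=os.environ["GEMINI_API_KEY"])`), call `ask(client, "Explain the difference between a parameter and a statistic.")` to get a single response. Use `ask_streaming` with the same arguments when you want output to appear token by token, for example in a chat UI.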

With Gemini’s multimodal capabilities and expansive context window, developers can now build applications that not only generate content, but understand, retrieve, and reason across modalities and vast information spaces — from summarizing entire movies to analyzing complex datasets.

Thanks for reading! Please like, share, and comment.

Written by Abish Pius

Data Science Professional, Python Enthusiast, turned LLM Engineer
