Building an AI Agent Tutorial – Part 1

The use of the term “AI Agent” has increased by 10x in the last 1 year, as per data from Google Trends. This surge reflects a broader shift: people and organizations increasingly want AI Systems that not only answer questions, but also take actions on their behalf. From simplifying mundane tasks to streamlining business operations, the promise of Agentic AI is capturing global attention.

Trend for AI Agent over time (Image: Google Trends)

So, what does this really mean in practice? Let’s start with a relatable scenario of how AI Agents can transform everyday tasks in the near future. Imagine planning for a vacation, which involves booking hotels, flights, and rental cars. Today, this process is fragmented and time-consuming. In an Agentic AI world, however, we could simply provide a simple prompt that can generate tailored travel packages, complete with itineraries, restaurants, and bookings.

Here is an example prompt that would work in such a scenario:

“I would like to book a family trip with 2 kids in the months of June/July for a weekend plus 2 days. Do not include the 2nd week and 3rd week of June. I would just need to carry two cabin bags, and prefer tasting the best local food. Plan for an itinerary not longer than 2-3 hours drive from the city.”

In this article, we will go beyond the buzzword that is AI Agents. You will first understand the fundamentals of AI Agents and then explore the platforms that make them possible. Finally, we will build a hands-on project: a YouTube Summarizer Agent using the Phidata framework. By the end, you will know what Agentic AI is and how to start building one with the SOTA tools.

Note: This is the first article in a two-part series on building AI Agents from the ground up. In this article, we will explore the value of AI Agents, introduce popular Agentic AI platforms, and walk through a hands-on tutorial for building a simple AI Agent. The next part of the series will dive deeper with a hands-on tutorial. There, we will build Agents that can automate tasks and interact with external tools and APIs.

Fundamentals of AI Agents

In simple terms, AI Agents are systems that can perform tasks autonomously by interpreting the data from the environment. AI agents can make decisions based on that data to achieve the goals. Think of them as orchestrators, connecting various tools, using Large Language Models (LLM) to reason, plan, and execute tasks. For a detailed introduction to LLMs, you can refer to this article.

Let’s break down this definition using the above vacation planning example:

Perform tasks autonomously: Book flight, hotel, and rental car reservations through the respective vendors.
Interpreting the data: Account for factors like weather, traffic, and local events to suggest the best activities that fit the pace.
Making decisions: Consider there are dozens of restaurants available, Agents can provide recommendations based on the indicated preference and past reviews.
Achieve goals: Put together a travel plan that matches the requirements – dates, duration, preferences, and family needs.

Agentic AI Platforms

An Agentic AI framework is a toolkit that enables the creation of AI systems capable of reasoning, planning, and taking actions autonomously or semi-autonomously through tool use and memory. In short, these frameworks provide the structure needed to create agents.

There are several popular Agentic AI platforms, such as LangChain, CrewAI, and Phidata. For this tutorial, we will use Phidata – a lightweight and developer-friendly platform. Phidata comes with built-in access to a variety of tools and LLMs. This allows us to build and deploy AI Agents within just a few lines of code.

Popular built-in Tools and Model wrappers in Phidata (For a full list, links here – Models, Tools.)

Build a YouTube summarizer Agent

The YouTube Summarizer Agent is designed to extract key insights and main points from any YouTube video. It saves time by providing concise summaries without needing to watch the entire content. For the purpose of the tutorial, we will use Google Colab notebook to write and execute the code and Phidata Agentic AI Platform to power the Agent.

Model: Within Phidata, we will leverage the Groq model hosting platform. It is an inference service that runs LLMs on a dedicated GPU infrastructure. Note that it is different from Grok, which is an LLM from xAI. Since LLMs are resource-intensive, using Groq helps to offload computation from the local hardware or Colab-provided hardware. This ensures faster and more efficient execution. Groq has access to multiple models from different LLM providers. (see full list here)

Tools: To retrieve YouTube video data, we will use the built-in Tool from the Phidata framework (called YouTube Tools). This tool helps us access video metadata and captions. The agent then passes these to the chosen LLM to generate accurate and insightful summaries.

Here is the code for a YouTube summarizer agent:

from phi.agent import Agent
from phi.model.groq import Groq
from phi.model.openai import OpenAIChat
from phi.tools.youtube_tools import YouTubeTools


agent = Agent(
    # model=Groq(id="llama3-8b-8192"),
    model=Groq(id="llama-3.3-70b-versatile"),  ## Toggle with different LLM model
    tools=[YouTubeTools()],
    show_tool_calls=True,
    # debug_mode=True,
    description="You are a YouTube agent. Obtain the captions of a YouTube video and answer questions.",
)


agent.print_response("Summarize this video https://www.youtube.com/watch?v=vStJoetOxJg", markdown=True, stream=True)

Following is the output generated by the YouTube Summarizer agent (above code). The YouTube link in the above code is a video of Andrew Ng on the Machine Learning specialization. As shown below, it accurately summarizes the video content. Note that the response may vary for each run because of the probabilistic nature of LLMs.

Detailed Tutorial

Here are the step-by-step instructions for creating the YouTube Summarizer agent.

1. Clone Notebook

Clone Colab notebook here (it requires a Google account)
Install dependencies (first cell with code)

2. Get API key for Groq

In order to run the Agent, given that we use the Groq model hosting platform, we need an account with Groq. Follow the steps below to sign up / log in to Groq and get an API key.

– Visit the Groq Developer Portal: Open your browser and go to: https://console.groq.com

– Sign Up or Log In

If you already have an account, click Log In.
If you’re new, click Sign Up and follow the prompts to create an account (you may need to verify your email).

– Access the API Section

Once logged in, you’ll land on the Groq Console.
Navigate to the API Keys section from the sidebar or dashboard.

– Generate a New API Key

Click the “Create API Key” button.
Give your key a name (e.g., “workshop-key”).
Click Create or Generate.

– Copy and Store the Key Securely

Your API key will be shown only once — copy it immediately and store it in a secure location.
Never expose your API key in client-side code or public repositories.

3. Add the API key in the Secret Manager

Click on Secrets (Key sign) on the left pane of Colab
Provide the name as GROQ_API_KEY and the Value as the API Key copied in Step 5 above
Toggle “ON” the notebook access.

Conclusion

In this article, we explored the rising demand for an AI Agent and walked through a real-world example of how they can simplify everyday tasks. We broke down the fundamentals of AI Agents and some popular Agentic AI Frameworks. We also built a hands-on project: a YouTube Summarizer Agent powered by Phidata.

This is just the beginning. In the second article of this series, we will go deeper by building a study planner agent that does not just generate plans but also takes actions. It will create tasks in Jira, send calendar invites, and demonstrate how AI Agents can seamlessly integrate with external tools and APIs to automate real-world workflows.

Check out the part 2 of this series here – Building Study Planner Agent: AI Agent Tutorial Part 2

Co-Author for the article: Abhishek Agrawal

Praveen is a seasoned Data Scientist, with over a decade of experience in analytics. He has tackled complex business challenges and driven innovation through data-driven decision making. His expertise spans across areas such as Machine Learning, Statistics, and Scalable Analytics, helping to launch multiple revolutionary products.