
# AI agents


## What is an agent?

Put simply, an agent is a system that can observe its environment and act on it.

Suppose there is an agent that automatically responds to emails. The environment of this agent is the email system and the inbox. The actions the agent can take might include checking for new emails, opening and reading emails, coming up with the response content, and sending the actual reply.
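
In code, that observe-and-act framing might look something like the sketch below. All of the names here (`EmailAgent`, `fetch_unread`, `send`) are hypothetical, purely to make the idea concrete.

```python
# Hypothetical sketch of an email agent's observe/act framing.
# The inbox object and its methods are made up for illustration.

class EmailAgent:
    def __init__(self, inbox):
        self.inbox = inbox  # the environment: the email system and inbox

    def observe(self):
        # Gather information from the environment.
        return self.inbox.fetch_unread()

    def act(self, email):
        # Act on the environment: draft and send a reply.
        reply = self.draft_reply(email)
        self.inbox.send(to=email["sender"], body=reply)

    def draft_reply(self, email):
        # In a real agent, an LLM would generate this content.
        return f"Re: {email['subject']} - thanks, I'll reply in detail soon."
```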

## Why should we care about agents now?

Agents aren’t a new concept; in one form or another, they have long been studied in research. But while agents were mostly theoretical in the past, improvements in LLMs and AI capabilities are making it increasingly practical to build real agents.

Agents can use LLMs to help automate tasks like booking a trip or ordering groceries. They can solve problems the way humans do, but without needing a human in the loop.

This form of automation can help with many business problems, such as customer support, market research, or software development planning.

## How does an agent work?

An agent takes a task request and then acts to achieve the goal. An agent might even be composed of sub-agents dedicated to specific tasks.

To actually be useful, an agent needs to be able to:

- gather information from the environment
- decide which actions are best to take
- actually perform those actions
- understand what effect those actions had on the environment

This is a simplified representation of the steps an agent takes.

```mermaid
%%{init: {'theme': 'default' }}%%
graph TD
    subgraph Agent [ "" ]
        N2[Plan]
        N3[Evaluate]
        N4[Execute]
    end
    N1[Task request]
    N5[Result]
    style Agent stroke-dasharray: 5 5
    N1 --> N2 --> N3 --> N4
    N4 --> N3
    N3 --> N5
```
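
In code, that loop might look roughly like the following sketch, where `plan`, `evaluate`, and `execute` are hypothetical stand-ins for the real implementations (typically LLM calls plus tool invocations):

```python
def run_agent(task_request, plan, evaluate, execute):
    """Sketch of the plan -> evaluate -> execute loop from the diagram.

    `plan`, `evaluate`, and `execute` are placeholders, not a real API.
    """
    steps = plan(task_request)           # Plan
    results = []
    for step in steps:
        outcome = execute(step)          # Execute
        if not evaluate(step, outcome):  # Evaluate; loop back on failure
            # In practice you would cap retries or trigger a re-plan.
            outcome = execute(step)
        results.append(outcome)
    return results                       # Result
```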

## Tools bridge the agent and its environment

An agent isn’t very useful without a way to understand or interact with its environment.

That’s where external tools come in. These generally serve one of a few purposes: providing the agent with more information, giving it additional capabilities, or letting it take actions in the environment.

For example, a grocery ordering agent might need a shopping list or tracker, the ability to search for items, access to payment information, and API access or a browser to actually place an order.

External tools can provide all of those pieces, and the agent can pick the ones it needs.
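
One common way to set this up is a small registry mapping tool names to functions, from which the agent picks what it needs. A hypothetical sketch for the grocery example (none of these function names come from a real API):

```python
# Hypothetical tool registry for the grocery-ordering agent.
# The agent sees names and docstrings; the functions do the actual work.

def get_shopping_list() -> list[str]:
    """Provide more information: the user's shopping list."""
    ...

def search_items(query: str) -> list[str]:
    """Provide more information: search the store's catalog."""
    ...

def get_payment_info() -> dict:
    """Additional capability: fetch stored payment details."""
    ...

def place_order(items: list[str], payment: dict) -> str:
    """Act on the environment: place the order via API or browser."""
    ...

TOOLS = {fn.__name__: fn
         for fn in (get_shopping_list, search_items, get_payment_info, place_order)}
```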

## How do LLMs help agents?

Although LLMs cannot directly observe or interact with the environment, they can help glue together the pieces the agent needs to succeed.

### Planning

Once an agent receives a task request, it needs to come up with a plan to meet that objective.

First, the task needs to be translated into a goal and constraints. Then, the agent must devise a series of steps that will achieve that goal.

This is where LLMs can help. The LLM can be fed the problem, informed about the available tools, and asked to generate a sequence of steps.

Let’s take an example of a simple agent that will buy a movie ticket. Suppose we are prompting an LLM to come up with the plan.

We will tell it something like this:

You are an agent that will buy a movie ticket. You have access to the following functions:
- get_current_date()
- find_showtimes_for_movie(movie_name, location, date)
- obtain_payment_info(bank_info)
- book_ticket(movie_name, location, showtime, seat_number)
- email_ticket(ticket_info, email_address)

Please generate a plan to buy a ticket for a specified movie.

NOTE: Function calling, as in the example above, is one way to let LLMs work with external tools. It gives the LLM a way to supply parameter values for functions that are then executed in software.
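
Concretely, the model never executes anything itself: it is shown a schema describing each function, and it "calls" one by emitting the function name and arguments as data, which the agent's own code then dispatches. A generic sketch of that flow (the schema shape and the hard-coded `model_output` below are illustrative; real providers each have their own exact formats):

```python
import json

# Illustrative schema describing one function to the model.
FIND_SHOWTIMES_SCHEMA = {
    "name": "find_showtimes_for_movie",
    "description": "Find showtimes for a movie at a location on a date.",
    "parameters": {
        "type": "object",
        "properties": {
            "movie_name": {"type": "string"},
            "location": {"type": "string"},
            "date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["movie_name", "location", "date"],
    },
}

def find_showtimes_for_movie(movie_name: str, location: str, date: str) -> list[str]:
    ...  # a real implementation would query a showtimes API

# The model emits a "call" as data; this hard-coded example stands in
# for an actual model response.
model_output = {
    "tool": "find_showtimes_for_movie",
    "arguments": '{"movie_name": "The Matrix", "location": "New York", "date": "1999-04-12"}',
}

args = json.loads(model_output["arguments"])
result = find_showtimes_for_movie(**args)  # the agent's code does the execution
```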

The LLM will then generate a plan, which might look something like this:

1. Accept user input for the movie info, desired date, payment method, and contact info
2. Find showtimes for the movie using `find_showtimes_for_movie` and `get_current_date`
3. Book a ticket using `book_ticket` and `obtain_payment_info`
4. Send the user an email with the ticket information using `email_ticket`
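
Rather than keeping the plan as free text, the agent can parse it (or ask the LLM to emit it) as structured steps, which allows a cheap sanity check against the available tools before anything runs. A hypothetical sketch:

```python
# The tools the agent actually has (from the prompt above).
AVAILABLE_TOOLS = {
    "get_current_date", "find_showtimes_for_movie",
    "obtain_payment_info", "book_ticket", "email_ticket",
}

# The generated plan, as data instead of prose.
plan = [
    {"step": "accept user input", "tools": []},
    {"step": "find showtimes", "tools": ["find_showtimes_for_movie", "get_current_date"]},
    {"step": "book ticket", "tools": ["book_ticket", "obtain_payment_info"]},
    {"step": "email ticket", "tools": ["email_ticket"]},
]

# A cheap sanity check before any LLM-based evaluation runs.
for item in plan:
    unknown = [t for t in item["tools"] if t not in AVAILABLE_TOOLS]
    if unknown:
        raise ValueError(f"plan references unknown tools: {unknown}")
```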

### Evaluation

An agent needs to verify both the generated plan and what was actually executed.

The LLM can be helpful here as well. Let’s revisit the movie ticket example.

When the plan is generated, the LLM can verify that the steps make sense. We can ask:

Here is a plan to buy a movie ticket making use of external functions.

Please verify that this plan makes sense and is valid. If not, state what steps are invalid.

The plan is:

1. Accept user input for the movie info, desired date, payment method, and contact info
2. Find showtimes for the movie using `find_showtimes_for_movie` and `get_current_date`
3. Book a ticket using `book_ticket` and `get_current_date`
4. Send the user an email with the ticket information using `email_ticket`

The LLM will then respond with something like this:

The plan is invalid. Step 3 needs the payment information to actually book the ticket.
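
In code, this check can simply be a second LLM call whose verdict the agent inspects before executing anything. A minimal sketch, assuming a hypothetical `llm` function that takes a prompt string and returns the model's text:

```python
def verify_plan(llm, plan_text: str) -> tuple[bool, str]:
    # `llm` is an assumed helper: prompt string in, model text out.
    prompt = (
        "Here is a plan to buy a movie ticket making use of external functions.\n"
        "Please verify that this plan makes sense and is valid. "
        "Begin your answer with VALID or INVALID, then explain.\n\n"
        f"The plan is:\n\n{plan_text}"
    )
    answer = llm(prompt)
    return answer.strip().upper().startswith("VALID"), answer
```

Parsing a free-text verdict like this is brittle; in practice, asking the model for structured output makes the check more reliable.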

Now let’s suppose that the plan is valid and the agent is proceeding to execute it.

After each execution step, the LLM can be asked to validate the resulting state.

The step "Find showtimes for the movie using `find_showtimes_for_movie` and `get_current_date`" was executed.
Please verify that this step executed correctly.

Here is the outcome:
'''
The result of `get_current_date` was: "1999-04-15".

Then `find_showtimes_for_movie` was called with the arguments `("The Matrix", "New York", "1999-04-12")`.
The result was: No showtimes were found.
'''

The LLM might respond like so:

The step "Find showtimes for the movie" was not executed correctly.
There are no available showtimes for the movie on a date in the past.
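
The same pattern works after execution: record what each tool call returned, feed it back to the model, and let the verdict decide whether to continue, retry, or re-plan. Again a hypothetical sketch, with the same assumed `llm` helper as above:

```python
def verify_step(llm, step: str, outcome: str) -> bool:
    prompt = (
        f'The step "{step}" was executed.\n'
        "Please verify that this step executed correctly. "
        "Answer YES or NO first, then explain.\n\n"
        f"Here is the outcome:\n'''\n{outcome}\n'''"
    )
    return llm(prompt).strip().upper().startswith("YES")

# If verify_step(...) returns False, the agent can retry the step or re-plan.
```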

## Challenges with building agents

Building agents comes with several hurdles, and these must be accounted for in the development process.

### Agent quality

The bar for quality is simply higher for agentic systems than for non-agentic ones.

Since an agent is used for larger, more complex problems, the cost of failure is higher.

Also, since an agent could execute many steps, any decrease in accuracy has the potential to compound.
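
A quick back-of-the-envelope calculation shows how fast this compounds (assuming, for illustration, that steps fail independently):

```python
per_step_success = 0.95
steps = 10
print(per_step_success ** steps)  # ~0.60: ten 95%-reliable steps succeed end to end only ~60% of the time
```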

### Ways the agent might fail

- Planning: the plan suggests incorrect tool use or wrong arguments, or simply has wrong or missing steps
- Tool use: the tool errors out (like a failed API request), produces the wrong result, or is called incorrectly (the sketch after this list shows one way to contain these failures)
- Approach to the task: even if the agent completes the task, it might take a suboptimal approach
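
Tool-use failures in particular can be contained with ordinary defensive code around each call. A minimal hypothetical sketch:

```python
import time

def call_tool(tool, *args, retries: int = 3, delay: float = 1.0, **kwargs):
    """Wrap a tool call so transient failures (like a failed API request)
    don't sink the whole agent run. Hypothetical helper, not a library API."""
    last_error = None
    for attempt in range(retries):
        try:
            return tool(*args, **kwargs)
        except Exception as exc:  # in practice, catch narrower exception types
            last_error = exc
            time.sleep(delay * (attempt + 1))  # simple linear backoff
    # Surface the failure so the agent can re-plan instead of crashing.
    raise RuntimeError(f"tool failed after {retries} attempts") from last_error
```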

## Further reading

There’s already a wealth of content on agents. Here is a small selection, but there’s a lot more material out there.

- The Rise and Potential of Large Language Model Based Agents: A Survey
- The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey
- Chip Huyen’s post on agents