
Choosing which Large Language Model will power your agent is one of the most important technical decisions in your project.

It affects how well your agent performs, how much it costs to operate, and how predictable its behavior is over time.

There is no single best model. The right choice depends on your goals, your budget, and how much control you need over the output.

Teams that rush this decision often regret it later. The key is to test early, define clear priorities, and avoid locking yourself into a single vendor or setup.

A good LLM strategy answers four main questions:

  1. Which model are you using and why?
  2. How often will you test alternatives?
  3. What matters more to your use case: speed or power?
  4. What is your fallback plan if the model fails or degrades?

Let’s go through each of these.

Choosing a model is about fit, not prestige. Some models are fast and inexpensive, others are slower but better at complex reasoning.

If your use case involves short customer interactions, latency and cost may matter more than depth.

If your use case involves multi-step reasoning or detailed summaries, power may come first.

Testing early and often helps you see how models behave with your specific data. Every LLM has quirks. Some are better at instruction following, others at tone consistency or precision. You can only discover this through real examples from your own workflows.

Fallback planning is equally important. Even the most stable APIs occasionally change behavior, degrade, or go down. Always define a backup model and a policy for switching when performance drops below your baseline (or make sure your agent builder provides a default fallback option, as Botpress does).
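A fallback policy like this can be sketched in a few lines. This is a hedged illustration, not a real client: `call_primary` and `call_backup` are stand-ins for whatever API calls your provider or agent platform exposes, and the latency budget is an assumed value you would tune against your own baseline.

```python
import time

# Assumed threshold; tune against your own measured baseline.
LATENCY_BUDGET_S = 2.0

def call_primary(prompt: str) -> str:
    # Placeholder for the primary model's API call.
    return f"[primary] {prompt}"

def call_backup(prompt: str) -> str:
    # Placeholder for the secondary (fallback) model's API call.
    return f"[backup] {prompt}"

def answer(prompt: str) -> str:
    """Use the primary model; switch to the backup on failure."""
    start = time.monotonic()
    try:
        reply = call_primary(prompt)
    except Exception:
        # Hard failure: fail over immediately.
        return call_backup(prompt)
    if time.monotonic() - start > LATENCY_BUDGET_S:
        # Slow response: record the breach so repeated ones can
        # trigger a switch of the default model per your policy.
        pass
    return reply
```

The key design point is that the switching rule lives in one place, so changing the threshold or the backup model later is a one-line edit.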

At Terminal Roast, Ross, the accountant, runs the numbers. The team wants their agent to handle simple customer chats about coffee and pastries without noticeable delay. After testing a few options, they decide to use Gemini 2.5 Flash. It’s fast, inexpensive, and provides enough reasoning power for casual customer conversations.

For fallback, they configure the system to switch to a secondary model if the latency or error rate exceeds their threshold. This choice keeps the user experience smooth and the operating cost predictable.

Ross notes that if they expand the agent later into more complex tasks, they can revisit the model choice.

Every model decision is also a business decision. The wrong choice can double your operating costs or create unnecessary delays in user interactions. The right one balances performance and cost in a way that matches the experience you want to deliver.

Equally important is flexibility. Avoid designing your stack so tightly around one model that switching later becomes painful. Use an abstraction layer or a vendor that supports multiple models so you can adapt as the landscape changes.
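One lightweight way to get that abstraction is to register every provider behind a single calling signature, so the active model is a configuration value rather than code scattered through your stack. The provider functions below are hypothetical stubs standing in for real client calls.

```python
from typing import Callable, Dict

def gemini_flash(prompt: str) -> str:
    # Stub for a Gemini 2.5 Flash client call.
    return f"[gemini-2.5-flash] {prompt}"

def alternative_model(prompt: str) -> str:
    # Stub for any other provider you may want to test or switch to.
    return f"[alternative] {prompt}"

# Registry: every model is reachable through the same signature.
MODELS: Dict[str, Callable[[str], str]] = {
    "gemini-2.5-flash": gemini_flash,
    "alternative": alternative_model,
}

# One config value controls which model the whole stack uses.
ACTIVE_MODEL = "gemini-2.5-flash"

def complete(prompt: str) -> str:
    return MODELS[ACTIVE_MODEL](prompt)
```

Swapping vendors then means adding one entry to the registry and changing `ACTIVE_MODEL`, which is exactly the flexibility this section argues for.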

This flexibility keeps your system resilient and ensures you are not dependent on a single provider’s roadmap or pricing model.

To create a real LLM strategy, document three things:

  • Your primary model and why it was chosen.
  • Your performance and cost thresholds for when to consider switching.
  • Your fallback model and the rules for activating it.
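Those three decisions can live as a small, version-controlled record that the team reviews each quarter. The structure and values below are illustrative assumptions, not recommendations.

```python
# Illustrative strategy record; every value here is an assumption
# to be replaced with your own measurements and choices.
LLM_STRATEGY = {
    "primary_model": "gemini-2.5-flash",
    "rationale": "fast, inexpensive, sufficient for casual customer chat",
    "thresholds": {
        "p95_latency_s": 2.0,  # revisit the choice above this latency
        "error_rate": 0.02,    # or above a 2% error rate
    },
    "fallback": {
        "model": "secondary-model",
        "rule": "activate when a threshold is breached for a sustained period",
    },
}

def should_failover(p95_latency_s: float, error_rate: float) -> bool:
    """Check current metrics against the documented thresholds."""
    t = LLM_STRATEGY["thresholds"]
    return p95_latency_s > t["p95_latency_s"] or error_rate > t["error_rate"]
```

Keeping the thresholds next to the rationale makes the quarterly review concrete: compare current metrics to the numbers you wrote down, not to memory.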

Revisit these decisions at least quarterly. The LLM ecosystem changes quickly, and new models often outperform older ones at lower cost. Treat this as an ongoing evaluation, not a one-time setup.

Terminal Roast’s decision to prioritize speed and predictability over raw power is what makes their first deployment sustainable. It keeps customers happy, limits cost, and allows them to collect real-world data without technical instability.

That balance — choosing an appropriate model, planning for change, and keeping flexibility — is what separates experimental projects from production ones.

Your LLM strategy should always support your business goals, not dictate them.

Action: Write down which model you plan to use, what matters most for your use case (speed, cost, or depth), and what your fallback option will be. Review these choices regularly as you collect usage data.

Summary
How to choose the right large language model for an AI agent based on performance, cost, latency, and long-term reliability.