AI Agent: The Shape-Shifting Chimera of The Tech World
Oct 8, 2025
When the term AI agent is used, it's tempting to start imagining a human-like robot immaculately dressed in a suit, asking for their martini to be shaken and not stirred. While there are tech companies working to develop humanoid robots with more autonomy, this is not, for the most part, what tech CEOs, industry experts, and academics mean when they describe something as an AI agent.
Agents are the tech world's shape-shifting chimeras (hybrid creatures from mythology or fiction with transformational abilities). Like chimeras, the definition of an agent varies by context – academia, engineering, industry – and the emphasis placed on certain agentic characteristics can differ. For example, the ability to decide when and how to use 'tools' is emphasised as a key feature of AI agents in industry and engineering, but not always in academia.
Even within a single context like the tech industry, no cohesive definition of an AI agent exists. An AI agent from one tech company doesn't have exactly the same capabilities as another's. The shifting goalposts of AI capability are producing a spectrum of simple to complex AI agents, which in turn makes it difficult to determine at what point an AI system becomes an agent, and when an AI agent could be considered something even more advanced, like Artificial General Intelligence (AGI).
While it appears to be a bit of a minefield, it's worth trying to provide a working definition of an AI agent: one that avoids the pitfalls of anthropomorphising AI, undermining human involvement, or passively supporting claims of sentient AI. Demystifying industry jargon gives a better grasp of what AI can and can't do today, and where it's heading in future.
In the simplest terms, an AI agent is a software system programmed with the ability to perceive its environment, make decisions and act accordingly without being given step-by-step instructions. It is proactive rather than reactive, and independently works out how to achieve its programmed goals.
Myth-Busting: What is not an AI Agent
Any software system that follows predefined steps (i.e., instructions set out by a human) to achieve a goal is not a true agent. An AI agent would be presented with the goal and left to determine which steps it needs to take in order to achieve it.
Many AI chatbots are incorrectly considered agents. They are increasingly found on websites to help users, and work by identifying keywords in a user's message and returning rule-based responses. The chatbots are following a set of instructions: if the user asks X, then the response should be Y.
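To make that concrete, here is a minimal sketch of such a rule-based chatbot (the keywords and replies are invented for illustration): every behaviour is a human-authored rule, with no decision-making beyond keyword matching.

```python
# A minimal rule-based chatbot: every behaviour is a human-authored
# "if the user asks X, respond Y" rule. The keywords and replies below
# are invented for illustration.
RULES = {
    "opening hours": "We're open 9am to 5pm, Monday to Friday.",
    "refund": "You can request a refund within 30 days of purchase.",
    "delivery": "Standard delivery takes 3 to 5 working days.",
}

def respond(message: str) -> str:
    text = message.lower()
    for keyword, reply in RULES.items():
        if keyword in text:      # identify a keyword...
            return reply         # ...and return the scripted response
    return "Sorry, I didn't understand that. Could you rephrase?"

print(respond("What are your opening hours?"))
```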
AI writing assistant Grammarly is a tool, rather than an agent. It generates helpful writing tips, drawing on linguistic patterns identified by a machine learning system that is trained on a large collection of text. Grammarly suggests improvements to text but doesn’t take independent action; the user still makes the decision to accept, modify or reject any suggestion.
DALL·E is another AI tool, rather than an agent, that generates images from text prompts. It works by using a neural network trained on massive datasets of images with accompanying text that describes them. This enables it to model the statistical relationships between language and visual concepts to create original images.
Large language models (LLMs), such as those used in older versions of ChatGPT (4 and earlier) or Gemini (prior to 2.0), are based on word prediction, i.e., identifying which word is most likely to come next in a given text. When a user enters a prompt into the interface, the LLM follows a preset workflow rather than acting as the 'decision-maker'. The user decides when to prompt the LLM into responding; the LLM will only respond when instructed to do so.
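As a toy illustration of word prediction, the sketch below hard-codes a tiny table of next-word probabilities. A real LLM learns these patterns over an enormous vocabulary from vast amounts of text, but the core step of choosing a likely next word, and only when prompted, is the same.

```python
# Toy sketch of next-word prediction. The probability table is hand-made
# for illustration; a real LLM learns it from huge amounts of text.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "weather": 0.2},
    "cat": {"sat": 0.6, "slept": 0.4},
    "sat": {"on": 0.9, "down": 0.1},
}

def predict_next(word: str) -> str:
    candidates = NEXT_WORD_PROBS.get(word, {})
    # Pick the most likely continuation; text generation repeats this step.
    return max(candidates, key=candidates.get) if candidates else "<end>"

# The model only runs when the user supplies a prompt.
sentence = ["the"]
for _ in range(3):
    sentence.append(predict_next(sentence[-1]))
print(" ".join(sentence))  # -> "the cat sat on"
```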
However, with AI companies churning out models at such speed, the most recent releases such as ChatGPT-5 (the default model on the free version) could be classed as agents, or at the very least as having increased agency.
Criticism has been levelled at companies and start-ups for 'agent washing', where products would be better classified as AI assistants or tools. Labelling something as agentic gives it the veneer of greater capability, and 'agent' has become a trendy marketing buzzword, sometimes used misleadingly to drive sales. Some tech experts have also criticised language implying that agentic workforces could replace human ones at mass scale as unrealistic: while the most advanced AI agents are able to work across multiple domains, many remain in a testing phase.
The AI Agent Spectrum
Smart thermostats like Nest or Ecobee are AI agents, albeit simple ones. With set goals such as comfort and energy saving, they perceive temperature and room occupancy and adjust heating/cooling accordingly.
Boundary-pushing AI of the past decade includes agents such as AlphaGo and OpenAI Five, which mastered the games of Go and Dota 2 respectively. In the financial world, trading firms like Two Sigma also use agents. Programmed goals of maximising profit within risk constraints direct the agents to process copious market data, decide whether to buy, sell or hold a particular asset, and act by placing buy/sell orders.
In the last five years, AI has become more accessible to the public in several ways that incorporate agents. Self-driving company Waymo has developed cars in which quite literally no one is in the front seat, operating as a ride-hailing service in some US cities.
Reinforcement Learning
What makes systems like Nest, AlphaGo, Two Sigma's trading bots and Waymo's cars different from tools such as Grammarly or DALL·E is that they use Reinforcement Learning. Instead of being trained on a given dataset, an agent 'learns' by interacting with an environment, taking actions and receiving feedback in the form of rewards or penalties (i.e., 'points' on how well it's doing). Over time, this trial-and-error process helps the agent discover which actions lead to the best outcomes. For example, Nest would have a points system that rewards the AI for maintaining a user's desired room temperature, while Waymo might have rewards for arriving at a destination, staying within the white lines, and driving at the speed limit; and punishments for crashing, swerving, driving backwards etc.
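The sketch below is a heavily simplified version of that trial-and-error loop for a thermostat-style agent. The temperatures, actions and reward values are invented for illustration; real systems such as Nest, AlphaGo or a trading bot are vastly more sophisticated.

```python
import random

# Heavily simplified reinforcement-learning-style loop for a thermostat agent.
# States, actions and rewards are invented for illustration.
ACTIONS = ["heat", "cool", "do_nothing"]
TARGET = 21  # the user's desired room temperature (degrees C)

def step(temp: int, action: str) -> int:
    """Environment: the room temperature responds to the agent's action."""
    return temp + 1 if action == "heat" else temp - 1 if action == "cool" else temp

def reward(temp: int) -> float:
    """Feedback: more points the closer the room is to the target temperature."""
    return -abs(temp - TARGET)

# The agent's running estimate of how good each action is at each temperature.
values = {}

for episode in range(2000):
    temp = random.randint(15, 27)
    for _ in range(10):
        # Mostly act on what has been learned so far, sometimes explore at random.
        if temp not in values or random.random() < 0.1:
            action = random.choice(ACTIONS)
        else:
            action = max(values[temp], key=values[temp].get)
        new_temp = step(temp, action)
        # Nudge the estimate for this (temperature, action) pair towards the reward received.
        values.setdefault(temp, {a: 0.0 for a in ACTIONS})
        values[temp][action] += 0.1 * (reward(new_temp) - values[temp][action])
        temp = new_temp

# After enough trial and error, the agent heats when cold and cools when hot.
print({t: max(acts, key=acts.get) for t, acts in sorted(values.items())})
```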
There is also an increasing variety of AI agents that make heavy use of LLMs. These include Deep Research via LLMs such as ChatGPT and Gemini; Anthropic's Claude Sonnet 4.5 and Claude Code; and agents created using LangChain or those available through ChatGPT's Plus version with GPT-4-turbo (formerly available through OpenAI's 'Operator').
All these examples embed an LLM into an agentic framework: one that allows for more autonomy in determining the steps needed to achieve a set goal, taking those steps, and evaluating the outcome. Access to a wider variety of 'tools' (e.g., web browsing, image generation, email and calendar access) and a longer-term 'memory' are also key features of these more capable agents. Critically, this enables them to pursue loose or vague goals, such as "write me a compelling email" or "create a fun app", and to decide when such a goal has been achieved. This is in contrast to the clearly defined goals of a Nest thermostat ("keep the room at this temperature") or AlphaGo ("win the game of Go"), both of which have a clear achieved/failed dichotomy.
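A rough sketch of that kind of framework is below. The `call_llm` function is a placeholder for whichever chat-model API is being used, and the two tools are stand-ins for the web browsing, email and calendar access mentioned above; this does not represent any particular vendor's implementation.

```python
from typing import Callable

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (OpenAI, Anthropic, Gemini, etc.)."""
    raise NotImplementedError

# Stand-ins for the 'tools' an agent might be given access to.
TOOLS: dict[str, Callable[[str], str]] = {
    "web_search": lambda query: f"(search results for: {query})",
    "send_email": lambda body: "email sent",
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    """Loop: the LLM picks a tool or declares the goal done; results feed back in as 'memory'."""
    memory = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = call_llm(
            "Available tools: " + ", ".join(TOOLS) + "\n"
            + "\n".join(memory)
            + "\nReply 'TOOL <name> <input>' or 'DONE <final answer>'."
        )
        if decision.startswith("DONE"):
            # The model itself decides that the (possibly vague) goal has been met.
            return decision.removeprefix("DONE").strip()
        _, name, tool_input = decision.split(" ", 2)
        result = TOOLS[name](tool_input)            # act
        memory.append(f"Used {name}: {result}")     # remember the outcome
    return "Stopped: step limit reached."
```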
These goals are set by the user, who in most cases interacts with the agent via a chat interface. For instance, Deep Research can generate detailed reports (on topics such as predicting market trends, conducting scientific literature reviews, and technology forecasting) from a single user prompt, having spent up to thirty minutes working on it independently.
Claude Code can be prompted to write, debug, and explain code in various programming languages, making it an asset for developers looking to build software, fix errors, and understand complex codebases more efficiently.
LangChain and ChatGPT agents are able to achieve a variety of goals depending on their specific use case, from helping to plan/book all aspects of a holiday to helping companies make processes more efficient – sometimes with multiple agents working together to provide a unified output.
How do we use AI Agents at Chase Labs?
When an email is received, the first thing our system does is identify what kind of email it is. A model has been trained to recognise different kinds of response, such as: referral (the email needs to be sent to another person); out-of-office; do not contact again; and needs a response. Only emails categorised as 'referral' or 'needs a response' are handed off to separate AI agents, as they require decision-making steps, unlike, say, 'out-of-office', which has a pre-determined (programmed) sequence of steps.
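In simplified form, that triage step looks something like the sketch below; `classify` stands in for the trained model, and the category names are the ones described above.

```python
# Simplified sketch of the email triage step. `classify` stands in for the
# trained classification model; the handlers it routes to are described above.
def classify(email_text: str) -> str:
    """Placeholder for the trained email classifier."""
    return "needs a response"  # dummy value so the sketch runs

AGENT_CATEGORIES = {"referral", "needs a response"}              # handed off to AI agents
SCRIPTED_CATEGORIES = {"out-of-office", "do not contact again"}  # fixed sequences of steps

def route(email_text: str) -> str:
    category = classify(email_text)
    if category in AGENT_CATEGORIES:
        return f"hand off to the '{category}' agent"
    if category in SCRIPTED_CATEGORIES:
        return f"run the pre-programmed '{category}' sequence"
    return "flag for human review"  # fallback invented for this sketch

print(route("Hi, could you send over the pricing document?"))
```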
Focusing on the AI agent for ‘needs a response’, there are several steps it will take towards its goal of responding to the email. Step one is making a plan to effectively respond to the email. This involves deciding which ‘tool’ to use, of which the agent has many at its disposal: look in calendars to check availability; send/move/cancel a calendar invite; attach a document; search within a knowledge base; and compose a message.
Step two is taking action, using ‘tools’, to execute the plan. Each action taken will affect the ones that follow. For example, if the agent uses a tool to search for a document that has been requested in the email, the response it provides to the sender will differ depending on whether it could or couldn’t find the document.
The third step the AI agent takes is to check over what it has done, to ensure everything is correct and that no further actions are needed. A crucial check at this stage could be, for example, that the dates suggested in the email for a meeting match the ones the agent has selected for a calendar invite. If a mistake has been made, it returns to the previous steps to ensure the right dates are selected, before running the check in step three once more.
When the agent has determined that no further actions need to be taken to execute its plan, the response to the email is sent, and its goal is achieved.
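Put together, those three steps amount to a plan, act, check loop. The sketch below shows only the control flow: in the real system, `plan`, `act` and `check` each involve LLM calls and the tools listed earlier, and the stub return values here are invented.

```python
# Control-flow sketch of the respond-to-email agent: plan -> act -> check,
# looping back if the check finds a mistake. The stub bodies are invented;
# in practice each step involves LLM calls and the tools listed above.
def plan(email: str, history: list[str]) -> list[str]:
    """Step one: decide which tools to use (calendar, knowledge base, compose...)."""
    return ["check calendar availability", "compose reply"]

def act(step: str, history: list[str]) -> str:
    """Step two: execute one step of the plan with the chosen tool."""
    return f"done: {step}"

def check(email: str, history: list[str]) -> bool:
    """Step three: verify the work, e.g. do the invite dates match the email?"""
    return True

def respond_to_email(email: str, max_attempts: int = 3) -> list[str]:
    history: list[str] = []
    for _ in range(max_attempts):
        for step in plan(email, history):        # plan
            history.append(act(step, history))   # act
        if check(email, history):                # check
            history.append("send response")      # goal achieved
            return history
    return history + ["escalate to a human"]     # fallback invented for this sketch

print(respond_to_email("Could we meet on Tuesday at 2pm?"))
```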
Chain-of-Thought (CoT) Reasoning
Our agentic systems use CoT reasoning: breaking down a specific goal into smaller steps, where at each step the AI is asked to plan, 'reflect' on how things are going, and then determine the next appropriate action. Many of the other LLM-based AI agents described above also use the CoT technique to increase the accuracy of their responses and outputs.
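As an illustration (not our exact prompts), a CoT-style prompt asks the model to restate what remains, reflect on progress so far, and pick one next action, rather than producing the whole answer in a single shot; `call_llm` is again a placeholder for any chat-model API.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    raise NotImplementedError

# Illustrative chain-of-thought style template: plan, reflect, then pick the
# single next action, instead of asking for the final answer in one go.
COT_TEMPLATE = """Goal: {goal}

Progress so far:
{progress}

Think step by step:
1. Briefly restate what still needs to be done.
2. Reflect on whether the progress so far contains any mistakes.
3. State the single next action to take, or say FINISHED if the goal is met.
"""

def next_action(goal: str, progress: list[str]) -> str:
    prompt = COT_TEMPLATE.format(goal=goal, progress="\n".join(progress) or "(none)")
    return call_llm(prompt)
```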
To AI agency, and beyond
2025 has been labelled as the year of AI agents, with much hype over the development of increasingly autonomous systems. They’ve been heralded as the key stepping stone to AGI – another term with a plethora of definitions, but commonly understood as a future form of AI that surpasses human capabilities. Whether AGI is even possible is still up for debate, and deserves a separate discussion.
But AI agents have indisputably arrived, and as organisations start planning for 2026, the question of "what comes next?" is increasingly posed. Autonomous AI systems are being used in proliferating ways: for research, design, content generation, and as personal assistants, to name a few. These agents are performing economically relevant work for individuals, companies and governments, and their uptake doesn't seem to be slowing down.
However, in spite of the impressive feats achieved with AI, even the most cutting-edge systems are fallible. A major limitation of existing AI agents is their relatively short-term 'memory'; they can't undertake tasks that take a substantially long time. Recent developments in the industry aim to give agents longer-term memories so that they can, for example, work on projects that span multiple days or even weeks, or on large code projects with hundreds of files that interact with each other.
AI also has the capability to 'hallucinate' (which is why it's important to keep humans in the loop, rather than heading down a route of full automation). These systems invent things, and can bizarrely struggle with simple tasks, like word problems or mathematical questions. While tool use and tweaks in training might have fixed previous examples, such as "how many r's are in the word strawberry" or multiplying big numbers, these systems still make mistakes – as stated in the small print on ChatGPT and Gemini. While helpful, they also represent one 'voice'. They don't hold all the answers, particularly when it comes to strategic decisions or the future.
That is why it remains to be seen exactly what 2026 will bring in terms of more advanced agentic systems, though it's hard to disagree with trajectories that suggest agents will become more prevalent in everyday lives and businesses. How they materialise in our lives, however, is something worth considering.
A thought shared by Jason Fried, co-founder of 37signals, compares AI writing to the colour brown. He reflects that each writing style has its own colour. When AIs write, they draw on their immense training data to combine all styles, and the result is brown. Extending the analogy, relying on AI alone could be compared to painting only in brown. To paint a masterpiece, you need a depth of colour; to create great work, you need a depth of ideas, perspectives and voices. Relying too heavily on AI might get us just that – brown.