Glossary

Context Window

A context window is the maximum amount of text (measured in tokens) that an AI model can process in a single interaction, including the conversation history, system instructions, and retrieved documents.

What Is a Context Window?

A context window is the total amount of information a large language model (LLM) can "see" and consider when generating a response. Measured in tokens (roughly ¾ of a word), the context window includes everything the model processes: the system prompt that defines the agent's behavior, the conversation history with the customer, any documents retrieved via RAG, and the tools and functions available to the agent.

Think of it as the AI's working memory. A model with a 128,000-token context window can hold roughly 96,000 words of information at once — equivalent to a full novel. A model with a 4,000-token window can hold about 3,000 words, or roughly six pages.
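The token-to-word arithmetic above can be sketched in a few lines. This is only a rough heuristic (about 4 tokens per 3 words, per the ¾-word rule of thumb); real systems use the model's own tokenizer for exact counts, and the function names here are illustrative, not from any particular library.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 tokens per 3 English words.

    A real tokenizer gives exact counts; this heuristic is only
    useful for quick context-budget math.
    """
    words = len(text.split())
    return round(words * 4 / 3)


def fits_in_window(text: str, window_tokens: int = 128_000) -> bool:
    """Check whether text fits a given context window size."""
    return estimate_tokens(text) <= window_tokens
```

By this estimate, a 128,000-token window holds roughly 96,000 words, matching the novel-length figure above.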

Why Context Window Size Matters for Customer Service

In customer support, context window size directly impacts the quality of AI interactions. A larger context window means the agent can:

  • Remember the full conversation history, even in long back-and-forth exchanges
  • Process extensive product documentation and policy details
  • Consider the customer's past interactions and account history
  • Handle complex, multi-part questions without losing track

A small context window forces trade-offs: the system must decide what information to include and what to drop. If a customer's question requires understanding both a detailed product manual and their three-month conversation history, a limited context window means the agent may miss critical details.
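One way to picture that trade-off is a greedy budget packer: given a token budget, keep the highest-priority pieces of context and drop what doesn't fit. This is a hypothetical sketch (the `ContextItem` type and priority scheme are illustrative); production systems often summarize dropped material rather than discarding it outright.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    tokens: int
    priority: int  # lower number = more important


def pack_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Greedily keep the most important items that fit the token budget."""
    chosen, used = [], 0
    for item in sorted(items, key=lambda i: i.priority):
        if used + item.tokens <= budget:
            chosen.append(item)
            used += item.tokens
    return chosen
```

With a 3,000-token budget, a 500-token system prompt and 2,000 tokens of history fit, but a 3,000-token product manual gets dropped, exactly the situation described above.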

Technical context: Modern frontier models now offer context windows of 128K to 2M tokens, a dramatic increase from the 4K-8K token windows that were standard in 2023. This expansion has made it practical for AI agents to process entire knowledge bases alongside conversation context.

Context Window vs. Knowledge Base

A context window and a knowledge base serve different purposes. The knowledge base stores all of an organization's information permanently. The context window is the subset of that information loaded into the model for a specific interaction. RAG and knowledge graph techniques bridge the gap by intelligently selecting which information from the knowledge base to place in the context window for each query.
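The bridging step can be sketched as a tiny retrieval function: score every knowledge-base chunk against the query and load only the top matches into the window. The word-overlap score below is a deliberately toy stand-in for the embedding- or graph-based relevance used in real RAG pipelines; the function names are assumptions, not any vendor's API.

```python
def relevance(query: str, chunk: str) -> int:
    """Toy relevance score: number of words shared with the query."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def select_for_window(query: str, knowledge_base: list[str],
                      max_chunks: int = 2) -> list[str]:
    """Pick the most relevant knowledge-base chunks for the context window."""
    ranked = sorted(knowledge_base, key=lambda c: relevance(query, c),
                    reverse=True)
    return ranked[:max_chunks]
```

The knowledge base stays put; only the selected chunks occupy context-window tokens for this one query.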

The Maven Advantage: Intelligent Context Management

Maven AGI uses a knowledge graph to ensure that the most relevant information fills the context window for every interaction. Rather than stuffing the window with everything available, Maven's retrieval system identifies precisely which policies, account details, and conversation history matter for the current query. This intelligent context management is what enables high resolution rates even on complex, multi-turn conversations.

Maven proof point: Mastermind achieved 93% live chat resolution with Maven AGI while handling 60% more contacts — demonstrating that intelligent context management scales without sacrificing accuracy.

Frequently Asked Questions

What happens when a conversation exceeds the context window?

When information exceeds the context window, older parts of the conversation or less relevant retrieved documents are summarized or dropped. Well-designed systems preserve the most important context and summarize the rest, so the agent doesn't "forget" critical details mid-conversation.
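A minimal version of that strategy is a sliding window over the conversation: keep the most recent turns that fit the budget and replace everything older with a summary placeholder. This sketch assumes a crude word-count cost and a stub summarizer; a production system would use the model's tokenizer and an actual summarization call.

```python
def trim_history(turns: list[str], budget: int) -> list[str]:
    """Keep the newest turns within a word budget; summarize the rest.

    The word-count cost and bracketed summary line are stand-ins for a
    real tokenizer and a real summarization step.
    """
    cost = lambda turn: len(turn.split())
    kept, used = [], 0
    for turn in reversed(turns):  # walk from newest to oldest
        if used + cost(turn) > budget:
            break
        kept.append(turn)
        used += cost(turn)
    kept.reverse()
    dropped = turns[: len(turns) - len(kept)]
    if dropped:
        return [f"[summary of {len(dropped)} earlier turns]"] + kept
    return kept
```

The agent still "sees" a trace of the dropped turns via the summary line, which is what keeps it from forgetting critical details mid-conversation.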

Does a bigger context window always mean better performance?

Not necessarily. Larger context windows allow more information but can also introduce noise. Research shows that models can struggle to find relevant information when it's buried in the middle of a very long context. Smart retrieval that puts the right information in the window matters more than raw window size.

How is context window size measured?

Context windows are measured in tokens. A token is roughly ¾ of an English word. A 128K token context window holds approximately 96,000 words. Tokenization varies by model and language — non-English text and code typically use more tokens per word.
