A new company has arrived with a jaw-dropping claim: a 12-million token context window. The company is called Subquadratic, and on May 5, 2026, it officially launched with \$29 million in seed funding. Their model, SubQ LLM, uses a technique called “Subquadratic Sparse Attention.”
Before we dive into the details, it’s worth pausing on what that number actually means.
What Does 12 Million Tokens Mean?
Let’s compare. The original GPT-4 had an 8,000 token context window. Claude 3 arrived with 200,000 tokens. Claude Opus 4.6 now has 1 million tokens. Gemini 1.5 Pro supports 2 million tokens.
Now Subquadratic comes along and says: 12 million tokens. That’s roughly 9 million words of English text, on the order of 75 books of 400 pages each. Or the entire codebase of a large software project. Or every email you’ve sent and received over the past 5 years.
Imagine: an AI model that can hold 75 books “in its mind” simultaneously and answer your questions about them.
What’s the Problem with Quadratic Attention?
To understand why this achievement matters, I need to explain a technical concept in simple terms.
Language models use a mechanism called “Attention.” In standard Attention (also known as Self-Attention), every token must “attend to” all other tokens. This means if you have 1,000 tokens, the model must perform 1,000 × 1,000 = one million calculations. If you have 10,000 tokens, it becomes 100 million calculations.
This is called “quadratic growth”: every time the token count doubles, the computation quadruples. For 12 million tokens, the traditional method would require an amount of computation that is practically out of reach.
It’s like asking everyone at a party of 10 to shake hands with everyone else — 45 handshakes needed. But at a party of 1,000? Nearly 500,000 handshakes. Practically impossible.
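To make those numbers concrete, here’s a tiny Python snippet (pure arithmetic, nothing model-specific) that reproduces both the attention counts and the handshake counts:

```python
def attention_pairs(n: int) -> int:
    """Full self-attention: every token attends to every token."""
    return n * n

def handshakes(n: int) -> int:
    """Unordered pairs: each pair of guests shakes hands once."""
    return n * (n - 1) // 2

for n in (1_000, 10_000, 12_000_000):
    print(f"{n:>12,} tokens -> {attention_pairs(n):,} attention pairs")

for n in (10, 1_000):
    print(f"{n:>12,} guests -> {handshakes(n):,} handshakes")
```

Run it and you get exactly the figures above: one million pairs at 1,000 tokens, 45 handshakes at a party of 10, and 499,500 at a party of 1,000.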
How Does Subquadratic Sparse Attention Work?
Subquadratic found a clever solution: instead of having every token attend to all other tokens, each token attends only to the “important” ones.
The core idea is that in a long text, most tokens aren’t directly relevant to each other. When you’re reading the last paragraph of a book, you don’t need to attend to every single word in chapter one. You only need the key, relevant points.
Subquadratic uses a combination of several techniques (a code sketch follows this list):
Local Attention: Each token fully attends only to nearby tokens (e.g., 2,048 tokens before and after).
Sparse Global Attention: For distant tokens, a selective mechanism is used. The model has learned which distant tokens are truly important and attends only to those.
Hierarchical Compression: Long text is divided into blocks, and each block produces a “summary.” New tokens can attend to these summaries instead of every individual previous token.
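Subquadratic hasn’t published its architecture, so treat the following as a minimal sketch of how these three patterns could combine into a single attention mask. Every concrete choice here — the window size, the block size, picking global tokens evenly, using each block’s last token as its “summary” slot — is my assumption for illustration, not SubQ LLM’s actual design:

```python
import numpy as np

def sparse_attention_mask(n: int, window: int, block: int, n_global: int) -> np.ndarray:
    """Boolean causal mask: entry [i, j] is True if token i may attend to token j.

    Combines three patterns:
      1. Local attention: a sliding window of +/- `window` tokens.
      2. Sparse global attention: a few always-visible tokens
         (spaced evenly here; a trained model would learn which ones matter).
      3. Hierarchical compression: the last token of each `block`-sized
         chunk stands in for that chunk's "summary".
    """
    i = np.arange(n)[:, None]  # query positions, shape (n, 1)
    j = np.arange(n)[None, :]  # key positions, shape (1, n)

    local = np.abs(i - j) <= window               # nearby tokens
    summaries = (j % block) == (block - 1)        # one summary slot per block
    global_idx = np.linspace(0, n - 1, n_global).astype(int)
    global_tokens = np.isin(j, global_idx)        # selected distant tokens

    causal = j <= i                               # never attend to the future
    return (local | summaries | global_tokens) & causal

mask = sparse_attention_mask(n=4096, window=128, block=256, n_global=16)
print(f"attended pairs: {mask.sum():,} of {mask.size:,} ({mask.mean():.1%})")
```

A real long-context implementation would never materialize this dense n×n mask, of course; it would compute only the allowed pairs, block by block, and that is exactly where the subquadratic savings come from.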
The result? Computational complexity has dropped from O(n²) to approximately O(n·√n). In plain terms: for 12 million tokens, instead of 144 trillion calculations, roughly 40 billion are needed. That’s a massive difference.
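That claim is easy to check with a calculator, or three lines of Python:

```python
n = 12_000_000
print(f"O(n^2):    {n ** 2:.3e}")                      # ~1.440e+14, i.e. 144 trillion
print(f"O(n*sqrt): {n * n ** 0.5:.3e}")                # ~4.157e+10, i.e. ~40 billion
print(f"ratio:     {(n ** 2) / (n * n ** 0.5):,.0f}x") # sqrt(n), about 3,464x fewer
```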
How Does SubQ LLM Perform?
According to benchmarks published by Subquadratic themselves:
On “Needle in a Haystack” tasks (finding a specific piece of information in very long text), SubQ LLM achieved 94.2% accuracy with 12 million tokens. For comparison, most models see their accuracy drop sharply after 2 million tokens.
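For context, a needle-in-a-haystack test is simple to construct. Here’s a stripped-down sketch of the protocol; `ask_model` is a placeholder for whatever API you’re evaluating, and the needle itself is made up:

```python
def build_haystack(needle: str, filler: str, n_sentences: int, depth: float) -> str:
    """Bury `needle` at relative position `depth` (0.0 = start, 1.0 = end)
    inside n_sentences of repeated filler text."""
    sentences = [filler] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)
    return " ".join(sentences)

needle = "The magic number for project Falcon is 7421."   # invented for this demo
filler = "The sky was gray and the meeting ran long."
prompt = build_haystack(needle, filler, n_sentences=500_000, depth=0.5)
prompt += "\n\nQuestion: What is the magic number for project Falcon?"

# answer = ask_model(prompt)   # placeholder: call the model under test
# passed = "7421" in answer    # exact-match scoring, swept over many depths
```

Real harnesses repeat this over a grid of context lengths and needle depths, which is what makes the mid-context blind spots visible.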
Good results have also been reported for long document summarization. However, independent benchmarks have not yet been published, so we need to wait for independent researchers to verify these claims.
An important note: SubQ LLM’s output quality on general tasks (like answering questions, coding, and writing) still trails GPT-5.5 and Claude Opus. Its real strength lies in managing extraordinarily large contexts.
Where Did the \$29 Million Come From?
Subquadratic’s funding round was led by Sequoia Capital. Andreessen Horowitz and several angel investors from Google Brain and DeepMind also participated.
The founding team is interesting too: three researchers from Google Brain who previously worked on Attention mechanisms. These are exactly the people who know the problem intimately.
Why Does a Large Context Window Matter?
You might ask: “Isn’t 1 million tokens enough? Why 12 million?”
The simple answer is: there are use cases where 1 million tokens is not enough.
Analyzing large codebases: A medium-sized software project might contain 5 to 10 million tokens of code (see the token-counting sketch after this list). With 12 million tokens, the model can see the entire project at once and find bugs.
Legal analysis: A large contract with all its appendices and documentation might be several million tokens. Lawyers and legal teams would find this capability invaluable.
Scientific research: A researcher wants to analyze 50 related papers simultaneously and find connections between them. With a large context window, this becomes possible.
Long-memory chatbots: Imagine having an AI assistant that remembers all your conversations over months. With 12 million tokens, this is much closer to reality.
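If you’re curious whether your own project would fit, a rough count with OpenAI’s open-source tiktoken tokenizer gives a ballpark. This is only a sketch: counts vary between tokenizers, and SubQ’s own tokenizer hasn’t been published:

```python
from pathlib import Path
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used public tokenizer

def count_tokens(root: str, suffixes=(".py", ".ts", ".go", ".java")) -> int:
    """Sum token counts over all source files under `root`."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="ignore")
            total += len(enc.encode(text, disallowed_special=()))
    return total

print(f"{count_tokens('.'):,} tokens")
```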
Challenges and Criticisms
Of course, not everything is rosy. Several serious challenges exist:
First, a “large context window” doesn’t necessarily mean “deep understanding.” The “Lost in the Middle” problem still exists — the model may ignore information in the middle of the context. Subquadratic claims to have largely solved this, but until it’s independently verified, we can’t be certain.
Second, inference costs are still high. Even with Subquadratic’s optimizations, processing 12 million tokens isn’t cheap. Pricing hasn’t been announced yet.
Third, latency is an issue. When the model has to process 12 million tokens, response time goes up. For real-time applications, this can be problematic.
The Future of Context Windows
Subquadratic’s move shows the context window race is far from over. We started at 4,000 tokens and have reached 12 million. Next year, we’ll likely see even larger numbers.
But the real question is: is bigger always better? Or should we focus on quality of understanding and optimal use of context? The answer is probably both. And companies like Subquadratic are showing that you can make context larger while maintaining quality.
We’re waiting for independent benchmarks and final pricing. But one thing is certain: the era of small context windows is over.