Sandeep Giri

Domain-Focused PCA on Text Embeddings Improves Semantic Retrieval

2026-05-28T00:00:00+00:00

One of the core problems in building a reliable AI system for enterprise data is retrieval: given a user’s question, can you find the right context quickly and accurately?

Modern retrieval systems depend heavily on text embeddings — dense vector representations produced by models like OpenAI’s text-embedding-3-small. These embeddings are trained on enormous general-purpose corpora. They work remarkably well as a starting point. But “general purpose” is not the same as “domain optimal.” A model trained on the entire internet does not distribute its representational capacity the way a medical, legal or financial corpus would want it to.

The question I wanted to answer: can we improve retrieval accuracy in a specific domain without fine-tuning the embedding model — which is expensive, slow, and requires labelled data?

The answer turned out to be yes, with a surprisingly simple technique.

The Idea

Principal Component Analysis (PCA) is one of the oldest ideas in machine learning. It centres a set of vectors, finds the directions of maximum variance and projects everything onto those directions. Applied to text embeddings from a domain-specific corpus, it does something useful. It rotates the embedding space so that the axes align with the directions along which this domain's documents actually vary, rather than the directions that mattered across the general training data.

The intuition is straightforward. A general embedding model spends many dimensions on distinctions that are important somewhere across all domains. In a medical corpus, most of those distinctions barely vary from one document to the next and act as noise for retrieval. PCA keeps the directions along which the documents you actually have differ, and discards the rest.

Crucially, we fit the PCA transform only on the document corpus, not on queries. At retrieval time we then apply that same transform to both documents and queries. This turns out to matter.
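The fit-on-documents, project-everything idea can be sketched in a few lines. This is a minimal illustration, not the repository's code. The repository installs scikit-learn, whose `PCA` class offers equivalent `fit` and `transform` steps. Here NumPy's SVD is used directly so that every step is visible. Two choices are assumptions: L2-normalising after projection, so that similarity is cosine, and the helper names `fit_pca`, `project` and `retrieve`.

```python
import numpy as np


def fit_pca(docs: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Fit PCA on document embeddings only (queries are never seen here).

    Returns the corpus mean and a (dim, k) matrix whose columns are the
    top-k principal directions of the document corpus.
    """
    mean = docs.mean(axis=0)
    # Rows of vt are the principal directions of the centred corpus,
    # sorted by decreasing singular value (i.e. decreasing variance).
    _, _, vt = np.linalg.svd(docs - mean, full_matrices=False)
    return mean, vt[:k].T


def project(x: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Apply the corpus-fitted transform, then L2-normalise for cosine similarity."""
    z = (x - mean) @ components
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def retrieve(queries: np.ndarray, docs: np.ndarray, k: int = 32, top_n: int = 5) -> np.ndarray:
    """Return indices of the top_n documents per query, ranked by cosine similarity."""
    mean, components = fit_pca(docs, k)       # fitted on documents only
    q = project(queries, mean, components)    # queries reuse the corpus mean and axes
    d = project(docs, mean, components)
    scores = q @ d.T
    return np.argsort(-scores, axis=1)[:, :top_n]


# Toy usage: random 1536-dim vectors stand in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(200, 1536))
queries = docs[:3] + 0.01 * rng.normal(size=(3, 1536))  # near-duplicates of docs 0, 1, 2
ranked = retrieve(queries, docs)
print(ranked[:, 0])
```

The key design point is that `fit_pca` never sees a query. Queries are only ever mapped through the mean and axes learned from the documents.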

What We Tested

We ran experiments on a medical domain corpus covering 20 clinical topics, using OpenAI’s text-embedding-3-small as the base embedding model. We tested five hypotheses about when and why domain-focused PCA helps:

Does PCA on a domain corpus improve MAP over the raw baseline?
Is 32 dimensions the sweet spot, or does more always help?
Does fitting PCA on corpus-only data outperform fitting on corpus + queries?
Do random projections produce the same gains? (Spoiler: they do not.)
Does the improvement hold as corpus diversity increases?

The Results

The best configuration, PCA-32 fitted on the document corpus only, achieved a Mean Average Precision (MAP) of 0.9203 against a baseline of 0.8750. That is a 5.2% relative improvement (4.5 points absolute).
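For readers who want to check numbers like these, here is a sketch of MAP. It assumes the standard, untruncated definition: average precision per query, then the mean over queries. The paper may use a cutoff such as MAP@k, so treat this as illustrative. The function names are hypothetical. The last line reproduces the relative-versus-absolute arithmetic for the figures above.

```python
def average_precision(ranked_ids: list, relevant_ids: set) -> float:
    """AP for one query: mean of precision@rank at each rank holding a relevant doc."""
    if not relevant_ids:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(rankings: list, relevance: list) -> float:
    """MAP: the mean of per-query average precision."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevance)]
    return sum(aps) / len(aps)


# Query 1 finds its relevant docs at ranks 2 and 3; query 2 at rank 2.
print(mean_average_precision([[3, 1, 2], [5, 4]], [{1, 2}, {4}]))  # ≈ 0.5417

# Relative vs absolute change for the headline numbers in this post.
baseline, pca32 = 0.8750, 0.9203
print(f"{(pca32 - baseline) / baseline:.1%} relative, {pca32 - baseline:.4f} absolute")
# prints: 5.2% relative, 0.0453 absolute
```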

Two other numbers from the results matter practically:

2.5× increase in similarity gap: the margin between the score of a correct match and the score of the closest incorrect one grows substantially. A wider margin makes retrieval more robust in real systems that use a similarity threshold to decide what to include.
48× reduction in storage — going from 1536-dimensional to 32-dimensional vectors. For large enterprise corpora, this is not a minor convenience. It is the difference between retrieval that fits in memory and retrieval that requires expensive infrastructure.
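The post does not say exactly how the similarity gap is computed. The sketch below therefore assumes one plausible definition: for each query, the cosine similarity of the correct document minus the highest similarity among the incorrect documents, averaged over queries. The storage arithmetic assumes both vector sizes use 32-bit floats. The 10-million-document corpus is a hypothetical example. Both function names are illustrative.

```python
import numpy as np


def similarity_gap(scores: np.ndarray, correct: np.ndarray) -> float:
    """Mean margin between the correct document's score and the best incorrect score.

    scores:  (n_queries, n_docs) similarity matrix.
    correct: (n_queries,) index of the correct document for each query.
    """
    rows = np.arange(len(correct))
    correct_scores = scores[rows, correct]
    masked = scores.astype(float)            # astype returns a copy
    masked[rows, correct] = -np.inf          # exclude the correct document
    best_wrong = masked.max(axis=1)
    return float(np.mean(correct_scores - best_wrong))


def bytes_per_vector(dims: int, bytes_per_value: int = 4) -> int:
    """Raw size of one embedding, assuming float32 by default."""
    return dims * bytes_per_value


scores = np.array([[0.9, 0.7, 0.2],
                   [0.3, 0.8, 0.75]])
print(similarity_gap(scores, np.array([0, 1])))      # ≈ 0.125

n_docs = 10_000_000                                    # hypothetical corpus size
full, reduced = bytes_per_vector(1536), bytes_per_vector(32)
print(full, reduced, full // reduced)                  # 6144 128 48
print(n_docs * full / 1e9, "GB vs", n_docs * reduced / 1e9, "GB")
```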

The random projection control was important. When we projected embeddings onto 32 random directions instead of 32 PCA-derived directions, accuracy did not improve. The gain comes specifically from aligning dimensions with the domain, not just from reducing dimensionality. That rules out the simpler explanation that any reduction to 32 dimensions would have helped by discarding noise.
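One way to see why random directions behave differently is to measure how much of the corpus's variance each 32-direction basis retains. This is a sketch under stated assumptions. The post does not say how its random directions were drawn; here they are Gaussian columns orthonormalised with QR. The code also measures captured variance rather than MAP, so it shows why the two projections differ, not the retrieval result itself. The synthetic corpus and the function names are hypothetical.

```python
import numpy as np


def pca_basis(docs: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions of the centred corpus, as (dim, k) columns."""
    _, _, vt = np.linalg.svd(docs - docs.mean(axis=0), full_matrices=False)
    return vt[:k].T


def random_basis(dim: int, k: int, seed: int = 0) -> np.ndarray:
    """k random orthonormal directions: QR-orthonormalised Gaussian columns."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(dim, k)))   # reduced QR: q is (dim, k)
    return q


def captured_variance(docs: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the corpus's total variance that survives projection onto basis."""
    centred = docs - docs.mean(axis=0)
    return float(np.sum((centred @ basis) ** 2) / np.sum(centred ** 2))


# Synthetic "domain" corpus: 8 latent topics mixed into 256 dims, plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 8))
mixing = rng.normal(size=(8, 256))
docs = latent @ mixing + 0.1 * rng.normal(size=(500, 256))

print("PCA-32:   ", captured_variance(docs, pca_basis(docs, 32)))
print("random-32:", captured_variance(docs, random_basis(256, 32)))
```

Among all 32-dimensional orthonormal bases, PCA retains the most corpus variance by construction. A random basis keeps roughly its proportional share, here about 32/256 of the total.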

The corpus-only fitting result was also notable. Adding query vectors to the PCA fitting data slightly reduced performance compared to fitting on documents alone. The retrieval task is fundamentally about the document space; letting query variation influence the principal components adds irrelevant structure.

Finally, as corpus diversity increased across our 20-topic dataset, performance improved rather than degraded. This is encouraging for practical deployment: within the range we tested, the technique became more useful as the domain corpus broadened, not less.

Why This Matters for Enterprise AI

At Terno AI, we think a lot about how to make retrieval accurate and efficient for enterprise data. The problem is not just finding relevant content — it is finding it reliably enough to trust the downstream reasoning.

Enterprise corpora are not the internet. They are specific: a legal firm’s contracts, a hospital’s clinical notes, a manufacturer’s technical documentation. The variance structure in these corpora is nothing like the variance structure of a general embedding model’s training data.

Domain-focused PCA offers a lightweight path to alignment without fine-tuning:

No labelled data required
No model training
Runs in seconds on a modern laptop
Composable with any embedding model
Interpretable — you can inspect the principal components

It fits naturally into a semantic layer architecture where you want retrieval to be accurate, fast, and auditable.

The Paper and Code

The full paper is published on Zenodo:

Domain-Focused PCA on Text Embeddings Improves Semantic Retrieval: A Medical Domain Study
DOI: 10.5281/zenodo.20320367

The code — including the full experiment, figure generation, and a cached embedding pipeline that avoids repeated API calls — is on GitHub:

github.com/cloudxlab/pca_embeddings

To run it yourself:

git clone https://github.com/cloudxlab/pca_embeddings
cd pca_embeddings
pip install openai scikit-learn numpy pandas matplotlib seaborn
export OPENAI_API_KEY=your-key
python3 pca_experiment_v2.py

Embeddings are cached locally after the first run, so subsequent experiments cost nothing in API calls.
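The repository's caching code is not reproduced here. The following is a minimal sketch of one way such a cache can work. Each (model, text) pair is keyed by a SHA-256 hash and stored in a JSON file, so only texts not yet seen trigger an API call. The class name, file layout and `embed_fn` parameter are hypothetical. With the official `openai` Python package, `embed_fn` could wrap `client.embeddings.create(model=..., input=texts)` and return `[d.embedding for d in response.data]`.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


class EmbeddingCache:
    """Disk-backed cache: each text is embedded at most once per model."""

    def __init__(self, path: str,
                 embed_fn: Callable[[List[str]], List[List[float]]],
                 model: str = "text-embedding-3-small"):
        self.path = Path(path)
        self.embed_fn = embed_fn      # hypothetical: batch of texts -> batch of vectors
        self.model = model
        self._store = json.loads(self.path.read_text()) if self.path.exists() else {}

    def _key(self, text: str) -> str:
        # Include the model name so switching models never returns stale vectors.
        return hashlib.sha256(f"{self.model}\n{text}".encode("utf-8")).hexdigest()

    def get(self, texts: List[str]) -> List[List[float]]:
        # dict.fromkeys de-duplicates while preserving order.
        missing = [t for t in dict.fromkeys(texts) if self._key(t) not in self._store]
        if missing:
            for text, vec in zip(missing, self.embed_fn(missing)):
                self._store[self._key(text)] = list(vec)
            self.path.write_text(json.dumps(self._store))   # persist for the next run
        return [self._store[self._key(t)] for t in texts]
```

A second run with the same texts reads everything from the file and calls `embed_fn` zero times.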

What Is Next

This study used a medical corpus as a controlled testbed. The obvious questions for follow-up are: does the gain hold in legal, financial, and technical domains? What is the relationship between corpus size and optimal PCA dimensionality? Can the principal components themselves be used to understand what a corpus is “about” — as a form of automatic semantic layer construction?

These are the questions we are working on. If you are building retrieval systems for enterprise domains, I would be interested to hear what you find.

The paper is open access under Creative Commons Attribution 4.0. The code is MIT licensed. Both are free to use and build on.

The Clubs That Said No

2026-05-28T00:00:00+00:00

In my first year at IIT Roorkee, I wanted to belong somewhere.

Every new student feels this. You arrive at college and the world suddenly seems full of interesting people doing interesting things. There are clubs for music, drama, literature, debate, photography. You go to the trials, you put your name forward, and you wait to find out which version of yourself college is going to let you become.

I tried several. The Music club. Dramatics. The campus magazine, Watch Out. One by one, they said no.

I did get selected in the Light section — the group that managed stage lighting for events. I joined, attended a few sessions, and decided it was not what I was looking for. So I left that too.

By the end of my first year, I had been rejected by or had walked away from every student group I had tried. It was a quiet kind of failure — not dramatic, not catastrophic, just a series of doors that did not open.

The Redirect

I made a decision that I did not think much of at the time. I would stop trying to find a group to join and start learning computing as seriously as I could.

I had always been drawn to it. I spent long evenings in the computer lab. I read whatever I could find. I wrote programs, broke them, fixed them, wrote more. I had no particular goal other than getting better — not better than someone else, just better than I had been the week before.

By the time third year arrived, something had happened that I had not planned for. I had become genuinely capable.

Building IMG

A couple of seniors and classmates were thinking about the same problem I was: the institute’s website and online systems were in poor shape. The information students needed was scattered or absent. There was no good way to search for people, manage library resources, or handle placements online.

We started building.

The Notice Board. A People Search. A library management system. Recruitment management for placements. Each one was a real piece of software solving a real problem for real people on campus.

We formed a group to do this properly. We called it IMG — Information Management Group.

It was not glamorous. We were not doing anything that would have impressed the students who ran the music club or the drama society. We were fixing unglamorous things: how do you find a professor’s contact details, how do you know when a book is available, how does a company post a recruitment notice.

But the work was ours. And it was needed.

What Happened Next

IMG became the most sought-after group on campus.

I find this genuinely funny to think about now. The clubs that had no room for me in my first year remained what they always were — good clubs, doing good work, meaningful to the people in them. But the group we built from scratch, out of necessity and stubbornness, grew into something that outlasted all of us.

IMG is still running today, more than twenty years later. It is still considered the most competitive and prestigious student group at IIT Roorkee. The alumni who came through it have gone on to build companies, lead engineering teams, and do work that has reached millions of people.

None of that was planned. We were just trying to make the campus work better.

The Thing About Rejection

I want to be careful here, because I am not going to tell you that rejection is always a gift or that every closed door leads to a better one. That is not always true and it would be dishonest to pretend otherwise.

But I do think there is something specific that happened to me that is worth naming.

When every existing group said no, I had no ready-made identity to slot into. I could not become “the magazine person” or “the theatre person.” I had to figure out what I actually wanted to spend my time on — not what label I wanted, but what work I wanted to do.

And the work I chose was building things.

That choice, made mostly by default in the quiet aftermath of a series of rejections, turned out to be the most important choice of my college years. Everything that came after — the companies, the products, the teams — grew from the habit of mind I developed during those hours in the computer lab when no club wanted me.

The doors that stayed closed pushed me toward the one I did not know I was looking for.

If you are a student reading this and you have just been rejected from something you wanted: I am not going to promise it works out. But I will say — the question worth asking is not “why didn’t they want me?” It is “what do I actually want to spend the next three years getting good at?”

The Stars Are Closer Than You Think

2026-05-28T00:00:00+00:00

When I was in Class 11, I fell in love with a book.

It was Concepts of Physics by H. C. Verma. If you grew up in India and studied science, you probably know it. For many of us, it was not just a textbook — it was the first book that made physics feel like something alive, something that could be understood rather than merely memorised.

I was living in a tiny village in Uttar Pradesh at the time. I had no tutor, no coaching centre, no older student I could call on. What I had was curiosity, a stack of books, and a habit of reading the same passage ten times until something clicked.

While reading H. C. Verma, I hit a wall. There were concepts I could not resolve no matter how many times I went back to the page. I looked through other books — I used to hoard books by multiple authors on every subject, hoping that one author’s explanation would unlock what another’s had obscured. Sometimes it worked. This time, it did not.

The Letter

I am not sure exactly when the idea came to me. But at some point I thought: why not write to H. C. Verma directly?

He was a professor at IIT Kanpur. I was a schoolboy in a village. I had no connection to him, no introduction, no reason to expect anything. But I had my doubts in my head and his address, printed right there in the book.

So I wrote a letter. I described my questions as clearly as I could. I folded the pages, addressed the envelope, and posted it.

I had no real expectation of a reply. I told myself I probably would not hear back. But I had asked the question, and there was something satisfying in that alone.

Then his reply arrived.

He had written back — in Hindi, in a warm and careful hand. He addressed my questions one by one. He did not write like a busy professor brushing off a student. He wrote like someone who genuinely cared that the ideas landed. The letter was patient and kind and precise.

I have thought about that moment many times since. A professor at one of India’s most demanding institutions, receiving a letter from an unknown schoolboy in a village he had never heard of, and choosing to sit down and write back. In Hindi. With care.

Something shifted in me that day. Not just because my doubts were cleared — though they were — but because of what the gesture meant. The world was not as closed as it had seemed from where I was standing. The distance between a student and a teacher was not fixed. It was crossable. By a letter.

Bjarne Stroustrup

A few years later, when I reached college, the same instinct resurfaced.

I had become absorbed in C++. I was reading everything I could find, writing programs, trying to understand the language at a deeper level. And I found myself with questions that books did not answer — questions about design decisions, about why the language worked the way it did.

Bjarne Stroustrup, one of the most influential computer scientists alive, had invented C++. He was at Bell Labs and later at Texas A&M. I wrote to him.

He wrote back. Not once — several times. In detail. He engaged with the questions as if they were worth engaging with, because to him, perhaps, any sincere question about the language he had spent his life building was worth engaging with.

I want to be careful not to overstate this. These were not long friendships or mentorships in any formal sense. They were exchanges — brief, specific, generous. But they were enough to teach me something I have carried ever since.

What I Learned

The lesson is not about networking. It is not a productivity tip or a career hack.

It is something more fundamental: most people who have built something meaningful care deeply about the ideas behind it. Ask them a sincere question about those ideas, and there is a real chance they will answer.

The famous scientist, the author whose book you loved, the founder whose work changed how you think — they are not behind a wall. They are human beings with email addresses and, sometimes, a genuine pleasure in being asked something real by someone who clearly cares.

The worst that can happen when you reach out is silence. You were already in silence before you asked. The asking costs almost nothing. The answer, when it comes, can change everything.

I have been on the other side of this now. As someone who has taught many thousands of people, I receive messages from students and learners asking questions — sometimes deeply technical, sometimes personal, sometimes just a thought they wanted to share. I try to reply to as many as I can.

Because I remember what it felt like to post that letter from a village in UP, with no confidence it would reach anyone, and then to find that it had.

The Stars

There is a line I keep coming back to, one I used when I shared this story on LinkedIn:

The stars are closer than you think.

The people whose work has shaped you — the authors, the teachers, the builders — often feel remote. Their names are on book covers and conference programmes. They exist at a distance that seems unbridgeable from wherever you are standing.

But the distance is mostly in our heads. It is made of the assumption that we are not important enough to ask, that our questions are not interesting enough to warrant a reply, that the gap between where we are and where they are is too large to cross with something as simple as a letter.

It is usually not.

Ask the question. Write the letter. Send the email.

The answer might be silence. But it might be a reply in Hindi, written by hand, from a professor at IIT Kanpur, that changes the way you see the world.

If you have a similar story — of reaching out to someone whose work mattered to you, and being surprised by what happened — I would genuinely love to hear it.

Building an Open-Source Autonomous Agent SDK

2025-11-15T00:00:00+00:00

We are at an interesting moment in the development of AI.

The underlying models — the large language models that reason, write code, and follow complex instructions — have reached a capability level where they can be genuinely useful for tasks that require multiple steps, tool use, and judgment.

But the infrastructure for building agentic systems — systems that can plan, take actions, use tools, remember context, and operate autonomously toward a goal — remains surprisingly immature.

Existing frameworks make certain things easy. But they often sacrifice transparency for convenience, or make it hard to audit what the agent actually did and why.

I have been thinking about what a well-designed open-source agent SDK would look like. Here are my current thoughts.

What Makes a Good Agent

Before thinking about the SDK, it helps to be clear about what a good agent actually is.

An agent is a system that can:

Receive a goal or task in natural language
Decompose that task into steps
Execute those steps using available tools
Observe the results of each step
Adjust its plan based on what it observes
Complete the task or report why it cannot
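The six capabilities above map onto a small loop: plan, act, observe, re-plan. This is a minimal, hypothetical sketch, not any existing framework's API. `plan_next` stands in for a language-model call that chooses the next step. `tools` is a plain dict of callables. `max_steps` is an assumed safety bound so the agent cannot run forever. Keeping the planner separate from execution also reflects the "modular planning" point made later in this post.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    tool: Optional[str]                  # None means "finish"
    args: dict = field(default_factory=dict)
    answer: str = ""                     # final answer, or the reason for stopping


@dataclass
class Step:
    action: Action
    observation: str


def run_agent(goal: str,
              plan_next: Callable[[str, list], Action],
              tools: dict,
              max_steps: int = 10) -> tuple:
    """Plan, act, observe and re-plan until the planner finishes or the budget runs out."""
    history: list = []
    for _ in range(max_steps):
        action = plan_next(goal, history)            # decompose: choose the next step
        if action.tool is None:                      # complete, or report why not
            return action.answer, history
        if action.tool not in tools:
            observation = f"error: unknown tool {action.tool!r}"
        else:
            try:
                observation = str(tools[action.tool](**action.args))   # execute
            except Exception as exc:                 # failures become observations
                observation = f"error: {exc}"
        history.append(Step(action, observation))    # observe; the planner adapts next turn
    return f"stopped: step limit of {max_steps} reached", history


# A scripted planner standing in for a model call.
def toy_planner(goal: str, history: list) -> Action:
    if not history:
        return Action(tool="add", args={"a": 2, "b": 3})
    return Action(tool=None, answer=f"result: {history[-1].observation}")


answer, trace = run_agent("add 2 and 3", toy_planner, {"add": lambda a, b: a + b})
print(answer)  # result: 5
```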

This sounds simple. But building a system that does this reliably — that does not hallucinate tool calls, that handles errors gracefully, that does not lose track of the original goal, that can be safely stopped or interrupted — is genuinely hard.

The hardest parts are:

Planning under uncertainty. The agent must decide what to do next without full information about whether its previous steps worked as intended, and without certainty about what future steps will be needed.

Tool reliability. Tools fail, return unexpected outputs, or produce results that require interpretation. The agent must handle this gracefully.

Context management. Long tasks require the agent to maintain coherent state over many steps. Current language models have context windows that become a bottleneck in long tasks.

Safety and auditability. For any serious use case, you need to be able to inspect what the agent did, why it made the decisions it made, and where things went wrong.

What Current Frameworks Get Wrong

Most current agent frameworks optimise for getting demos working quickly. This is understandable — demos attract attention and demonstrate capability. But it creates problems when you try to use these frameworks for production work.

Opacity. Many frameworks make it easy to run an agent but hard to understand what it is doing internally. You get an output, but you cannot easily reconstruct the reasoning chain that produced it.

Tight coupling. Frameworks often tightly couple the agent logic to a specific model provider or a specific tool interface. This makes it hard to swap components, run tests with mock tools, or migrate between providers as models improve.

Weak error handling. When a tool fails or returns unexpected output, many frameworks either crash or retry indefinitely. Production agents need explicit, configurable error handling at every step.
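"Explicit, configurable error handling" could look something like this sketch. Which exceptions count as transient, how many attempts to allow and how long to back off are all part of a policy object. A failure comes back as a structured result instead of crashing the agent. `RetryPolicy`, `ToolResult` and `call_with_policy` are hypothetical names. Treating `TimeoutError` and `ConnectionError` as retryable is an assumed default.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    backoff_seconds: float = 0.5                       # doubled after each failure
    retry_on: tuple = (TimeoutError, ConnectionError)  # transient errors only


@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: str = ""
    attempts: int = 0


def call_with_policy(tool: Callable[..., Any], args: dict, policy: RetryPolicy,
                     sleep: Callable[[float], None] = time.sleep) -> ToolResult:
    """Run a tool under an explicit policy: never crash, never retry forever."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return ToolResult(ok=True, value=tool(**args), attempts=attempt)
        except policy.retry_on as exc:
            if attempt == policy.max_attempts:       # budget exhausted: report, don't raise
                return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}",
                                  attempts=attempt)
            sleep(policy.backoff_seconds * 2 ** (attempt - 1))
        except Exception as exc:                     # non-transient: fail fast, with a record
            return ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}",
                              attempts=attempt)
    raise AssertionError("unreachable: the last attempt always returns")
```

Injecting `sleep` keeps the policy testable without real waiting. The agent's planner can read `ToolResult.error` and decide what to do next.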

No audit trail. For enterprise or safety-critical applications, you need a complete record of what the agent did — every tool call, every observation, every decision. Most frameworks treat this as an afterthought.

What a Good SDK Should Provide

Based on these observations, here is what I think a good open-source agent SDK should prioritise:

Explicit, inspectable state. Every step of the agent’s reasoning and action should be represented as an explicit, serialisable data structure. You should be able to pause an agent, inspect its complete state, and either resume it or understand what it did.
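A sketch of what "explicit, serialisable state" can mean in practice. The whole agent state is plain data and round-trips through JSON, so it can be paused, stored, inspected and resumed. The field names and status values are illustrative assumptions, not a proposed schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    thought: str        # the agent's stated reason for this step
    tool: str
    args: dict
    observation: str


@dataclass
class AgentState:
    goal: str
    plan: list = field(default_factory=list)
    steps: list = field(default_factory=list)   # list of StepRecord
    status: str = "running"                     # e.g. running, paused, done, failed

    def to_json(self) -> str:
        # asdict() recurses into the StepRecord objects inside `steps`.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "AgentState":
        data = json.loads(text)
        data["steps"] = [StepRecord(**s) for s in data["steps"]]
        return cls(**data)


state = AgentState(goal="summarise Q3 sales",
                   plan=["load data", "aggregate", "write summary"])
state.steps.append(StepRecord("need the raw numbers", "read_csv",
                              {"path": "q3.csv"}, "1200 rows"))
state.status = "paused"
saved = state.to_json()                 # persist anywhere: file, database, queue
resumed = AgentState.from_json(saved)   # inspect it, or continue from here
```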

Composable tools with typed interfaces. Tools should be defined with explicit input and output types, validation logic, and error handling. The agent should be able to discover available tools and understand their capabilities from their definitions.
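A minimal sketch of a typed tool, assuming simple `isinstance` checks are enough. A real SDK might use JSON Schema or a validation library instead. The tool validates its inputs and output, and exposes a `spec()` the agent can read to discover what the tool does. `Tool` and `word_count` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    params: dict          # parameter name -> expected Python type
    returns: type
    fn: Callable[..., Any]

    def spec(self) -> dict:
        """A discoverable description the planner can read."""
        return {
            "name": self.name,
            "description": self.description,
            "params": {k: t.__name__ for k, t in self.params.items()},
            "returns": self.returns.__name__,
        }

    def __call__(self, **kwargs: Any) -> Any:
        """Validate inputs, run the tool, then validate the output."""
        missing = sorted(set(self.params) - set(kwargs))
        unexpected = sorted(set(kwargs) - set(self.params))
        if missing or unexpected:
            raise TypeError(f"{self.name}: missing {missing}, unexpected {unexpected}")
        for key, expected in self.params.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: {key} must be {expected.__name__}, "
                                f"got {type(kwargs[key]).__name__}")
        result = self.fn(**kwargs)
        if not isinstance(result, self.returns):
            raise TypeError(f"{self.name}: expected {self.returns.__name__} result, "
                            f"got {type(result).__name__}")
        return result


word_count = Tool("word_count", "Count whitespace-separated words.",
                  {"text": str}, int, lambda text: len(text.split()))
print(word_count(text="tools should be typed"))  # 4
```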

Modular planning. The planning component should be separated from the execution component. This makes it possible to test planning logic without running tools, and to swap planning strategies without changing execution code.

First-class memory. Long-running agents need to store and retrieve information across steps. This should be a first-class concept — not an afterthought implemented by appending to a prompt.

Permission model. Different tools should have different permission levels. A tool that reads a file is different from a tool that writes one, which is different from a tool that makes a network request. The agent should require explicit authorisation for operations above a certain permission level.
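The permission idea can be sketched as ordered levels plus a gate. Calls at or below a threshold run automatically. Anything above it needs an explicit approval, which could come from a human prompt or a policy engine. The strict ordering READ < WRITE < NETWORK is an assumption made for simplicity, and all names here are hypothetical.

```python
from enum import IntEnum
from typing import Callable


class Permission(IntEnum):
    READ = 1       # e.g. read a file
    WRITE = 2      # e.g. write a file
    NETWORK = 3    # e.g. make a network request


class PermissionDenied(Exception):
    pass


class PermissionGate:
    """Allow low-risk calls automatically; ask an approver for anything above the threshold."""

    def __init__(self, auto_allow_up_to: Permission,
                 approve: Callable[[str, Permission], bool]):
        self.auto_allow_up_to = auto_allow_up_to
        self.approve = approve          # e.g. a human prompt or a policy engine

    def check(self, tool_name: str, level: Permission) -> None:
        if level <= self.auto_allow_up_to:
            return
        if not self.approve(tool_name, level):
            raise PermissionDenied(f"{tool_name} requires {level.name} authorisation")
```

The executor would call `gate.check(tool.name, tool.level)` before every tool call. That keeps authorisation in one place rather than scattered through tool code.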

Complete audit logging. Every action taken by the agent — including its internal reasoning, tool calls, observations and decisions — should be logged in a structured format that can be queried and analysed.
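One simple structured format is JSON Lines: one self-describing record per event, appended in order. This is a minimal sketch with hypothetical names and event kinds. The `query` helper shows that the same log can later be filtered, for example to list every tool call.

```python
import io
import json
import time
from typing import IO, Any, Callable, Iterable, Optional


class AuditLog:
    """Append-only JSON Lines log: one structured record per agent event."""

    def __init__(self, sink: IO[str], clock: Callable[[], float] = time.time):
        self.sink = sink
        self.clock = clock
        self.seq = 0

    def record(self, kind: str, **details: Any) -> dict:
        self.seq += 1
        entry = {"seq": self.seq, "ts": self.clock(), "kind": kind, **details}
        self.sink.write(json.dumps(entry, sort_keys=True) + "\n")
        return entry


def query(lines: Iterable[str], kind: Optional[str] = None) -> list:
    """Parse log lines back into dicts, optionally filtering by event kind."""
    entries = [json.loads(line) for line in lines if line.strip()]
    return [e for e in entries if kind is None or e["kind"] == kind]


buf = io.StringIO()            # a file opened in append mode works the same way
log = AuditLog(buf)
log.record("reasoning", text="need the refund policy before answering")
log.record("tool_call", tool="search", args={"q": "refund policy"})
log.record("observation", tool="search", result="3 documents")
log.record("decision", text="answer from the top document")
tool_calls = query(buf.getvalue().splitlines(), kind="tool_call")
```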

The Memory Problem

One of the most underappreciated challenges in building agentic systems is memory.

Language models do not have persistent memory. Everything they know about the current task must be present in their context window at the time of each call. For short tasks, this works fine. For long tasks, it becomes a serious bottleneck.

A good agent SDK should provide abstractions for different types of memory:

Working memory — the current state of the task, including the goal, the steps completed so far, and the current plan.

Episodic memory — records of past interactions that may be relevant to the current task. The agent should be able to retrieve relevant past episodes when they are helpful.

Semantic memory — general knowledge about the domain, the user’s preferences, and the tools available.

Procedural memory — learned patterns for how to accomplish common subtasks.

These do not need to be implemented as separate systems. But they need to be thought about as distinct concepts, because the criteria for what to store, when to retrieve, and how to keep them current are different for each.
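Here is a sketch of the four memory types as separate small abstractions. Each has its own store-and-retrieve policy. Working memory is rewritten every step. Episodic memory is searched only when relevant. Semantic facts are overwritten when they change. Procedures are looked up by name. All names are hypothetical, and the word-overlap recall is a stand-in for the embedding search a real system would more likely use.

```python
from dataclasses import dataclass, field


@dataclass
class WorkingMemory:
    """Current task state: rewritten every step, discarded when the task ends."""
    goal: str
    completed: list = field(default_factory=list)
    plan: list = field(default_factory=list)


class EpisodicMemory:
    """Summaries of past interactions, retrieved only when relevant."""

    def __init__(self) -> None:
        self.episodes: list = []

    def add(self, summary: str) -> None:
        self.episodes.append(summary)

    def recall(self, query: str, k: int = 2) -> list:
        words = set(query.lower().split())

        def overlap(episode: str) -> int:
            return len(words & set(episode.lower().split()))

        ranked = sorted(self.episodes, key=overlap, reverse=True)  # stable for ties
        return [e for e in ranked[:k] if overlap(e) > 0]


class SemanticMemory:
    """Durable facts (domain, user preferences, tools); new values replace stale ones."""

    def __init__(self) -> None:
        self.facts: dict = {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def lookup(self, key: str, default: str = "") -> str:
        return self.facts.get(key, default)


class ProceduralMemory:
    """Reusable recipes for common subtasks, keyed by name."""

    def __init__(self) -> None:
        self.recipes: dict = {}

    def learn(self, name: str, steps: list) -> None:
        self.recipes[name] = list(steps)

    def recipe(self, name: str) -> list:
        return self.recipes.get(name, [])
```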

Interface Diversity

I am also interested in agents that can interact through multiple interfaces: not just a chat window, but a command line, a browser, an API, or a messaging application.

A well-designed agent should be able to adapt its behaviour to the interface it is operating through. The way you express a complex analytical task through a terminal command is different from the way you express it through a conversational interface. An agent that is tightly coupled to one interaction model will be limited in how it can be deployed.

What I Am Working Toward

I do not have a complete answer yet. But I find the design space of reliable, transparent, composable agent systems genuinely interesting — and I think there is real value to be created by building infrastructure that makes it easier to build agents that work well in production, not just in demos.

If you are building in this space, I would be glad to compare notes.

This essay is part of my writing on Agentic Systems. The ideas here connect to the design work I am doing at Terno AI around secure, auditable AI workflows.

A World Without Countries

2025-11-10T00:00:00+00:00

I want to begin with a provocation.

The country you were born in — the specific rectangle of territory — has determined more about the conditions of your life than almost any other factor. More than your intelligence, more than your effort, more than your character. Where you were born determines your access to education, healthcare, safety, opportunity, legal protection and economic mobility.

You did not choose this. You cannot change it. It is a lottery at birth, with consequences that last a lifetime.

This seems like the kind of thing worth thinking about.

Nations Are Not Natural

The nation-state, as a political form, is a relatively recent invention. For most of human history, people organised themselves into much smaller units — tribes, city-states, small kingdoms — or into very large, loosely held empires that made no claim to representing a coherent national identity.

The modern nation-state — with fixed borders, a centralised government claiming sovereignty over a territory, a single legal system, and the idea that the citizens of the state share a common identity — emerged primarily in the eighteenth and nineteenth centuries.

This does not make nations illegitimate. Things can be recent and still valuable. But it does mean that the permanence we feel when we think about the nation — the sense that it is simply the natural unit of human organisation — is an illusion produced by familiarity rather than by any deep structural necessity.

What Nations Do Well

I am not arguing that nations have produced nothing of value. I think they have produced a great deal.

Nations enabled the kind of large-scale coordination that built modern infrastructure, legal systems, public education, healthcare systems and scientific research. The state creates a shared framework within which large numbers of people can cooperate with strangers — paying taxes, obeying laws, trusting that contracts will be enforced and that institutions will function predictably.

This is genuinely valuable. Coordination at scale is not easy, and the nation-state has been a reasonably effective mechanism for achieving it.

Nations have also provided, at their best, a framework for democratic accountability. The idea that citizens can, through political participation, influence the rules that govern them is one of the most important political ideas ever developed.

What Nations Do Poorly

But nations also produce costs that we often accept as inevitable when they are not.

Wars. The most catastrophic consequence of the nation-state system is its tendency to produce wars between nations. When the world is divided into territorial units with independent militaries and competing interests, conflict is a predictable consequence. The twentieth century’s wars — which killed somewhere between 100 and 200 million people — were largely products of the nation-state system’s internal logic.

Arbitrary restriction of movement. The current system severely restricts the freedom of people to live and work where they wish. A highly skilled worker in a poor country who wishes to move to a richer country faces legal barriers that a skilled worker born in the richer country never faces. This produces massive economic inefficiency and enormous human suffering.

Competitive undermining of shared challenges. Climate change, pandemic preparedness, financial contagion, and the governance of powerful technologies are all problems that cannot be solved by individual nations acting alone. The nation-state system makes global cooperation on these issues structurally difficult.

Identity-based violence. When national identity becomes associated with ethnic, religious or cultural identity, the nation-state becomes an instrument of exclusion and persecution. The history of the twentieth century contains many examples of what happens when national identity is mobilised this way.

Imagining Something Different

I am not proposing a specific alternative. I am not sure there is one obvious alternative, and I am sceptical of anyone who claims to have the blueprint for a better world order.

But I find it useful to imagine what a different system might value.

What if political boundaries were permeable to movement? What if human beings could live and work where they chose, the way goods and capital now largely move freely?

What if there were more robust mechanisms for addressing challenges that cross national borders — not just in rhetoric, but with real authority and resources?

What if political identity were based less on territory and birth and more on voluntary association with a set of shared values or institutions?

These are not new ideas. Various forms of cosmopolitanism have been argued for by philosophers for centuries. The European Union, despite its difficulties, represents a serious attempt to move beyond the pure nation-state model within a region.

But the ideas remain marginal in mainstream political discourse, in part because the people who benefit most from the current system — those born into prosperous, stable nations with strong legal protections — tend to dominate that discourse.

The Limits of This Thinking

I am aware of the objections.

Human beings form strong attachments to particular places, cultures, languages and communities. The nation-state, at its best, provides a framework for protecting and nurturing these attachments. Any replacement would need to take them seriously.

Large-scale coordination without the nation-state is genuinely hard. We do not have good models for global democratic accountability. What exists — the United Nations, international courts, multilateral agreements — works poorly and commands little genuine loyalty.

And there are serious risks in moving too quickly away from existing structures. Stability, even imperfect stability, has value. The periods when nation-states have collapsed or been violently reorganised have generally been terrible for the people living through them.

Why I Think About This

I am a technologist, not a political scientist. I am not in a position to design alternative world orders, and I am not sure I would trust anyone who claimed to be.

But I think there is value in asking the question: what do we assume is permanent because we grew up with it, and what would we think about it if we encountered it for the first time?

The nation-state is one of those things. It is not eternal. It did not always exist in its current form. It has costs as well as benefits. And there are, at least conceptually, other ways of organising collective human life.

I do not know what those ways are. But I think asking the question is worth doing — especially now, when the challenges that face humanity most urgently are ones that individual nations cannot solve on their own.

This essay is part of my Beyond Common Sense series — essays on ideas that challenge assumptions I find it useful to examine. I am not proposing political programmes. I am trying to think clearly about things that are often treated as beyond question.

What Teaching Thousands of Learners Taught Me About Intelligence

2025-11-05T00:00:00+00:00

I have watched thousands of people learn.

Not in the passive sense of attending a lecture, but in the active sense of working through a problem they have not seen before. Forming a hypothesis. Testing it. Discovering they were wrong. Revising. Trying again.

There is something you notice, if you watch enough people learn, that is not captured in how we usually talk about intelligence.

The Story We Tell About Intelligence

The standard story about intelligence goes roughly like this.

Some people are good at mathematics and logical reasoning. Others are better at verbal things. Some people “get” technical concepts quickly. Others find them difficult. These differences are relatively stable — your capacity for technical thinking is something you have, like the colour of your eyes.

This story shapes how learners approach unfamiliar material. The person who was told at some point that they were “not a maths person” approaches a new quantitative concept not with curiosity but with pre-emptive defeat. They expect to fail before they begin.

And the tragedy is: this expectation is self-fulfilling. Not because they genuinely lack the capacity, but because the expectation changes their behaviour in a way that produces the predicted failure.

What I Actually See

After teaching thousands of learners in AI, machine learning and data science, I have come to believe the standard story is almost entirely wrong about most people.

What I actually see is this:

Lack of confidence masquerades as lack of ability. When a learner says “I’m not good at this,” what they almost always mean is “I have not yet had an experience in this domain that made me feel capable.” The incapacity is not cognitive. It is historical.

Most people can learn technically demanding material when given the right conditions. The conditions that matter are: a pace appropriate to where they currently are, problems that are approachable but genuinely challenging, and feedback that helps them understand their errors rather than just marking them wrong.

The moment of real learning feels like surprise. Not the mild surprise of getting an answer right, but the deeper surprise of suddenly seeing something you did not see before. When I ask learners to describe what learning actually feels like when it is working, they often use phrases like “I suddenly saw it”, “something clicked” and “it felt like a fog clearing”.

Confidence compounds. A learner who successfully works through one challenging problem approaches the next one differently. The evidence of their own capability changes their internal model of what they can do. And that changed model makes them more likely to persist, which makes them more likely to succeed, which generates more evidence of capability.

This is why the moment of discovery matters so much. A learner who discovers something for themselves does not just gain knowledge. They gain a data point about their own capacity to think.

A Specific Memory

One interaction has stayed with me more than most.

A learner — a working professional, probably in her thirties — sent me a message after completing one of my courses. She wrote that she had spent her whole life believing she was simply not good at mathematics.

She described how this belief had shaped her choices. She had avoided quantitative courses. She had apologised for herself when numbers came up in professional contexts. She had come to accept it as a fixed fact about herself.

After the course, she said something had shifted. She was not suddenly a mathematician. But she had had the experience of working through problems she had previously considered beyond her, and of actually succeeding. She could feel that the boundary she had accepted was not real.

She ended by saying she had started teaching mathematics to her young niece — not because she had become an expert, but because she had developed enough confidence in her own thinking to trust herself to guide someone else.

That message changed how I think about education.

What This Means for How We Teach

If intelligence is more about confidence and accumulated experience than about fixed capacity, then the most important thing an educator can do is not to deliver information efficiently. It is to create conditions for genuine discovery.

The learner who discovers something has a different relationship to that knowledge than one who received it. They know, through direct experience, that their mind worked on something hard and found an answer. That experience is evidence they can draw on in every future encounter with something difficult.

Conversely, the learner who is always receiving pre-formed knowledge has no evidence that their own mind can do the work. Even if they can recall and apply what they were taught, they have no reason to trust themselves in the face of something new.

The Implication for AI and Education

I am often asked what I think about AI tutors and AI-based education.

My view is: it depends entirely on what model of intelligence the AI system is built on.

An AI that treats learning as information transfer — that answers every question immediately, that provides hints before the learner has struggled, that optimises for correct answers rather than productive struggle — will be efficient and useless in the way that bad lectures are useless.

An AI that understands that the goal is the development of a learner’s confidence and self-trust — that knows when to be quiet, that recognises the moment before discovery and does not disturb it, that celebrates the learner’s reasoning rather than just their answers — that AI would be genuinely valuable.

The technology to build the second kind of AI exists. What is needed is the right model of what learning actually is.

I write about education at the intersection of technology and human development. The Learning by Inventing section of this site explores these ideas in more depth.

Why LLMs Should Generate Code Instead of Guessing Answers From Data

2025-11-01T00:00:00+00:00

Let me describe two ways an AI can respond to the question: “What was our best-performing product category last quarter?”

Approach One: The AI reads the question, accesses some information about your business, and generates a response: “Based on available information, your Electronics category showed the strongest performance last quarter, with revenue growth of approximately 18% compared to the prior period.”

Approach Two: The AI generates the following code, executes it, and returns the result:

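import pandas as pd

# query_database() is assumed to be a helper in the analytics environment
# that runs SQL against the warehouse and returns a pandas DataFrame.
df: pd.DataFrame = query_database("""
    SELECT
        category,
        SUM(CASE WHEN quarter = '2025-Q3' THEN revenue ELSE 0 END) AS total_revenue,
        SUM(CASE WHEN quarter = '2025-Q2' THEN revenue ELSE 0 END) AS prev_revenue
    FROM sales
    WHERE quarter IN ('2025-Q2', '2025-Q3')
    GROUP BY category
    ORDER BY total_revenue DESC
    LIMIT 5
""")

# Growth is measured against the previous quarter, so divide by prev_revenue.
best = df.iloc[0]
growth = best["total_revenue"] - best["prev_revenue"]
growth_pct = growth / best["prev_revenue"] * 100 if best["prev_revenue"] else float("nan")
print(f"Best category: {best['category']}")
print(f"Revenue: {best['total_revenue']:,.0f}")
print(f"QoQ growth: {growth:,.0f} ({growth_pct:.1f}%)")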

The difference between these two approaches is not stylistic. It is fundamental.

What Approach One Is Actually Doing

Approach One uses a language model’s text generation capability to produce something that resembles an answer. The model has learned, from training data, what analytical answers look like. It knows that such answers typically contain category names, percentages, comparisons to prior periods, and confident-sounding phrasing.

But here is what the model cannot do: it cannot know whether its output is correct.

The number “18%” was generated because it sounds plausible in this context. The word “Electronics” was selected because it sounds like a plausible best-performing category. The model has no mechanism to verify these claims against actual data.

This is what we mean when we say that language models hallucinate. It is not a bug that can be fixed with a better prompt. It is a property of the underlying approach. Text generation produces text that is plausible given its training, not text that is verified against the actual state of the world.

For answering general knowledge questions, this limitation is manageable. For answering questions about your specific business data, it is not.

What Approach Two Is Actually Doing

Approach Two uses a language model’s code generation capability to produce an analytical program. The program is then executed against the actual data.

This changes the reliability property completely.

The answer now comes from running the analysis. The number returned is the actual number in the database. The category returned is the category that actually has the highest revenue. These facts are computed, not generated.

The language model can still make errors — it might write syntactically incorrect code, or make incorrect assumptions about the database schema. But these errors are of a completely different character:

They are often visible. Code that is wrong frequently fails to run, rather than running and producing a plausible-looking wrong answer.
They are auditable. You can inspect the code and see exactly what analysis was performed. A data analyst can review the SQL and confirm that it answers the intended question.
They are reproducible. Running the same code against the same data will always produce the same result. The answer is not different every time you ask.
They are correctable. If the code is wrong, you can fix it. If a text generation output is wrong, you have no mechanism for correction other than asking again and hoping the next generation is better.

Why This Matters for Enterprise Use

Imagine a product manager making a headcount decision based on analysis of customer acquisition data. Or a finance leader using AI-generated analysis to inform an investment decision. Or a supply chain team using AI to forecast inventory needs.

In these contexts, the difference between an answer that is computed and an answer that is generated is not academic. It is the difference between reliable intelligence and sophisticated-sounding noise.

The enterprises that will benefit most from AI analytics are not those with the highest tolerance for occasional errors. They are those who can apply AI to decisions that matter — which requires a level of reliability that text generation cannot provide.

The Code Quality Challenge

I want to be honest about the limitations of the code generation approach as well.

Generating correct analytical code is harder than generating correct text. Language models can produce code that is syntactically valid but semantically wrong — querying the right table but computing the wrong metric, applying the right function but to the wrong column, producing results that look plausible but measure the wrong thing.

Addressing this requires several things working together:

A rich semantic layer. The model needs to know what the data actually means — not just the schema, but the business context. What does “revenue” mean in your system? Is it net or gross? Does it include refunds? Are there edge cases in how certain categories are classified?

Schema validation. The generated code should be checked against the actual schema before execution to catch references to non-existent tables or columns.

Output validation. Where possible, the results should be checked for plausibility — are these numbers in the expected range? Do the totals add up correctly?

Iterative refinement. When code fails or produces unexpected output, the system should be able to diagnose the problem and generate a corrected version.
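The three checks above can be sketched as one small loop. This is a minimal illustration in Python using SQLite, not a description of how Terno AI does it: check_schema, check_output, answer and fake_generate_sql are hypothetical names, generate_sql stands in for whatever model call produces the query, and the plausibility rule (no empty results, no negative totals) is an assumption that would differ per metric. In SQLite, prefixing a statement with EXPLAIN compiles it without running it, so references to missing tables or columns surface before any data is read.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

Rows = List[Tuple]


def check_schema(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Schema validation: compile the query without running it.

    SQLite's EXPLAIN prepares the statement and returns its bytecode instead
    of executing it, so unknown tables or columns raise an error here.
    """
    try:
        conn.execute("EXPLAIN " + sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return f"schema error: {exc}"


def check_output(rows: Rows) -> Optional[str]:
    """Output validation: crude plausibility checks on (label, value) rows."""
    if not rows:
        return "output error: query returned no rows"
    for label, value in rows:
        if value is None or value < 0:  # assumed rule: totals should not be negative
            return f"output error: implausible value {value!r} for {label!r}"
    return None


def answer(generate_sql: Callable[[str, Optional[str]], str], question: str,
           conn: sqlite3.Connection, max_attempts: int = 3) -> Tuple[str, Rows]:
    """Iterative refinement: generate, validate, run, and retry with feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, feedback)
        feedback = check_schema(conn, sql)
        if feedback is not None:
            continue  # send the schema error back to the generator
        rows = conn.execute(sql).fetchall()
        feedback = check_output(rows)
        if feedback is None:
            return sql, rows
    raise RuntimeError(f"no valid answer after {max_attempts} attempts: {feedback}")


# Demo with a toy table and a stand-in for the model call.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Electronics", "2025-Q3", 120.0), ("Books", "2025-Q3", 80.0)],
)


def fake_generate_sql(question: str, feedback: Optional[str]) -> str:
    # First attempt references a column that does not exist; the retry fixes it.
    if feedback is None:
        return "SELECT category, SUM(net_revenue) FROM sales GROUP BY category"
    return ("SELECT category, SUM(revenue) AS total FROM sales "
            "GROUP BY category ORDER BY total DESC")


sql, rows = answer(fake_generate_sql, "Which category earned the most?", conn)
print(rows)  # [('Electronics', 120.0), ('Books', 80.0)]
```

The first, wrong query never touches the data: it fails at the schema check, and the error message becomes the feedback for the next attempt.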

None of these are fully solved. But they are solvable in a way that the fundamental reliability problem of text generation is not.

The Right Tool for the Right Problem

I am not arguing that text generation is useless for analytical work. There are contexts where it is exactly right: summarising results, explaining analysis in plain language, generating hypotheses to investigate, answering questions about general methodology.

But for the core task of answering specific factual questions about enterprise data, code generation and execution is the right approach. Not because it is more impressive, but because it is more honest about what AI can and cannot reliably do.

The goal is not an AI that always sounds confident. The goal is an AI that is genuinely trustworthy — that fails visibly when it fails, that produces verifiable results when it succeeds, and that gives the humans who use it real information rather than the appearance of it.

That is the kind of AI that enterprises actually need.

This is part of my ongoing writing on building Terno AI. Previous essays in this series cover why I am building an AI data scientist and enterprise AI security.

Can an AI Teacher Help a Child Discover the Pythagorean Theorem?

2025-10-25T00:00:00+00:00

I want to try something in this essay that I do not usually do: I want to show you a conversation.

Not a conversation that happened, but one that I have been designing in my head for a long time. A conversation between an AI teacher — built on the philosophy I call Learning by Inventing — and a child learning about the Pythagorean theorem.

I want to walk through this conversation not just to describe the idea, but to make it concrete. Because I think the gap between “teaching through discovery” as an abstract philosophy and “this is exactly what the teacher says and what happens next” is where most attempts at this approach fail.

The Conversation

Child: I want to learn about triangles.

AI Teacher: Great. Let me give you something to do first, before we talk much. Can you draw a square on a piece of paper — any size you like?

Child: [draws a square]

AI Teacher: How many small squares fit inside it, if each small square is one unit wide? Count them.

Child: Sixteen. It’s four by four.

AI Teacher: So the area is sixteen. Now can you draw another square — smaller this time?

Child: [draws a three by three square] Nine squares.

AI Teacher: Good. Now here’s something to try. Can you cut out both squares, and see if you can arrange them — without overlapping — inside a larger square? Can you make a bigger square from two smaller ones?

[The child tries. It doesn’t quite work with 16 and 9.]

AI Teacher: What happens when you try?

Child: It almost works, but not quite. There’s a gap in the corner.

AI Teacher: Let me ask you something different. What if I told you that with the right two squares, you can always make a bigger square — but the sizes of the three squares have to be related to each other in a very specific way?

Child: What way?

AI Teacher: That’s what I want you to discover. Let’s try the same two squares a different way: the one with area 9 and the one with area 16. This time, arrange them together in an L-shape. Can you see what bigger square you could make?

This is only the beginning of the conversation, but you can already see what is different about it.

What Makes This Different

The AI teacher in this conversation has not explained anything yet. It has not said the words “Pythagorean theorem.” It has not drawn a right triangle. It has not written any equations.

What it has done is given the child something to do. And the sequence of what to do is designed to make the child encounter the key relationship on their own.

When children cut squares and try to fit them together, they are building an intuition. They are feeling the relationship between area and sides. They are noticing that some combinations work and some do not. The frustration of the combination that doesn’t quite fit is, paradoxically, one of the most valuable moments — because it means they are not just following steps, they are genuinely grappling with the problem.

What the AI Teacher Must Know

Building an AI teacher that can do this well is technically harder than building an AI that can explain the Pythagorean theorem.

An AI that explains can follow a script. An AI that teaches through discovery must:

Maintain a model of where the learner currently is. Not just what they know, but what they are attending to, what they have just tried, and where their understanding is currently incomplete.

Know when to be silent. If the child is working through a problem, the teacher must not interrupt. The productive struggle is the learning. Breaking it prematurely by offering hints is one of the most common errors in discovery-based teaching.

Know when and how to offer a nudge. When a child is stuck in a way that is no longer productive — genuinely confused rather than productively grappling — the teacher must find the smallest possible intervention that redirects attention without removing the work from the learner.

Recognise the moment of discovery. When the child says “I think I see it” — or something close to that — the teacher must notice this and respond in a way that helps the child articulate their own understanding.

This last point is the most important. The moment of discovery is fragile. If the teacher immediately validates it too enthusiastically, the child may stop exploring. If the teacher ignores it, the moment passes. The right response is something like: “What do you see? Tell me in your own words.”
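The priorities in the list above can be written as a small decision rule. This is a hypothetical Python sketch: Move, LearnerState, its fields and both thresholds are invented for illustration, and estimating those signals from a real conversation is the hard part, which this code does not attempt. What it does show is the ordering: respond to a possible discovery first, stay silent by default, and nudge only once struggle looks unproductive.

```python
from dataclasses import dataclass
from enum import Enum


class Move(Enum):
    ASK_TO_ARTICULATE = "What do you see? Tell me in your own words."
    NUDGE = "offer the smallest hint that redirects attention"
    STAY_SILENT = "say nothing; let the learner keep working"


@dataclass
class LearnerState:
    seconds_since_progress: float   # time since the learner last moved forward
    attempts_since_progress: int    # tries that have not moved them forward
    signalled_insight: bool         # e.g. the learner said "I think I see it"


# Hypothetical thresholds for "no longer productive"; real values would need tuning.
STUCK_SECONDS = 180.0
STUCK_ATTEMPTS = 4


def next_move(state: LearnerState) -> Move:
    # 1. A possible moment of discovery takes priority: invite the learner to articulate it.
    if state.signalled_insight:
        return Move.ASK_TO_ARTICULATE
    # 2. Only intervene once struggle looks unproductive by these crude proxies.
    stuck = (state.seconds_since_progress > STUCK_SECONDS
             or state.attempts_since_progress >= STUCK_ATTEMPTS)
    # 3. Otherwise, silence is the default.
    return Move.NUDGE if stuck else Move.STAY_SILENT
```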

The Design Problem

I have been thinking about this AI teacher for several years.

The hardest design problem is not the AI capability — current language models are good enough to have sophisticated conversations and to generate appropriate exercises.

The hardest design problem is the curriculum of discovery.

For any given concept — the Pythagorean theorem, the idea of a derivative, the intuition behind gradient descent — someone needs to design the sequence of problems that leads a learner toward it. This sequence must be:

Simple enough that each individual step is approachable
Designed so that working through the steps makes the key pattern visible
Flexible enough to respond to learners who go in unexpected directions
Calibrated to the age and background of the learner

This design work is, I think, some of the most intellectually demanding work in education. It requires a very deep understanding of the concept itself — not just the final form, but the path of understanding that leads there.
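Most of that design work is judgement, but one small, mechanical piece can be automated: finding pairs of whole-number squares whose areas add up exactly to a third square, like the 9 and 16 in the conversation above (9 + 16 = 25). This Python sketch is an illustrative helper, not part of any existing tool; square_pairs and max_side are hypothetical names. Listing the smallest cases first gives a designer a pool of pairs that work, to mix with pairs that do not.

```python
from math import isqrt
from typing import List, Tuple


def square_pairs(max_side: int) -> List[Tuple[int, int, int]]:
    """Return (a, b, c) with a <= b and a*a + b*b == c*c, for c up to max_side.

    Each triple describes two squares (areas a*a and b*b) whose areas add up
    to the area of a third whole-number square. Results are ordered by c,
    so the simplest cases come first.
    """
    found = []
    for c in range(1, max_side + 1):
        for a in range(1, c):
            b_squared = c * c - a * a
            b = isqrt(b_squared)
            # Keep only exact squares, and list each pair once (a <= b).
            if b >= a and b * b == b_squared:
                found.append((a, b, c))
    return found
```

For example, square_pairs(20) returns [(3, 4, 5), (6, 8, 10), (5, 12, 13), (9, 12, 15), (8, 15, 17), (12, 16, 20)].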

What I Am Building Toward

I believe an AI teacher that genuinely teaches through discovery is one of the most valuable things we could build with current AI technology.

Not because it would replace human teachers — I do not believe it would or should — but because it would make the experience of guided discovery available to learners who do not currently have access to teachers who can offer it.

Many of the most effective discovery-based teaching experiences happen in one-on-one or small-group settings with a skilled human teacher. That kind of teaching cannot be scaled to reach hundreds of millions of learners.

An AI that could do something approaching this — while a human teacher focuses on the social, emotional and motivational dimensions that AI cannot yet handle — might genuinely change the quality of education available to people who currently have access only to lectures and textbooks.

That seems worth pursuing.

This essay is part of my ongoing thinking about Learning by Inventing. If you are an educator, researcher or builder interested in these ideas, I would be glad to connect.

Why Enterprise AI Must Be Secure by Design

2025-10-20T00:00:00+00:00

Here is a scenario that I think about often.

A company deploys an AI assistant connected to its internal data. Employees start asking it questions. The system is useful and popular. Then, one day, someone asks a question that the system was never designed to handle — and it answers anyway, drawing on data that the questioner was never supposed to access.

No malicious intent. No external attacker. Just an AI doing what it was designed to do — helping the user — in a situation where helping meant revealing something it should not.

This is not a hypothetical risk. It is a category of failure that any organisation connecting AI to its data must take seriously. And I think the industry has not yet fully reckoned with it.

The Demo vs. Production Gap

AI applications for enterprise data analytics face a specific and underappreciated challenge: the gap between what works in a demonstration and what is safe in production.

In a demonstration, you show the system answering questions correctly on a carefully chosen dataset. The data is clean. The questions are well-posed. The access rules are simple. Everything works as expected.

In production, the data is messy. The questions are unexpected. The users have varying levels of access rights. There are sensitive columns and confidential rows. The system operates continuously and handles requests from many users with different roles.

Many AI applications that look excellent in demos fail in one or more of these dimensions when deployed for real enterprise use.

What Genuine Enterprise Security Requires

When I talk about secure enterprise AI, I mean something more specific than just encrypting data in transit or requiring authentication to use the system.

I mean the following:

Data should stay in your environment. The most sensitive enterprise data should not need to leave the company’s security perimeter to receive AI analysis. A system that requires all data to flow to an external API creates both privacy risk and compliance risk. Terno AI is designed to support deployment within the customer’s own private cloud or on-premises environment.

Access controls must be deterministic, not AI-dependent. Here is a subtle but important point. If your system relies on an AI model to decide whether a user is allowed to see a particular piece of data, you are trusting a probabilistic system with a binary security decision. AI models can be prompted, confused, or led astray. Access to sensitive data must be enforced through deterministic rules — not through another model’s judgment.

Code execution must be sandboxed. When AI systems can execute code, there is potential for that code to interact with systems beyond its intended scope. A well-designed analytical AI should execute code in a strict sandbox — with defined resource limits, no ability to make network requests, no file system access beyond what is explicitly permitted, and isolated from other users’ computation.

Audit logs must be maintained. In enterprise contexts, it is not sufficient for a system to be secure — it must also be auditable. You need to be able to answer: who asked what, when, what analysis was run, and what was returned. This is both a compliance requirement and a practical necessity for investigating anything unexpected.

Security must cover the data, not just the interface. A system can require login and still leak data through its outputs. If an AI will answer any natural language question by summarising data, a sophisticated user can effectively reconstruct sensitive information through a series of carefully posed queries. Real security requires thinking about what information can be inferred from outputs, not just what data is directly returned.
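Two of the requirements above, sandboxed execution and audit logging, can be sketched together as a single wrapper around code execution. This is a minimal Python sketch, assuming a POSIX host and that the generated code is Python; the function name run_sandboxed_and_audit, the record fields and the limit values are hypothetical. It only caps CPU time, memory and wall-clock time and strips the environment. It does not provide the network and file-system isolation described above, which in practice needs OS-level mechanisms such as containers, namespaces or seccomp on top.

```python
import json
import resource
import subprocess
import sys
import time
from typing import IO


def _limit_resources(cpu_seconds: int, memory_bytes: int) -> None:
    # Runs in the child process just before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
    resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))


def run_sandboxed_and_audit(user_id: str, question: str, code: str, audit_log: IO[str],
                            cpu_seconds: int = 2, memory_bytes: int = 512 * 1024 * 1024,
                            wall_timeout: float = 5.0) -> str:
    """Run generated Python in a resource-limited child process and log the run.

    Covered: CPU time, address space, wall-clock time, empty environment.
    NOT covered: network or file-system isolation.
    """
    started = time.time()
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site-packages
            capture_output=True,
            text=True,
            timeout=wall_timeout,
            env={},  # do not pass credentials or tokens from the parent's environment
            preexec_fn=lambda: _limit_resources(cpu_seconds, memory_bytes),
        )
        status = "ok" if proc.returncode == 0 else f"error (exit {proc.returncode})"
        output = proc.stdout
    except subprocess.TimeoutExpired:
        status, output = "error (timeout)", ""
    # One JSON object per line: who asked what, when, what ran, what came back.
    audit_log.write(json.dumps({
        "user_id": user_id, "question": question, "code": code,
        "status": status, "output": output, "started_at": started,
    }) + "\n")
    audit_log.flush()
    return output
```

Every run is logged, including failures and timeouts, because those are exactly the cases someone will later need to investigate.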

The Semantic Layer and Security

There is a connection between data security and the semantic layer that I have come to find important.

A well-designed semantic layer does not just improve accuracy. It also provides a natural place to enforce data governance rules.

When the system has an explicit, maintained model of what each piece of data means — what tables exist, what columns contain, what data is sensitive, what access a particular user role should have — security rules can be expressed and enforced at the level of meaning rather than just at the level of raw tables and columns.

A user who should not see salary data should be prevented from accessing any analysis that would reveal it — even indirectly, even through an aggregation that seems innocuous, even through a question that does not mention “salary” but could reconstruct the information.
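One way to enforce rules at the level of meaning, and deterministically, is to tag base columns in the semantic layer and let every derived metric inherit the tags of whatever it is built from. The sketch below assumes a hypothetical semantic layer (the column names, metric names and the compensation tag are invented for illustration) and an acyclic metric graph. The check is ordinary set logic, so the same request always gets the same answer, with no model involved in the decision.

```python
from typing import Dict, List, Set

# Hypothetical semantic layer: base columns carry sensitivity tags...
COLUMN_TAGS: Dict[str, Set[str]] = {
    "employees.salary": {"compensation"},
    "employees.bonus": {"compensation"},
    "employees.department": set(),
    "employees.employee_id": set(),
}

# ...and derived metrics list what they are built from (columns or other metrics).
METRICS: Dict[str, List[str]] = {
    "headcount": ["employees.employee_id"],
    "avg_salary": ["employees.salary"],
    "total_comp": ["employees.salary", "employees.bonus"],
    "comp_per_head": ["total_comp", "headcount"],  # never mentions salary by name
}


def tags_for(name: str) -> Set[str]:
    """A column's own tags, or the union of the tags of everything a metric uses.

    Assumes the metric graph is acyclic; an unknown name raises KeyError.
    """
    if name in COLUMN_TAGS:
        return set(COLUMN_TAGS[name])
    tags: Set[str] = set()
    for dependency in METRICS[name]:
        tags |= tags_for(dependency)
    return tags


def can_view(clearances: Set[str], name: str) -> bool:
    """Deterministic rule: allowed only if the role is cleared for every tag."""
    try:
        return tags_for(name) <= clearances
    except KeyError:
        return False  # default deny for anything the semantic layer does not know
```

Because comp_per_head is derived from salary and bonus, it is blocked for a role without the compensation clearance even though its name never mentions salary. The same rule blocks a differencing attack, such as subtracting "total salary for the team without one person" from "total salary for the team", since both aggregates are built from salary.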

This kind of protection is much harder to implement when security is bolted onto a system as an afterthought. It has to be designed in from the beginning.

The Trust Problem Is the Hard Problem

I believe the central challenge of enterprise AI is not capability. Current AI systems can perform impressive analytical work.

The central challenge is trust.

For an enterprise to genuinely rely on an AI system for analytical work that drives real decisions, it needs to trust that the system will be correct when it matters, that it will not leak data it should not reveal, that its behaviour is predictable and auditable, and that it will fail safely and visibly rather than silently and dangerously.

Building AI systems that deserve this trust requires making different choices at every layer of the design — from how data is stored and accessed, to how code is generated and executed, to how outputs are validated before they are returned.

This is harder than building impressive demos. But it is the only kind of enterprise AI that is genuinely worth building.

This essay is part of my ongoing writing about what I am learning while building Terno AI. For more on the technical approach, see my essay on why AI should generate code rather than guess answers.

From IIT Roorkee to Terno AI: My Journey Through Technology and Teaching

2025-10-15T00:00:00+00:00

I am a Chemical Engineer who became a software engineer. A software engineer who became an entrepreneur. An entrepreneur who became a teacher. And a teacher who returned to building products.

That is the short version. The longer version is more interesting — not because of where I ended up, but because of how each chapter prepared me for the next in ways I did not anticipate.

IIT Roorkee: Learning to Build

I arrived at IIT Roorkee to study Chemical Engineering. What I actually spent most of my time doing was computing.

I want to be precise about this: it was not that I was a reluctant engineer. I was genuinely interested in the curriculum. But I was more interested in computers. I spent hours learning to program, building small systems, solving problems. I read widely and stayed up late working through things I had not been taught in class.

Looking back, this was my first experience of a truth I would return to many times: the things you choose to spend your discretionary time on usually point toward where you should go.

The other significant thing that happened at IIT Roorkee was co-founding IMG — the Information Management Group. We built and maintained online systems and resources that the whole campus depended on. This was my first experience of building something real for real users.

What I remember most is the feedback loop. When something broke, people noticed immediately. When something worked well, you could see people using it. Building for a community of users who actually depend on what you built is a very different experience from building for a course assignment.

D. E. Shaw: Learning to Think Rigorously

My first job after graduation was at D. E. Shaw, then one of the most technically sophisticated firms in the world. This was in the early 2000s, before the vocabulary of data science or machine learning had entered mainstream usage.

What I took from D. E. Shaw was not a specific set of technologies — those changed quickly. What I took was a standard of rigour.

There was a culture there of thinking very carefully before concluding anything. Of not accepting an answer simply because it seemed right or because it was fast to arrive at. Of being sceptical of your own reasoning. Of building things with the assumption that they would be scrutinised.

I still carry this. When I am building something — whether code, an argument or a product feature — I hear an internal voice asking: are you sure? Have you checked this? What are you assuming?

Amazon: Learning Scale

At Amazon, I worked on the product detail page — one of the most-visited pages in internet commerce — and on recommendation systems and image selection frameworks.

Scale changes the nature of engineering problems in interesting ways. When something runs for millions of users, the difference between a decision that is right 95% of the time and one that is right 99% of the time is enormous in absolute terms. Small gains compound. Small errors multiply.
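To put numbers on this, take an assumed one million decisions (an illustrative figure, not an Amazon statistic) and let p be the fraction that are right:

```latex
\text{wrong decisions} = N\,(1 - p), \qquad N = 10^{6}
\\
p = 0.95:\; 10^{6} \times 0.05 = 50{,}000
\qquad
p = 0.99:\; 10^{6} \times 0.01 = 10{,}000
```

Four percentage points of accuracy remove 40,000 wrong decisions per million, a five-fold reduction.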

I also learned something about customer thinking at Amazon that has stayed with me. There was a genuine insistence on starting from what the customer actually needs, rather than from what was technically interesting or convenient to build. This sounds obvious. In practice, it is surprisingly hard to maintain.

InMobi: Learning Data at Scale

At InMobi, I worked with very large datasets — recommendation systems operating on hundreds of terabytes. This deepened my understanding of the practical challenges of building machine learning systems in production.

Reading papers about machine learning is one thing. Building systems that produce useful outputs on real data, with real latency constraints, with real distribution shift as the world changes — that is something else. The gap between the theoretical ideal and the practical working system is always larger than it looks from the outside.

tBits: Learning to Build a Company

Founding tBits was my first experience of the full cycle of company building.

Technical problems have right answers — or at least, better and worse approaches that can be evaluated objectively. Business problems often have no right answer. There are decisions that must be made with incomplete information, under pressure, with real consequences for real people.

The most important thing I learned at tBits was about people. Hiring, mentoring, creating conditions where talented engineers could do their best work — this is a skill that does not come from studying computer science. It comes from paying attention to people and taking seriously your responsibility toward them.

Several of the engineers I hired at tBits have gone on to build strong careers. That matters to me more than the technical achievements of the company itself.

CloudxLab: Learning What Learning Actually Is

CloudxLab changed me more than any other chapter.

I expected to be a teacher who knew the material well and could explain it clearly. What I discovered was that knowing the material and being able to explain it clearly is not enough — and sometimes not even the most important thing.

The learners who transformed most were not the ones who sat through the best explanations. They were the ones who struggled productively and experienced the moment of understanding something for themselves.

This is what led me toward what I now call Learning by Inventing. The realisation that the teacher’s job is not primarily to deliver information — it is to design conditions in which the learner discovers it.

Teaching hundreds of thousands of people also gave me something that no amount of reading about education could have given: a deep empirical sense of how people learn, where they get stuck, what creates confidence and what destroys it.

Terno AI: Bringing It All Together

When I started thinking about Terno AI, I could trace a line from every earlier chapter.

The rigour from D. E. Shaw told me that plausible-sounding was not good enough. The scale thinking from Amazon told me to think carefully about reliability at production levels. The data systems work from InMobi told me the gap between demo and production is real. The company-building at tBits told me that the user matters more than the technology. And teaching at CloudxLab told me that things need to work in the hands of real people with real needs, not just in ideal conditions.

Terno AI is an AI data scientist for enterprises. But in a sense, it is built on everything I have learned about how people and technology interact — which is to say, on twenty years of increasingly specific and hands-on work.

What I Would Tell My Earlier Self

Looking back, a few things stand out.

The work you do in your discretionary time usually tells you what you care about. I spent my free time at IIT Roorkee programming when I could have been studying Chemical Engineering more diligently. I should have listened to that signal sooner.

Every chapter teaches you something that the next chapter needs. I could not have built CloudxLab without what I learned at Amazon about scale, or without what I learned at tBits about building a company. I could not have built Terno AI without what CloudxLab taught me about what real users need.

Teaching is a form of learning. The best way to discover the limits of your own understanding is to try to guide someone else toward it.

And perhaps most importantly: the interesting questions are usually at the intersection of things. My most productive work has always been where technology meets human need, where engineering meets teaching, where product meets idea.

This essay is part of my writing on the Founder Journey. If you are interested in the technical side of what I am building now, the Building Terno AI essays go deeper into the product and engineering thinking.

Why I Am Building an AI Data Scientist

2025-10-10T00:00:00+00:00

I want to tell you about a problem I have seen repeatedly.

A business leader wants to understand something about their company. Maybe it is why churn increased last quarter. Maybe it is which customer segment is driving most of the growth. Maybe it is whether the promotional campaign actually worked, after controlling for seasonality.

These are not unusual questions. They are exactly the kind of question that good businesses ask constantly.

Now watch what happens next.

The leader opens their BI dashboard. They find some charts. The charts show overall trends but not the specific slice they need. They submit a request to the data team. The data team is busy. Three days later, a report comes back. But the question has slightly changed — and the report does not quite answer the new version.

This cycle plays out thousands of times a day in organisations around the world.

The Wrong Tool for the Job

The standard response to this problem has been dashboards. Build more dashboards. Add more pre-built reports. Create more self-service analytics tools.

I think this is the wrong tool for the problem.

Dashboards are good at showing you what you already know you want to see. They are built around anticipated questions. But the most valuable analytical work is almost always about unanticipated questions — the question that occurs to you in the middle of a meeting, the follow-up that emerges from an answer, the investigation that was triggered by something unexpected.

For that kind of work, a dashboard is useless.

The AI Promise — and the AI Problem

The obvious modern answer is: use AI. Ask your data a question in plain English.

And to be fair, there are now many tools that let you do something like this. You type a question, the AI generates SQL or a summary, and an answer appears.

But there is a problem with this approach that I became increasingly uncomfortable with the more I thought about it.

When a language model reads your question and produces an answer, it is doing text generation. It is producing text that sounds like the right answer to your question, based on patterns it learned during training. It has learned, in some sense, what analytical answers look like.

But there is a fundamental difference between text that resembles an answer and an answer that is actually correct.

A language model can write “revenue grew 23% in Q3” with exactly as much confidence whether the actual number is 23% or 7% or -4%. It cannot distinguish between a statistically sound conclusion and a plausible-sounding fabrication, because it is not doing statistics. It is doing writing.

For most applications, this is an acceptable limitation. But for business decisions — for the kind of analysis that drives headcount decisions, investment choices, product roadmaps — it is not acceptable.

A Different Approach

The approach I am taking with Terno AI starts from a different premise.

Rather than asking a language model to produce an answer, I am asking it to produce code that computes the answer. The code is then executed in a secure, sandboxed environment against the actual data. The result is computed, not composed.

This changes the reliability property completely. The answer comes from running the analysis, not from generating text. You can inspect the code. You can verify the logic. You can reproduce the result. And if the code is wrong, it will often fail visibly rather than silently — which is itself useful information.
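To make "computed, not composed" concrete, here is a minimal sketch of the idea. It is not Terno AI's code. The quarterly figures and the function name `quarter_over_quarter_growth` are hypothetical. The point is that a number like "revenue grew 23% in Q3" comes out of arithmetic over the data, so anyone can rerun and check it.

```python
# A minimal sketch: the answer is computed from data, not written as text.
# The revenue figures and the function name are hypothetical illustrations.

def quarter_over_quarter_growth(revenue: dict[str, float], quarter: str, previous: str) -> float:
    """Return the percentage growth of `quarter` relative to `previous`."""
    if revenue[previous] == 0:
        # Fail visibly instead of reporting a meaningless number.
        raise ValueError("previous-quarter revenue is zero; growth is undefined")
    return (revenue[quarter] - revenue[previous]) / revenue[previous] * 100.0

# Hypothetical data: revenue by quarter, in arbitrary currency units.
revenue = {"Q2": 1000.0, "Q3": 1070.0}

growth = quarter_over_quarter_growth(revenue, "Q3", "Q2")
print(f"Revenue grew {growth:.0f}% in Q3")
```

If the generated code asked for a quarter that does not exist in the data, the lookup would raise a `KeyError` instead of producing a confident-sounding number. That is the kind of visible failure the paragraph above describes.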

This is what I mean when I say Terno AI is an AI data scientist rather than an AI answer-generator. It is designed to actually perform the analysis.

The Semantic Layer Problem

There is a second, less obvious challenge that I have become increasingly focused on.

Real enterprise databases are messy in ways that make naive AI applications fail.

Table names were chosen by engineers, not by business users. Column names are often abbreviations or legacy codes. Business concepts are not documented anywhere in the schema. The relationship between what a user means by “active customer” and what the database actually contains requires knowledge that is not in the data itself.

I have come to believe that an AI data scientist that ignores this problem will produce unreliable results even if the underlying analysis engine is perfect. The analytical code might be syntactically correct but semantically wrong — computing the right operation on the wrong thing.

Terno AI addresses this through what I call a semantic layer: a maintained, structured understanding of what the data actually means. What tables represent. How columns map to business concepts. What business formulas mean. Which terms users commonly use and what they refer to.
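As a rough illustration only, a semantic layer can be pictured as a lookup from the words users actually say to documented definitions over the physical schema. Everything in this sketch is hypothetical and not Terno AI's design: the table `cust_mstr`, the column names, the 90-day activity threshold, and the `BusinessTerm` and `resolve` names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BusinessTerm:
    """One entry in a hypothetical semantic layer."""
    name: str                  # the canonical business concept
    table: str                 # physical table where the concept lives
    definition_sql: str        # predicate that encodes the business meaning
    synonyms: tuple[str, ...] = ()

# Hypothetical schema: cryptic, engineer-chosen names mapped to business meaning.
# The 90-day window for "active" is an assumed definition, for illustration only.
SEMANTIC_LAYER = [
    BusinessTerm(
        name="active customer",
        table="cust_mstr",
        definition_sql="cust_mstr.stat_cd = 'A' AND cust_mstr.last_txn_dt >= CURRENT_DATE - INTERVAL '90 days'",
        synonyms=("active user", "active account"),
    ),
    BusinessTerm(
        name="churned customer",
        table="cust_mstr",
        definition_sql="cust_mstr.stat_cd = 'C'",
        synonyms=("lost customer",),
    ),
]

def resolve(term: str) -> BusinessTerm:
    """Map a user's phrase to its documented definition, or fail loudly."""
    key = term.strip().lower()
    for entry in SEMANTIC_LAYER:
        if key == entry.name or key in entry.synonyms:
            return entry
    raise KeyError(f"No definition for {term!r} in the semantic layer")

entry = resolve("Active User")
print(entry.table, "WHERE", entry.definition_sql)
```

The design choice worth noticing is the final `raise`. When a term has no documented meaning, the lookup refuses to guess. This reflects the concern above about code that is syntactically correct but semantically wrong.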

This is unglamorous work. But I have become convinced it is the difference between a demo that looks impressive and a product that actually works reliably.

Why Now

Two things have come together to make this tractable now.

The first is the quality of current language models. The ability to generate analytically correct code from a natural language description of a task has improved dramatically in the last few years. It is not perfect — but it is good enough to be genuinely useful, and it is improving rapidly.

The second is the broader enterprise appetite for AI that actually works. There is a growing frustration, in my view, with AI applications that are impressive in demonstrations but unreliable in production. Enterprises are looking for AI they can trust for real work — and they are willing to pay for it.

The Larger Question

I am building Terno AI partly because I think it is commercially valuable. But I am also building it because I think it addresses a real problem in how organisations make decisions.

Most organisations have data. Few of them can ask real questions of it. The people who can ask — skilled data analysts and data scientists — are expensive, scarce and slow compared to the rate at which analytical questions arise.

An AI data scientist that can be trusted to perform real analysis — securely, correctly, reproducibly — is not just a convenience. It is a capability that changes the nature of how organisations can think about their work.

That seems worth building.

To learn more about how Terno AI approaches enterprise analytics, visit the Terno AI page. For the technical thinking behind secure, reliable AI analytics, read my next essay on why AI must generate code rather than guess answers.

AI Can Do a Lot. But Can It Think Beyond Common Sense?

2025-10-05T00:00:00+00:00

I want to begin with a thought experiment.

Imagine training a language model on all the text produced in England in the year 1800. The model learns language, facts, customs, arguments, newspapers, laws, philosophy and literature. It becomes very good at generating text that sounds like 1800.

Now ask it: what do you think about slavery?

The model will likely produce something that reflects the dominant views of the period. Because those views — repugnant to us now — were common sense in 1800. They were woven through the legal system, the economy, the literature, the science, the church sermons and the political speeches. They were what people said and wrote and assumed.

The people who argued against slavery in that era were, by definition, arguing against common sense. They were not extrapolating from dominant patterns. They were imagining something that had not yet existed.

What Language Models Are Good At

I do not want to minimise what language models can do. It is extraordinary.

They can write code, summarise research, answer questions across a remarkable range of domains, reason through problems, explain complex ideas clearly and translate between languages with impressive accuracy. They have absorbed an enormous amount of human knowledge and can deploy it flexibly.

This is genuinely useful. In my own work building Terno AI and teaching at CloudxLab, I see daily how much AI can accelerate work that previously required significant time and expertise.

But what a language model is doing, fundamentally, is pattern completion. It has learned an extraordinarily rich and nuanced model of what human expression looks like — what comes after what, what ideas tend to appear together, what kind of text fits what kind of context. It is, in a deep sense, a machine that has learned common sense from the accumulated record of human expression.

That is its power. But it is also a limit.

The Ideas That Changed Everything Were Not Common Sense

Let me give a few examples.

Gandhi’s ahimsa. Non-violent resistance as a political strategy was not an obvious extrapolation from the dominant models of political conflict in the early twentieth century. Most political conflict — including anti-colonial resistance — assumed that power required force. Gandhi’s insight was that there was a different kind of force. This was not common sense. It was moral and strategic imagination that went against almost everything the existing evidence suggested.

The abolition of slavery. The argument that people could not be owned as property ran directly against centuries of law, custom, religion, economics and received wisdom. The people who made this argument were not slightly adjusting the existing consensus. They were insisting on something that the world had not yet accepted.

The germ theory of disease. When Semmelweis argued in the 1840s that doctors were transmitting illness to their patients by not washing their hands, he was rejected by the medical establishment. The idea that doctors' own hands carried invisible agents of disease contradicted what was considered obvious, and it was professionally dangerous to assert.

These were not insights that could have been generated by averaging existing knowledge. They required someone to step outside the prevailing pattern and see something that the pattern obscured.

AI and the Amplification of What Already Is

There is a specific risk that concerns me about AI systems that are primarily trained on what humans have already said and written.

Such systems are very good at reinforcing existing patterns. They are good at producing things that fit within the space of what has already been done. They are, by design, trained to be coherent with their training data.

This makes them invaluable for tasks where the goal is to produce something that fits a known pattern: code that follows conventions, text that sounds professional, summaries that capture the main points.

But if we ask them to help us think about questions where the right answer requires going beyond the existing pattern — questions about justice, about the structure of society, about what human flourishing could look like — we should be careful about taking their outputs as authoritative.

An AI trained on dominant patterns will tend to produce dominant-pattern answers.

What This Means for Human Beings

I do not think this is an argument against AI. It is an argument about how to use it.

The extraordinary capability of AI systems to handle patterns, to process information, to generate and refine text — all of this creates space for human beings to do something different. To do the things that pattern completion cannot do.

What are those things?

Questioning the assumptions that hold patterns in place. The person who says: wait, why do we assume that? Why does this seem obvious? What would the world look like if this were different?

Imagining what has not yet existed. Not extrapolating from data, but reaching for something that has not yet been instantiated — in technology, in art, in social organisation, in science.

Making moral judgements that go beyond what consensus permits. The recognition that something currently accepted is wrong — not because the data says so, but because something deeper says so.

These are not small things. They are perhaps the most distinctively human things.

And here is the interesting possibility: as AI takes over more and more of the work of pattern application — of doing what has already been done, of producing what fits existing templates — perhaps it frees more human capacity for the harder and stranger work of imagining what could be different.

The Question I Keep Returning To

I do not know whether AI systems will eventually be able to do this harder work. Perhaps sufficiently powerful systems, trained on sufficiently diverse data, will develop some capacity for genuine novelty.

But even if they can, the question I want to keep asking is: what should we be doing?

What is the version of human intellectual and creative life that is called forth by a world where the routine work of pattern application has been automated?

I think the answer involves taking more seriously — not less — the capacity to question, to imagine, to dissent, to dream.

AI can help us with what exists. We must remain responsible for what does not yet exist but should.

This essay is part of a series on AI and human thinking. If these ideas interest you, you might also enjoy my writing on Learning by Inventing — a related question about how discovery, rather than instruction, creates original thinkers.

Learning by Inventing: The Most Powerful Teaching Discovery I Have Made

2025-10-01T00:00:00+00:00

There is a moment in teaching that I have come to treasure above all others.

It is not the moment when a learner correctly recalls a formula. It is not the moment when someone scores well on a test. It is the moment when a learner looks up from their work with a slightly surprised expression and says: I think I see it.

That pause. That look. That is what I am always trying to create.

The Problem with How We Usually Teach

Most education begins at the wrong end.

A student opens a textbook and finds: The Pythagorean theorem states that a² + b² = c². An explanation follows. Some worked examples. Then practice problems.

This approach is efficient. In forty minutes, you can deliver a concept to thirty people. But something important is missing: the experience of discovery.

When a formula is handed to you, you receive it as a package you must store and retrieve. You may understand it intellectually. But you do not know it the way a geologist knows rock — through touch, through repeated encounter, through being surprised by it in the field.

More importantly: when you receive a pre-formed answer, you are implicitly told something about yourself. You are told that knowledge comes from outside you. You are positioned as a recipient. And after years of this experience, many people come to believe a genuinely harmful thing: that they are simply not the kind of person who understands mathematics, or physics, or algorithms.

I have seen this belief in adult learners over and over again. Intelligent, capable people who are certain, before they begin, that the material will be beyond them.

A Different Starting Point

What happens if we begin differently?

Instead of stating the Pythagorean theorem, give a learner a piece of squared paper. Ask them to draw a right triangle. Ask them to draw a square on each side. Ask them to count the squares inside each shape.

They do this for a triangle with sides of 3, 4 and 5. They count 9, 16 and 25.

Ask them: do you notice anything?

Some will not see it immediately. That is fine. Give them another triangle — 5, 12 and 13. They count 25, 144 and 169. Ask again.

Something happens. A hesitant observation: the big one always seems to equal the other two together?

Yes. Precisely. Now they have made a discovery.

Their relationship to this idea is completely different from someone who read it in a textbook. They found it. It is theirs.
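For readers who want to check the counting step, here is a tiny sketch that repeats the learner's arithmetic for the two triangles in the exercise. It is only an illustration of the counts. The discovery, not the code, is the point.

```python
def squares_on_sides(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Count the unit squares inside the square drawn on each side."""
    # A square with side length n holds n * n unit squares of squared paper.
    return a * a, b * b, c * c

for sides in [(3, 4, 5), (5, 12, 13)]:
    small, medium, big = squares_on_sides(*sides)
    print(sides, "->", small, medium, big, "| small + medium == big:", small + medium == big)
```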

What Discovery Does That Instruction Cannot

Discovery creates several things that instruction alone cannot reliably produce.

Understanding of the why. When you discover a pattern, you immediately want to know why it is true. Your curiosity is already pointed in the right direction. An explanation at this moment lands completely differently from an explanation that precedes the observation.

Memory. Things we discover are remembered differently from things we are told. The moment of surprise, the sequence of problems, the sensation of recognition — these are emotionally and cognitively vivid in a way that passive reception rarely is.

Confidence. This is the most important one. When a learner discovers something, even something small, they experience their own mind working. They feel what it is like to figure something out. This is radically different from being told something correct.

There is a story I return to often. A learner who had spent years believing she was simply not good at mathematics sent me a message after one of my courses. She said she had gained enough confidence in her own thinking that she had started teaching mathematics to her young niece.

She was not teaching because she had memorised more facts. She was teaching because she had learned to trust her own mind.

The Design Challenge

Learning by Inventing is not simply a matter of attitude or intention. It requires careful design.

You cannot just present a problem and hope the learner discovers the right thing. The sequence matters enormously. Each problem must be small enough to be approachable but designed to point attention in exactly the right direction. The gap between steps must be wide enough to require genuine thought but narrow enough to be crossable.

This is hard work. Designing a good discovery sequence for, say, gradient descent in machine learning takes more thought than writing a clear lecture. You have to deeply understand the concept yourself — not just its final form, but how a mind might move toward it.

I think of it as the difference between building a path and simply standing at the destination. Any teacher can stand at the destination and describe what they see. Far fewer can build a path that leads someone there while letting them feel they are exploring freely.

Machine Learning Through Discovery

When I teach machine learning, I do not begin with neural networks or backpropagation.

I begin with a simple prediction problem. I ask learners to guess, by hand, some numbers that are unknown but clearly related to numbers they can see. I ask them to try to improve their guesses. I ask them what they would change if they were wrong.

Through this process, learners begin to invent the core ideas of machine learning — not because I have told them what to invent, but because those ideas are the natural response to the problems I have set. They discover that iterating toward a better answer is more powerful than trying to get it right the first time. They discover that measuring error is the key to improvement.

When we eventually name these ideas — loss function, gradient, parameter update — the names feel like labels for something the learner already understands, rather than terms they must memorise.
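Once the names arrive, they map onto a few lines of code. The sketch below makes several assumptions of its own: a one-parameter model `y = w * x`, mean squared error as the loss, made-up data generated from `y = 2 * x`, and a hand-picked learning rate of 0.05. It shows the loop learners invent by hand. Make a guess, measure the error, and nudge the guess in the direction that reduces the error.

```python
# Hypothetical data: outputs the learner can see are clearly related to the inputs.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated here as y = 2 * x

def loss(w: float) -> float:
    """Loss function: mean squared error of predictions w * x against observed y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w: float) -> float:
    """Gradient: derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0               # the first guess
learning_rate = 0.05  # assumed step size, chosen by hand for this data
for _ in range(50):
    w -= learning_rate * gradient(w)  # parameter update: move against the gradient

print(f"w = {w:.3f}, loss = {loss(w):.6f}")
```

Starting from a poor guess of `w = 0`, each update shrinks the error. The guess settles at `w = 2`, the relationship hidden in the data. This is the same iterate-and-measure idea the learners reached on paper.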

What This Means for the AI Teacher

I have spent some time thinking about what an AI teacher based on this philosophy would look like.

Not an AI that answers questions — there are plenty of those. An AI that asks the right question at the right moment. One that can assess where a learner is and design the next small step accordingly. One that knows when to be silent and let the learner struggle, and when to offer a carefully chosen nudge.

This is a harder problem than building a question-answering system. But I believe it is one of the most important things we could build with AI.

Because the goal is not a learner who knows more. The goal is a learner who believes in their own capacity to discover.

If you are an educator, a learner or a builder interested in this philosophy, I would be glad to exchange ideas. You can find more of my thinking on education at the Learning by Inventing section of this site.