LLMs are Engines. It’s Time for Vehicles.

Nathan Mandi

Senior Machine Learning Engineer

Generative AI is upon us, and it’s as though we’ve just invented the engine.

Fundamentally, the engine is a powerful tool that revolutionized our ability to automate useful work. Once a novel luxury, it is now mass-produced. It spans the spectrum from a simple workhorse to a cutting-edge machine. Inside the engine, there’s plenty of technology to optimize, but in order to truly leverage its power, you need to build a system around the engine. Enter the vehicle.

I believe the large language model (LLM) is the new engine, and we’re just beginning to design vehicles around it. Right now, it is common to write a prompt as input and directly receive the generated output, but so much more is possible if we treat an LLM as a function inside a larger program. As we develop better techniques for synthesizing prompts and for standardizing outputs, we are learning how to use LLM calls as part of a larger system. These systems process large volumes of information, take actions with tool integrations, and perform other abstract tasks.

As this field matures, these systems could be as diverse as cars, trains, planes, ships, and rockets. What will be the LLM Segway? The LLM Jet Ski? Though it’s hard to know which will prove useful and scalable, I can imagine a world where a large variety of such systems become household names. For example, perhaps in the future you have a favorite AI assistant that helps you cook, another that organizes your calendar, and another that streamlines personal investing. Each is optimized for its niche, provided by a couple of top competitors, and at least a little bit personalized to fit your specific preferences.

To build such vehicles, we’ll need tools for handling the flow of information into, within, and out of the engine. Here are three technologies making it possible - one that is well into productionization, and two that are brand new and very promising.

Vehicle 1 - Retrieval Augmented Generation (RAG)

RAG refers to a class of techniques for providing an LLM access to a specific source of information. It enables use of these models on highly specific, structured, and private data, which is crucial for a huge space of applications. For example, RAG takes you from “Respond to the following customer support request.” to “Here is all our documentation and the entire history of support tickets for this product. Using this information, respond to the following customer support request.” The key idea is that instead of simply responding to a user’s question, we can first sift through our data for the most relevant information and place it in the prompt. This primes the model with the right context.
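To make the retrieve-then-prompt idea concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any particular RAG library works: the function names (`retrieve`, `build_prompt`) and the sample documents are hypothetical, and a simple word-overlap cosine score stands in for a real embedding store. No LLM is called; the sketch stops at the prompt you would send.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (toy retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Place the retrieved context ahead of the user's request."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        f"Using this information:\n{context}\n\n"
        f"Respond to the following customer support request:\n{query}"
    )


# Hypothetical documentation snippets, invented for this example.
DOCS = [
    "To reset your password, open Settings and choose Reset Password.",
    "Refunds are processed within five business days.",
    "The mobile app supports offline mode on Android and iOS.",
]

if __name__ == "__main__":
    print(build_prompt("How do I reset my password?", DOCS, k=1))
```

In a real system, the word-overlap score would typically be replaced by learned embeddings, and the returned prompt would be passed to the model of your choice.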

The concept of RAG originated in 2020, and with the explosion of LLM tools it’s now more relevant than ever. It’s a versatile system with many modular components including embedding stores, retrieval functions, prompt chains, and more. Each of these components can be independently tuned, replaced, and improved by research, much like parts of a car. There are endless possibilities for a composite system, often involving multiple data sources with different roles.

The frontier of RAG is led by LlamaIndex¹, an open-source project with thousands of users and new content coming out practically every week, such as new query engines, tool integrations from the community, evaluation methods, and high-level concepts such as “LLM operating systems.”²

Vehicle 2 - Combining Natural Language and Code

LLMs are a breakthrough in deep semantic reasoning, but notably they often struggle with basic arithmetic and calculations, things that a simple computer program can solve quite easily. Well, what if we had a hybrid system that uses the right tool for each job? Once again, the idea is to think of the “model” not as the LLM itself, but as a program that uses an LLM as a tool.

The latest in this line of work is Chain of Code³, from DeepMind. Their approach has a model respond in Python, using mathematical operators and loops to compute what it can, and abstracting away what it doesn’t know in functions to be carried out by another LLM call. It already addresses one of the biggest limitations of these models, and it’s just the beginning. What I find most exciting is that it feels like a much cleaner use of computation, truly using the right tool for the job. I reckon we’ll be seeing more of this.
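Here is a rough sketch of that division of labor, written in Python. To be clear, this is not the Chain of Code implementation: `fake_llm` is a hypothetical stand-in that answers from a canned table so the example runs without any API, and `is_fruit` and `count_fruits` are names I made up. The point is only the shape of the program: ordinary code does the looping and counting, while the fuzzy semantic judgment is delegated to a model call.

```python
def fake_llm(question):
    """Hypothetical stand-in for a real LLM call, backed by canned answers."""
    canned = {
        "Is 'lemon' a fruit?": "yes",
        "Is 'hammer' a fruit?": "no",
        "Is 'apple' a fruit?": "yes",
    }
    return canned.get(question, "no")


def is_fruit(item, llm=fake_llm):
    """Semantic judgment: hard to write as code, so it is delegated to the model."""
    answer = llm(f"Is '{item}' a fruit?")
    return answer.strip().lower() == "yes"


def count_fruits(items, llm=fake_llm):
    """Exact counting: trivial for code, so the interpreter handles it."""
    count = 0
    for item in items:
        if is_fruit(item, llm):
            count += 1
    return count


if __name__ == "__main__":
    print(count_fruits(["lemon", "hammer", "apple"]))
```

Swapping `fake_llm` for a real model call would leave the counting logic untouched, which is what makes the split feel clean.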

Vehicle 3 - Test Time Computation

And finally, a peek into the general-purpose, next-generation systems to come. Broadly speaking, test time computation refers to the idea of using a language model to “reason” about the answer to one hard question in a multi-stage process. One model generates possible thought processes, while another then evaluates those outputs for the correctness of their reasoning steps. The best can be carried forward in a new round - in other words, you spend more time computing to home in on a better answer⁴.

The result is a tree of steps that alternates between branching into many ideas, then pruning and focusing on the best ones, ultimately reaching more depth and higher quality in a process that truly resembles thinking. Such a model would be enormously expensive to run, and would have the capacity to solve very difficult problems, potentially aiding human research in fields like medicine, biology, physics, and math. This is suspected to be the technology behind OpenAI’s Q*, a next-generation model yet to be confirmed.
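The branch-then-prune loop can be sketched as a small beam search in Python. Everything here is a toy assumption of mine: `propose` stands in for the generator model, `score` stands in for the evaluator model, and the "problem" is just reaching a target number with a few arithmetic moves. A real system would replace both functions with LLM calls; this only shows the control flow.

```python
def propose(path):
    """Stand-in for the generator: branch into three possible next steps."""
    last = path[-1]
    return [path + (last + 1,), path + (last * 2,), path + (last - 3,)]


def score(path, target):
    """Stand-in for the evaluator: closer to the target scores higher."""
    return -abs(target - path[-1])


def search(start, target, beam_width=2, rounds=5):
    """Alternate between branching and pruning, keeping the best paths."""
    beam = [(start,)]
    for _ in range(rounds):
        # Branch: every surviving path proposes several continuations.
        candidates = [child for path in beam for child in propose(path)]
        # Prune: keep only the highest-scoring continuations.
        candidates.sort(key=lambda path: score(path, target), reverse=True)
        beam = candidates[:beam_width]
        if beam[0][-1] == target:
            break
    return beam[0]


if __name__ == "__main__":
    print(search(start=3, target=10))
```

Raising `beam_width` or `rounds` spends more computation per question, which is the trade-off at the heart of this approach.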

—————

While LLMs are not yet generally intelligent at a human level, these types of approaches are avenues by which they may reach a new level. All three of these approaches combine multiple LLM calls with other logic in a hybrid program. Each targets a different limitation, and they can and will be combined. No one knows where these paths will take us, but I believe they extend far beyond what we can currently imagine.

When the new “automobiles” began to transform the structure of society, many opposed them. They rightfully claimed these new machines would cause a new kind of danger, and disrupt the balance of jobs in the economy. But while the role of horse-drawn carriages faded away, it was replaced by many more roles for new vehicles. Put another way, these vehicles raised overall productivity, which means more, not less. The new AI engines of today are doing the same, growing the pie, and one of the biggest challenges we face today is how to create an economic structure that ensures this prosperity is shared, not concentrated⁵.

I look forward to seeing what vehicles we create in the coming years. It’s time to strap in and enjoy the ride!