

Large Language Models: Three Stages of Adoption

Zhangzhang (ZZ) Si, Ph.D., Co-founder and Distinguished Machine Learning Engineer

The introduction of ChatGPT has pushed large language models (LLMs) to the forefront of public attention. But beyond interacting with ChatGPT, how can companies adopt LLMs? Co-founder and Distinguished Machine Learning Engineer ZZ Si explains.

In general, companies have three options for adopting large language models:

1. Use a pretrained model

The first option is to use a pretrained language model as-is. For open-source models, you can self-host the inference API; examples include EleutherAI's GPT-Neo and BigScience's BLOOM. Otherwise, you need a paid inference API, such as OpenAI's text-davinci-003 (a GPT-3 model) or AI21's Jurassic-1.
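As a minimal sketch of the self-hosting route, the snippet below loads GPT-Neo through Hugging Face's transformers library. The checkpoint size, prompt, and generation settings are illustrative; in practice you would pick a model that fits your GPU memory.

```python
# Minimal sketch: self-hosting an open-source model with Hugging Face's
# transformers library. Checkpoint and generation settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="EleutherAI/gpt-neo-1.3B")

prompt = "Summarize the key terms of this contract:\n..."
outputs = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```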

You can get started with a pretrained model in minutes: try it on a small sample of your data and get a sense of its accuracy. Paid APIs are usually more accurate, but the cost can add up quickly. I tested GPT Index, a brilliant tool for asking free-form questions about your data. While the results are very impressive, it cost 30 cents to answer one question about a 15,000-word essay. Running hundreds of queries over thousands of documents can be prohibitively expensive.
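That 30-cent figure is easy to sanity-check with a back-of-envelope calculation. The sketch below assumes roughly 1.3 tokens per English word and the davinci-era list price of about $0.02 per 1,000 tokens; GPT Index sends only retrieved chunks rather than the whole document, so its actual cost comes in somewhat lower.

```python
# Back-of-envelope API cost estimate. Assumptions: ~1.3 tokens per English
# word, and davinci-class pricing of about $0.02 per 1,000 tokens.
WORDS = 15_000
TOKENS_PER_WORD = 1.3
PRICE_PER_1K_TOKENS = 0.02  # USD

tokens = WORDS * TOKENS_PER_WORD                      # ~19,500 tokens
cost_per_query = tokens / 1000 * PRICE_PER_1K_TOKENS  # ~$0.39 for the full text
print(f"~${cost_per_query:.2f} per query")

# At scale, the numbers add up quickly:
print(f"~${cost_per_query * 100_000:,.0f} for 100k queries")
```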

Using an open-source model changes the cost structure: you pay for compute hours (usually on GPUs) rather than per API call, which can be much cheaper at volume (see the back-of-envelope comparison below). However, the quality of open-source pretrained models lags behind that of paid APIs. There are ongoing efforts in the open-source community to produce a high-quality model using the same RLHF (Reinforcement Learning from Human Feedback) approach that produced ChatGPT; one such example is a joint effort by HumanLoop and CarperAI. Breakthroughs in this direction could have broad enabling effects. We are grateful for these open-source communities, and will continue to watch and contribute in this space.
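To make the per-call versus per-hour comparison concrete, here is a rough break-even sketch. All the rates below are illustrative assumptions, not quotes from any provider.

```python
# Hypothetical break-even sketch: per-call API pricing vs. renting a GPU.
# All rates are illustrative assumptions.
API_COST_PER_QUERY = 0.30  # e.g., the GPT Index example above
GPU_COST_PER_HOUR = 1.50   # a cloud GPU instance
QUERIES_PER_HOUR = 500     # throughput of a self-hosted model

self_hosted_cost = GPU_COST_PER_HOUR / QUERIES_PER_HOUR  # $0.003 per query
print(f"API: ${API_COST_PER_QUERY:.3f}/query, "
      f"self-hosted: ${self_hosted_cost:.3f}/query")
```

Under these assumptions, self-hosting is two orders of magnitude cheaper per query, at the cost of lower model quality and the engineering effort to run the service.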

2. Fine-tune existing models

Fine-tuning refers to adapting a pretrained language model to a specific task or domain by training it on a smaller dataset relevant to that task. This lets the model learn the task's specific characteristics and patterns and improve its performance on it.

Fine-tuning has the following benefits: 

Increased accuracy: by fine-tuning a pretrained language model on a task-specific dataset, the model learns the nuances of the task and achieves higher accuracy than using the pretrained model directly.

Reduced cost: compared to training from scratch, fine-tuning is cost-effective because the pretrained model provides a good starting point and requires less data and compute.

It is common to use an existing training recipe or even AutoML to fine-tune the model. Open-source libraries like Hugging Face's Transformers provide ready-to-use training code and drastically reduce the effort required to develop training algorithms.
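A condensed sketch of what that looks like with Hugging Face's Trainer API is below. The base model, dataset file, label set, and hyperparameters are all placeholders for illustration.

```python
# Condensed sketch: fine-tuning a pretrained model for document
# classification with Hugging Face's Trainer API. The dataset file,
# labels, and hyperparameters are placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)  # e.g., invoice vs. other

# Hypothetical CSV with "text" and "label" columns.
dataset = load_dataset("csv", data_files="documents.csv")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
)
trainer.train()
```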

However, it's important for companies to carefully consider the resources and expertise required to fine-tune a language model; it can still be a complex and time-consuming process. Arriving at a high-quality model requires high-quality data and labels, smart design of training experiments, and the resources to run a large number of them. At KUNGFU.AI, we have worked closely with our clients' business and data science teams to fine-tune large language models that classify invoice and billing documents, analyze segments in clinical notes, and extract structured information from scanned contracts, and to deploy those models to automate tedious data entry and decisioning processes.

3. Create your own customized models

When fine-tuning is not enough to reach the desired model quality, the next step is to consider a customized model: designing new model architectures and new training recipes that make better use of the data and knowledge in your domain.

As an example, we helped a medical firm design a multi-modal transformer model that combines text and time-series inputs to enable more accurate forecasting.
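The sketch below shows one simple way such a text-plus-time-series fusion could be wired up in PyTorch. It is a toy architecture for exposition, not the client's actual model.

```python
# Illustrative PyTorch sketch of a multi-modal model that fuses a text
# encoder with a time-series input for forecasting. Toy architecture only.
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiModalForecaster(nn.Module):
    def __init__(self, ts_features: int, d_model: int = 128):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained("distilbert-base-uncased")
        self.text_proj = nn.Linear(self.text_encoder.config.hidden_size, d_model)
        self.ts_proj = nn.Linear(ts_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # one-step-ahead forecast

    def forward(self, input_ids, attention_mask, ts):
        # Text: use the first-position hidden state as a single summary token.
        text_hidden = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, :1, :]           # (B, 1, H)
        text_tok = self.text_proj(text_hidden)   # (B, 1, D)
        ts_tok = self.ts_proj(ts)                # (B, T, D)
        # Fuse the text token with the time-series tokens in one encoder.
        fused = self.fusion(torch.cat([text_tok, ts_tok], dim=1))
        return self.head(fused[:, 0, :])         # (B, 1) forecast
```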

This option often requires more specialized expertise in deep learning and language models. Training customized models from scratch on billions of examples can cost hundreds of thousands of dollars; the BLOOM model, for example, is said to have cost $1.6 million to develop.

The return on investment can be significant, too. A breakthrough in model accuracy creates a moat and can attract more customers. ChatGPT and its integration with Bing generated so much interest that Google had to respond by releasing a new search interface powered by its own large language model. ChatGPT could meaningfully increase Bing's market share, and it has been estimated that each percentage point of search market share is worth about $2 billion in new revenue.