LLM Token Economy & Optimization

Prompt engineering’s hidden costs

Figure 1: Beam Search, Kantoria

Democratizing LLMs and foundation models is probably one of the greatest achievements of the last few years. It also means that, for many use cases, LLM token economy and optimization will be imperative to the success of prompt engineering, much as SQL optimization was imperative for controlling query run time and cost.

We are at an age of extensive exploration and experimentation with prompts, but we pay little attention to the future cost of prompt engineering in production. One aspect of cost for advanced LLMs such as GPT-4 is that pricing is based on how many tokens appear in your input and output. OpenAI calls input tokens ‘Prompt tokens’ and output tokens ‘Completion tokens’.

Prompt tokens cost half as much as Completion tokens, as seen in Figure 2. This makes sense: the provider wants you to experiment more and start cheaply. But as your business requirements grow more complex, the nondeterministic and hard-to-predict length (and therefore cost) of Completion tokens will surprise you at the end of the monthly billing cycle.
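The asymmetry above can be made concrete with a rough billing estimator. The per-1K-token rates below are illustrative assumptions chosen only to reflect the 1:2 prompt-to-completion price ratio described in the text; check your provider's current price sheet before relying on them:

```python
# Rough monthly-cost estimator for a token-priced LLM API.
# Rates are ASSUMED (USD per 1K tokens), picked only to mirror the
# 1:2 prompt-to-completion ratio discussed above.
PROMPT_RATE = 0.03
COMPLETION_RATE = 0.06

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated cost in USD for a single request."""
    return (prompt_tokens / 1000) * PROMPT_RATE + \
           (completion_tokens / 1000) * COMPLETION_RATE

# 10,000 requests/month with 500 prompt tokens each; completions
# drift from 200 to 800 tokens as requirements grow more complex.
short_total = 10_000 * estimate_cost(500, 200)
long_total = 10_000 * estimate_cost(500, 800)
print(f"short completions: ${short_total:,.2f}/month")  # $270.00/month
print(f"long completions:  ${long_total:,.2f}/month")   # $630.00/month
```

Note that the prompt side of the bill is fixed and predictable, while the completion side more than doubles the total here without any change to the prompts themselves.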

Figure 2: GPT-4 Pricing by HuggingFace

It’s worth noting that tokens are not full words but parts of words; e.g., ‘sleeping’ can be tokenized into ‘sleep’ and ‘##ing’. Packages such as tiktoken are responsible for tokenization, and each algorithm has its own properties. This means there is a multiplying factor when calculating cost in English, while in non-Latin scripts, such as symbol-based languages, tokenization can approach one token per character. The bottom line is that this is an unpredictable factor in cost.
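A toy greedy longest-match tokenizer shows how a word becomes multiple billable pieces. The vocabulary here is invented for illustration, and the ‘##’ marker follows WordPiece notation as in the example above; real tokenizers such as tiktoken use byte-pair encoding over much larger vocabularies and represent subwords differently:

```python
# Toy subword tokenizer: greedy longest-match against a fixed vocabulary.
# VOCAB is invented for illustration only; real BPE vocabularies hold
# tens of thousands of entries.
VOCAB = {"sleep", "ing", "walk", "ed", "run", "s"}

def tokenize(word: str) -> list:
    """Split a word into the longest matching pieces, left to right.
    Word-internal pieces get a '##' prefix, WordPiece-style."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in VOCAB:
            end -= 1
        if end == start:
            # No vocabulary piece matched: fall back to one character,
            # the worst case the text above describes for some scripts.
            pieces.append(word[start])
            start += 1
        else:
            prefix = "##" if start > 0 else ""
            pieces.append(prefix + word[start:end])
            start = end
    return pieces

print(tokenize("sleeping"))  # ['sleep', '##ing']
print(tokenize("walked"))    # ['walk', '##ed']
```

One word, two (or more) tokens: this is the multiplying factor between your word count and your bill.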

Another aspect of cost is that the popular beam search decoding algorithm generates a tree of next-word candidates in order to choose the best sentence, based on the sum of probabilities, as seen in Figure 1, finally selecting the yellow path from the entire tree. This method is very popular and wastes ±70% of the…
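A minimal beam search over a toy next-word model makes the waste concrete: most expanded candidates are scored and then discarded. The model and its probabilities below are invented purely for illustration:

```python
import math

# Toy next-word model: for each context word, the possible next words
# and their probabilities (invented for illustration).
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.5, "cat": 0.5},
    "a":   {"dog": 0.7, "cat": 0.3},
    "dog": {"sleeps": 0.9, "runs": 0.1},
    "cat": {"sleeps": 0.4, "runs": 0.6},
}

def beam_search(start, steps, width):
    """Keep the `width` best partial sentences at each step, and count
    how many candidates were generated along the way."""
    beams = [([start], 0.0)]  # (word sequence, sum of log-probabilities)
    generated = 0
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            for nxt, p in MODEL.get(words[-1], {}).items():
                candidates.append((words + [nxt], score + math.log(p)))
                generated += 1
        # Prune: everything outside the top `width` is wasted work.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams, generated

beams, generated = beam_search("<s>", steps=3, width=2)
best_words, _ = beams[0]
print(" ".join(best_words[1:]))            # the dog sleeps
print(f"kept {len(beams)} of {generated} generated candidates")
```

Here only 2 of 10 generated candidates survive to the end; every pruned branch was nonetheless paid for in compute, which is the waste the paragraph above refers to.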
