LLM Token Economy & Optimization
Prompt engineering’s hidden costs
The democratization of LLMs and foundation models is probably one of the greatest achievements of the last few years. It also means that, for many use cases, LLM token economy & optimization will be imperative to the success of prompt engineering, much as SQL optimization was imperative for reducing query run time and resource usage.
We are in an age of heavy exploration and experimentation with prompts, but we rarely pay attention to, or talk about, the future cost of prompt engineering in production. One aspect of cost for advanced LLMs such as GPT-4 is that pricing is based on how many tokens appear in your input and output. OpenAI calls input tokens ‘Prompt tokens’ and output tokens ‘Completion tokens’.
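Because billing is per token, it helps to estimate token counts before a prompt ever hits production. Exact counts come from the model's own tokenizer (OpenAI publishes the tiktoken library for this), but as a rough sketch, English text averages about four characters per token. The function below is an illustrative heuristic, not a billing-accurate counter:

```python
def rough_token_count(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token
    rule of thumb for English text. For billing-accurate counts,
    use the model's actual tokenizer (e.g. OpenAI's tiktoken)."""
    return max(1, round(len(text) / 4))

prompt = "Summarize the following article in three bullet points."
print(rough_token_count(prompt))  # rough estimate, not exact billing
```

A heuristic like this is good enough for back-of-the-envelope budgeting; switch to the real tokenizer before committing to cost projections.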
Prompt tokens cost half as much as Completion tokens, as seen in Figure 2. This actually makes sense: they want you to experiment more and start cheap. But as your business requirements become more complex, the nondeterministic, hard-to-predict length (and therefore cost) of Completion tokens will surprise you at the end of the monthly billing cycle.
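One way to tame that surprise is to budget for the worst case: prompt tokens are known up front, and completion length can be bounded by a max-tokens cap on the request. The sketch below uses assumed per-1K rates modeled on GPT-4's launch pricing (where completions cost twice what prompts do); check your provider's current price sheet before relying on these numbers:

```python
# Assumed illustrative rates, modeled on GPT-4's launch pricing
# (completion tokens cost twice prompt tokens); verify against
# your provider's current price sheet.
PROMPT_RATE_PER_1K = 0.03
COMPLETION_RATE_PER_1K = 0.06

def estimate_worst_case_cost(prompt_tokens: int,
                             max_completion_tokens: int) -> float:
    """Worst-case cost of one call: prompt size is known in advance,
    while completion length is bounded only by the max-tokens cap."""
    return ((prompt_tokens / 1000) * PROMPT_RATE_PER_1K
            + (max_completion_tokens / 1000) * COMPLETION_RATE_PER_1K)

# 500 known prompt tokens, completion capped at 1,000 tokens:
print(f"${estimate_worst_case_cost(500, 1000):.3f} worst case per call")
```

Multiplying the worst case by expected monthly call volume gives a hard ceiling on the bill, which is exactly the number the unbounded completion length otherwise hides.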