Welcome to the token economy
As Large Language Models (LLMs) evolve, we’re beginning to understand the key factors that will shape leadership in just about every industry in the coming years. We’re witnessing a paradigm shift from the initial focus on amassing vast, well-curated training datasets to a new perspective that prioritizes efficient data conversion through tokenization.
We’re moving from an approach built on massive amounts of well-selected training data (where open repositories such as LAION for images, or Common Crawl, played a key role, followed by a jumble of agreements with all kinds of content providers) to one where what matters is not simply the amount of data, but how we convert it into usable material. That conversion requires an additional process: tokenization.
How effective this process is varies with the criteria chosen, and it is currently skewed by the predominance of English content in the major training repositories (46% in the case of Common Crawl). This linguistic bias has far-reaching implications: because tokenizer vocabularies are learned mostly from English text, prompts in other languages are split into more, shorter tokens, and so, while perfectly processable, they tend to consume more tokens to express the same meaning.
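To make the disparity concrete, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer; the sentences are illustrative examples, and exact counts depend on the tokenizer vocabulary:

```python
# Minimal sketch: count tokens for equivalent English and Spanish sentences.
# Assumes the open-source tiktoken library; the sentences are illustrative.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # vocabulary used by GPT-4-era models

samples = {
    "English": "Welcome to the token economy.",
    "Spanish": "Bienvenidos a la economía de los tokens.",
}

# encode() maps a string to its list of integer token IDs
for language, text in samples.items():
    tokens = enc.encode(text)
    print(f"{language}: {len(tokens)} tokens for {len(text)} characters")
```

Run against a vocabulary learned mostly from English, the non-English sentence typically breaks into more tokens per word, which is exactly the extra consumption described above.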
Welcome to the token economy, where, in the immediate future, organizational success across industries will hinge on how efficiently we send prompts to LLMs. Beyond individual queries, businesses will pay for tokens used in…
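As a rough illustration of what token-based billing implies, the sketch below estimates a monthly bill; the per-token prices and usage figures are hypothetical placeholders, not any provider’s actual rates:

```python
# Back-of-the-envelope LLM billing estimate.
# All prices below are hypothetical placeholders, not real provider rates.
PRICE_PER_1K_INPUT_TOKENS = 0.01   # hypothetical USD per 1,000 prompt tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.03  # hypothetical USD per 1,000 completion tokens

def monthly_cost(queries_per_day: int, avg_input_tokens: int,
                 avg_output_tokens: int, days: int = 30) -> float:
    """Estimate a monthly bill from average token usage per query."""
    daily = queries_per_day * (
        avg_input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
        + avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    )
    return daily * days

# e.g. 10,000 queries a day, 500-token prompts, 300-token answers
print(f"${monthly_cost(10_000, 500, 300):,.2f} per month")  # $4,200.00
```

At this scale, trimming prompts by even 20% translates directly into savings, which is why prompt efficiency becomes an organizational concern rather than a developer’s detail.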
Read the full column at https://medium.com/enrique-dans/welcome-to-the-token-economy-76da709cea9e