
The progress of LLMs is in danger. This is why.

By David Carvalhão posted 15-11-2025 16:59

  
To understand why, I need to first introduce the concept of degenerative AI.
 
LLMs are trained on vast amounts of data collected from the Internet, from Wikipedia to all types of books, blogs, webpages, photos and videos. From this data they derive the patterns that define their internal parameters, which is what allows them to generate similarly structured content.
 
The problem is that an LLM's performance only improves when the content used for its training is diverse, that is, "original".
 
Using an analogy that is not precise but still useful: just as a photocopy of a photocopy loses quality, an LLM trained on data that was itself created by another LLM tends to perform worse.
 
We call this effect "degenerative AI" (the research literature describes a closely related phenomenon as "model collapse").
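To make the intuition concrete, here is a minimal toy simulation (my own sketch, using a Gaussian distribution in place of an LLM, not an experiment from any particular paper): each "generation" of the model is fitted to samples produced by the previous generation, with no fresh human data entering the loop, and the diversity of the data steadily collapses.

```python
import numpy as np

# Toy illustration of the "photocopy of a photocopy" effect: each
# generation fits a Gaussian to samples drawn from the previous
# generation's fitted Gaussian. With no new human data in the loop,
# the standard deviation (a proxy for "diversity") decays.

rng = np.random.default_rng(seed=0)

n_samples = 50       # size of each generation's training set
n_generations = 50

# Generation 0: "human" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)

for gen in range(1, n_generations + 1):
    mu, sigma = data.mean(), data.std()       # "train" on the current data
    data = rng.normal(mu, sigma, n_samples)   # next training set is pure model output
    if gen % 10 == 0:
        print(f"generation {gen:2d}: std = {sigma:.3f}")
```

Running this, the printed standard deviation shrinks generation after generation: the model ends up reproducing an ever narrower slice of the original distribution.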
 
As LLMs become more pervasive, an increasing amount of new content available on the Internet is produced by LLMs. 
 
In fact, this year we crossed the threshold at which MOST new content is AI-generated, to the point that over 74% of all new English-language webpages are now LLM-generated.
 
Given the difficulty of distinguishing human-written from AI-generated content, newer data sets are contaminated by increasing amounts of AI-generated material, which in turn degrades new LLM training and overall performance through the degenerative AI effect.
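A quick back-of-envelope sketch shows how fast this compounds. The 74% figure comes from above; the corpus growth rate is my own illustrative assumption, not a measured number:

```python
# Back-of-envelope: if 74% of NEW pages are LLM-generated, how quickly
# does a continuously refreshed training corpus become contaminated?
# The 20% annual growth rate is an assumed, illustrative value.

ai_share_of_new = 0.74   # fraction of new pages that are AI-generated (from the article)
annual_growth = 0.20     # ASSUMPTION: corpus grows 20% per year with new pages

corpus, ai_pages = 1.0, 0.0   # normalized corpus size and its AI-generated portion

for year in range(1, 6):
    new_pages = corpus * annual_growth
    ai_pages += new_pages * ai_share_of_new
    corpus += new_pages
    print(f"year {year}: {ai_pages / corpus:.1%} of corpus is AI-generated")
```

Under these assumptions, contamination of the overall corpus climbs past a third within five years, even though the starting corpus was entirely human-made.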
 
There have been some attempts to address this issue. One of the most interesting comes from Dave Winer, one of the creators of RSS, who proposes "textcasting": an RSS-like subscription process that would let humans easily monetize their original content, potentially encouraging its proliferation and making it easy to identify as human-made.
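Winer's actual proposal is his own; purely as an illustrative sketch of the general idea, an RSS-style feed item could carry provenance and payment metadata along these lines (the element names here, humanAuthored and paymentLink, are my assumptions, not his specification):

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of an RSS-like "textcast" item. The custom
# elements below are illustrative assumptions, not the actual
# textcasting specification.

item = ET.Element("item")
ET.SubElement(item, "title").text = "An original, human-written post"
ET.SubElement(item, "link").text = "https://example.com/posts/original-post"
ET.SubElement(item, "pubDate").text = "Sat, 15 Nov 2025 16:59:00 GMT"

# Provenance flag: lets crawlers and dataset builders identify human content.
ET.SubElement(item, "humanAuthored").text = "true"

# Monetization hook: where a subscriber (or an AI lab) could pay the author.
ET.SubElement(item, "paymentLink").text = "https://example.com/pay/author"

print(ET.tostring(item, encoding="unicode"))
```

The appeal of something like this is that it solves two problems at once: human authors get a direct monetization channel, and dataset builders get a machine-readable signal of provenance.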
 
But the bottom line is that, ironically, the future of LLM development is, at least at this stage, highly dependent on humans continuing to produce novel content, and unless we can reverse the current trend, we may very well be reaching the limits of LLM performance.
 
For my part, as this article was not written by AI, it is in itself my small contribution to solving this problem.