Last week, at Google’s annual conference dedicated to new products and technologies, the company announced a change to its premier AI product: The Bard chatbot, like OpenAI’s GPT-4, will soon be able to describe images. Although it may seem like a minor update, the enhancement is part of a quiet revolution in how companies, researchers, and consumers develop and use AI—pushing the technology not only beyond remixing written language into other media, but toward the loftier goal of a rich and thorough comprehension of the world. ChatGPT is six months old, and it’s already starting to look outdated.
That program and its cousins, known as large language models, mime intelligence by predicting which words are statistically likely to follow one another in a sentence. Researchers have trained these models on ever more text—at this point, every book ever and then some—on the premise that force-feeding machines more words in different configurations will yield better predictions and smarter programs. This text-maximalist approach to AI development has been dominant for years, especially among the most public-facing corporate products.