By this point, the many defects of AI-based language models have been analyzed to death—their incorrigible dishonesty, their capacity for bias and bigotry, their lack of common sense. GPT-4, the newest and most advanced such model yet, is already being subjected to the same scrutiny, and it still seems to misfire in pretty much all the ways earlier models did. But large language models have another shortcoming that has so far gotten relatively little attention: their shoddy recall. These multibillion-dollar programs, which require several city blocks’ worth of energy to run, may now be able to code websites, plan vacations, and draft company-wide emails in the style of William Faulkner. But they have the memory of a goldfish.
Ask ChatGPT “What color is the sky on a sunny, cloudless day?” and it will formulate a response by inferring a sequence of words that are likely to come next. So it answers, “On a sunny, cloudless day, the color of the sky is typically a deep shade of blue.” If you then reply, “How about on an overcast day?,” it understands that you really mean to ask, in continuation of your prior question, “What color is the sky on an overcast day?” This capacity to remember and contextualize inputs is what lets ChatGPT carry on some semblance of an actual human conversation rather than simply providing one-off answers like a souped-up Magic 8 Ball.
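Under the hood, this “memory” is simpler than it looks: the model itself retains nothing between turns. A minimal sketch of the idea, in Python (hypothetical code, not ChatGPT’s actual implementation), is that the entire conversation so far is stitched back together and re-fed to the model on every turn:

```python
# Hypothetical illustration: a chat model has no persistent memory.
# "Remembering" means the full transcript is resubmitted each turn,
# so the follow-up question arrives with its context attached.

def build_prompt(history, new_message):
    """Concatenate every prior turn plus the new message into one input."""
    turns = history + [("user", new_message)]
    return "\n".join(f"{role}: {text}" for role, text in turns)

history = [
    ("user", "What color is the sky on a sunny, cloudless day?"),
    ("assistant", "On a sunny, cloudless day, the sky is typically deep blue."),
]

prompt = build_prompt(history, "How about on an overcast day?")
print(prompt)
```

Because the transcript must fit inside the model’s fixed context window, this trick is also exactly where the goldfish memory comes from: once a conversation outgrows that window, the oldest turns fall off the front.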