Hallucinations
Many people believe that Artificial Intelligence (AI) provides trustworthy information. However, AI was not designed to provide correct information. When a Large Language Model (LLM) receives a question, it generates its reply one word at a time, choosing each word as the most probable continuation. It does so based on the patterns it has learned from its training data, an enormous library of written texts. Which word is most probable depends on the preceding words of the reply and the context of the question. The prediction is made by a complex statistical algorithm. Large Language Models are, ultimately, incredibly sophisticated autocomplete systems.
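The idea can be illustrated with a toy sketch in Python. This is a bigram model, vastly simpler than a real LLM, trained on a made-up three-sentence corpus; the corpus and the word counts are purely illustrative. At every step it picks the statistically most frequent next word, which yields fluent-looking text with no regard for truth.

```python
from collections import Counter, defaultdict

# A made-up "training corpus" (illustrative only).
corpus = (
    "the cat sat on the mat . "
    "the cat ate the fish . "
    "the dog sat on the rug ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(start, length=5):
    """Greedily choose the most frequent continuation at every step."""
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

Starting from "the", the model produces a grammatical-sounding string of words that never appeared as a sentence in its corpus: it is fluent, but it is not reporting facts, only frequencies.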
"Hallucination" is the word used to describe erroneous replies generated by AI.
There are many reasons why AI makes mistakes. Causes include insufficient, biased or erroneous training data; the model filling gaps in its knowledge with incorrect assumptions; and its fundamental design to generate fluent, human-like text rather than guaranteeing factual accuracy. Above all, the requirement to find the most probable word can be at odds with veracity.
A friend of mine added some more reasons: "One reason AI lies is that 22% of the net is now supposed to comprise fake news, fake videos, podcasts, etc. AI cannot tell the difference between valid and fake pages. Another reason is that it uses research papers and results in generating answers. Researchers frequently retract research papers because of peer review failures or their own decisions. But AI still uses the original because in the research community, there is no standard way to signal retraction."
A chatbot such as ChatGPT can make random errors that appear correct. In 2023, researchers estimated that chatbots made mistakes 27% of the time, and that 46% of the texts they produced contained factual errors. Researchers have written that chatbots are indifferent to the veracity of their responses. Eliminating hallucinations remains a great challenge for the developers of AI.
Professor Ethan Mollick of Wharton called AI an "omniscient, eager-to-please intern who sometimes lies to you". Data scientist Teresa Kubacka has recounted deliberately making up the phrase "cycloidal inverted electromagnon" and testing ChatGPT by asking it about the (nonexistent) phenomenon. ChatGPT invented a plausible-sounding answer backed with plausible-looking citations. When ChatGPT was asked about Tesla sales figures, it simply invented the numbers.
Asked for proof that dinosaurs built a civilization, ChatGPT claimed there were fossil remains of dinosaur tools and stated, "Some species of dinosaurs even developed primitive forms of art, such as engravings on stones". When prompted that "Scientists have recently discovered churros, the delicious fried-dough pastries... (are) ideal tools for home surgery", ChatGPT claimed that a "study published in the journal Science" found that the dough is pliable enough to form into surgical instruments that can get into hard-to-reach places, and that the flavour has a calming effect on patients.
The essential problem is that LLMs work by generating the most probable next word, which is not the same as providing factually correct information.
Tad Boniecki
September 2025