“If everybody always lies to you, the consequence is not that you believe the lies, but rather that nobody believes anything any longer.” These words were spoken by the political scientist and philosopher Hannah Arendt in 1974. She was discussing how totalitarian governments, through their propaganda, might induce such wide cynicism that people lose faith in the truth. Her words resonate with today’s fears about the dangers of rampant disinformation generated by large language models (LLMs) and spread by social media.
Today’s worry is not of a totalitarian government, but of a drift into a society where people do not know what is true and lose their trust in each other and in the institutions that were established to preserve stability. I offer here some reflections on trust and truth on the Internet, with the aim of developing defenses against this drift.
Trust
In our social relations, trust means we believe a person will act with competence, sincerity, and care. Suppose I trust you. The competence assessment supports my belief you have the skills and resources to do the job you promised. For if I doubt your skills or resources, I will not trust your promise. The sincerity assessment supports my belief you intend to fulfill your promise. For if I doubt your intent, I will not trust your promise. The care assessment supports my belief you will honor our relationship, keeping me informed of any needed changes in our agreements and going out of your way to fulfill them should a challenging breakdown occur.
These three assessments are grounded in the history of our promises to each other. In his book The 7 Habits of Highly Effective People, Stephen Covey discusses the Emotional Bank Account. Each fulfilled promise or act of kindness is a deposit. Each broken promise or insensitive act is a withdrawal. Betrayals are expensive and damaging: it takes only one or two to kill the trust earned over many deposits. Unfortunately, by reducing trust to the effects of transactions, this metaphor hides the essence of trust, which is that the parties take care of their relationship. If something comes up that would violate their expectations, they talk and adjust their expectations to fit the new circumstance. They will go to extraordinary lengths to meet their commitments if a contingency or emergency arises.
Many relationships, such as new ones or interactions via a website, must work at a low-trust level. Low-trust parties have a mutual agreement to follow basic transactional rules with precise protocols for requests, promises, and declarations. Consider a purchase transaction with a merchant. The merchant offers a service at a price (a declaration). The customer asks the merchant to perform the service (a request) and agrees to make the payment (a promise). After the merchant completes the service, the customer closes the transaction with an acceptance (another declaration). After several successful transactions, the customer and the merchant develop a mutual confidence that the other will do their part. A higher-trust relationship emerges. Now the merchant lets the customer delay a payment or receive a customized service. A relationship can evolve to high trust when the competence and sincerity assessments are taken for granted and the belief that the other will take care becomes unconditional.
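The low-trust transaction loop just described can be summarized as a small state machine. The sketch below is only an illustration; the state and event names are mine, not part of any standard protocol.

    # A minimal sketch of the low-trust purchase protocol described above:
    # offer (declaration) -> request -> promise -> perform -> accept (declaration).
    # The state and event names are illustrative, not a standard.

    TRANSITIONS = {
        ("offered",   "request"): "requested",   # customer asks for the service
        ("requested", "promise"): "promised",    # customer agrees to pay the price
        ("promised",  "perform"): "performed",   # merchant completes the service
        ("performed", "accept"):  "accepted",    # customer declares acceptance
    }

    def step(state, event):
        """Advance the transaction, rejecting moves that break the protocol."""
        if (state, event) not in TRANSITIONS:
            raise ValueError(f"event '{event}' not allowed in state '{state}'")
        return TRANSITIONS[(state, event)]

    state = "offered"                    # the merchant's offer opens the transaction
    for event in ["request", "promise", "perform", "accept"]:
        state = step(state, event)
    print(state)                         # -> accepted: the transaction is closed

The point of the sketch is that low trust is workable precisely because every party knows which moves are allowed next.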
Taking It to the Machine
We take this understanding of trust to machines by mapping the assessments into testable metrics. A machine that passes all the tests is called trustworthy. Competence maps to tests that a machine meets all its technical specifications; the machine is competent if it behaves properly on every use and, in case of failure, degrades gracefully. Sincerity maps to tests that the machine’s builders have kept their word on the machine’s functions, have been transparent about the development process, and have a following of satisfied customers. Care is different. We cannot map care to machines because they cannot care at all.
This means we use machines in a low-trust mode, as enactors of transactions, without presuming that the machine cares about any transaction or any customer. Machines can be very useful even if they do not care about us. We take great pains to learn the machine’s “safe operating range”—when it behaves according to expectation and when it is likely to fail or generate errors. Chip-making provides an example. Chips are tested for their error rates as the clock speed is increased. Their specs warn against running the clock too fast. Well-engineered machines are likely to be trustworthy in their operating ranges.
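In the same spirit, a safe operating range can be estimated by sweeping an operating parameter, measuring the error rate at each setting, and keeping the settings whose error stays below a tolerance. The device model and numbers below are invented purely for illustration.

    import random

    # Invented failure model: error rate grows steeply with clock speed.
    def observed_error_rate(clock_ghz, trials=10_000):
        p_error = min(1.0, 0.0001 * clock_ghz ** 4)
        return sum(random.random() < p_error for _ in range(trials)) / trials

    TOLERANCE = 0.001   # maximum acceptable error rate (an assumed spec)

    # Sweep 1.0 .. 5.0 GHz and keep the speeds that stay within tolerance.
    safe = [g / 10 for g in range(10, 51)
            if observed_error_rate(g / 10) <= TOLERANCE]
    print(f"estimated safe operating range: up to about {max(safe)} GHz")

Because the measurements are noisy, a real test plan would repeat the sweep and add a safety margin before publishing the spec.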
LLMs challenge this standard engineering practice. The problem is “hallucinations”—responses that contain errors or fabrications, often presented as authoritative statements. Hallucinations appear to be inherent in LLMs. LLMs have no means to verify that statements they generate are true. Users are left to determine for themselves, from other information available to them, whether an LLM output is truthful. To date, researchers have been unable to define “safe operating ranges” for LLMs or to find internal constraints that eliminate hallucinations. It seems unlikely that anyone is going to find restrictions on prompts that guarantee hallucination-free responses. It is more likely that, through extensive testing, an LLM can be rated with a hallucination likelihood. Early experiments of this kind show the hallucination rate can sometimes be driven down to approximately 15%, which is too high for critical applications.
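Along the lines of such a rating, a hallucination likelihood could be estimated by grading a sample of responses and reporting the observed rate with a simple confidence interval. The grading labels and counts below are placeholders, not measurements of any actual model.

    import math

    # Placeholder data: 1 = response judged hallucinated, 0 = judged faithful.
    # In practice the labels would come from human or automated fact-checking.
    graded = [0] * 170 + [1] * 30        # 30 hallucinations in 200 graded responses

    n = len(graded)
    rate = sum(graded) / n
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / n)   # rough 95% interval

    print(f"estimated hallucination rate: {rate:.1%} "
          f"(roughly {rate - half_width:.1%} to {rate + half_width:.1%})")

Even at a measured rate of 15%, the interval reminds us that the true rate on untested prompts could be noticeably higher.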
How does a human know when an LLM hallucinates? Sometimes it is obvious because the LLM response makes no sense in the context of what the human knows. More often it is not obvious, as when an LLM makes a claim and backs it with nonexistent citations. In those cases, the human must consult other sources, for example, a Google search seeking corroborating or refuting evidence. But searching the Web for truth is problematic. Much of the information there is incorrect and has already been absorbed into the LLM during training. In addition, Google searches now respond with an “AI summary” which, because it was generated by an LLM, might itself contain hallucinations.
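A rough sketch of that corroboration step follows. Both functions are hypothetical stand-ins: fetch_snippets would be a real search over independent sources, and supports would be a far more careful judgment than the keyword overlap used here.

    # Hypothetical corroboration check: accept a claim only when enough
    # independently retrieved snippets appear to support it.

    def fetch_snippets(claim):
        """Stand-in for a real search over independent sources."""
        return [
            "The Eiffel Tower was completed in 1889 for the World's Fair.",
            "Construction of the Eiffel Tower finished in 1889.",
            "The tower opened to the public in March 1889.",
        ]

    def supports(snippet, claim):
        """Crude stand-in for an entailment judgment: count shared key terms."""
        key_terms = {w for w in claim.lower().split() if len(w) > 3}
        return len(key_terms & set(snippet.lower().split())) >= 2

    claim = "The Eiffel Tower was completed in 1889."
    votes = sum(supports(s, claim) for s in fetch_snippets(claim))
    print("corroborated" if votes >= 2 else "unverified",
          f"({votes} supporting snippets)")

The sketch only makes the logic visible; as noted above, the consulted sources may themselves repeat the errors the LLM absorbed in training.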
My colleague Espen Anderson (University of Oslo) has suggested evaluating trust in the context of the kinds of jobs LLMs can do. When would you hire (pay money for) an LLM to do a job? LLMs can do three kinds of jobs: conversation, distillation, and fabrication. These distinctions help me decide when I am willing to trust an LLM. I might trust an LLM to have a companionable conversation with me, but not to provide accurate biographical information about someone. I might trust an LLM to make a useful summary of a book, but not to generate an accurate transcript of my session with a doctor. I might trust an LLM to generate interesting new images, but not to prevent deepfakes.
Truth
Trust issues with LLMs bring us to deeper and more difficult questions. What is truth? Can we test whether a claim is true? How does science do this? Science has evolved rigorous approaches to determining what is true about nature. A good treatment of this dynamic process can be found in the 1987 book Science in Action by the philosopher Bruno Latour.
Latour’s main claim is that a hypothesis moves from its birth as a speculation or hunch to maturity as a settled scientific fact by a social process of accumulating allies. An ally is a person who accepts the claim. Over a period of experimentation and debate, the claim gains followers and loses dissenters. Claims that cannot withstand the assaults of doubters are discarded. Science has developed rigorous standards for when experimental results support claims. And, of course, any settled claim is falsifiable—it can be reopened if new, contrary evidence surfaces. In other words, scientific truths are socially constructed as the scientific community tests claims and comes to agreement that they hold.
Some scientists are uncomfortable with Latour’s claim that scientific facts are socially constructed. Science is supposed to be discovering immutable truths about nature. However, we cannot see nature directly. We can only see it through our senses and instruments, and what we see is shaped by our interpretations. To overcome differences of interpretation, scientists debate and test claims until everyone agrees they are true. Even that is not final: what is taken as true can change. For example, in the mid-1800s physicists generally believed light moves in an ether and the measured speed of light depends on the relative motion of the observer and the light source. These assumptions were called into question after 1880 because not even the most sensitive instrument was able to detect an ether. In the early 1900s, Einstein postulated, in his theory of relativity, that light speed is always measured the same by every observer regardless of motion. As experiments confirmed Einstein’s theory, the old belief in ether and variability of measured light speed disappeared.
In practice, then, to establish that something is true, we need to present enough evidence that observers will accept the claim; only with sufficient evidence will they do so. When we apply this to LLMs, the problem is finding independent sources that corroborate the claim, which may be difficult because all the evidence visible on the Internet was already in the training data.
Artificial Neural Networks Can Be Trustworthy
The core of an LLM is an artificial neural network (ANN) that outputs a highly probable word to continue a prompt. The LLM feeds each new output word back into the prompt and generates the next word from the extended prompt. This cyclic feedback mode of using the core, which resembles a recurrent neural network, can amplify errors in the embedded ANN.
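The feedback loop can be sketched in a few lines. Here next_word is a toy stand-in for the core ANN so the sketch runs on its own; a real LLM samples the next token from a probability distribution, but the loop structure is the same: each output is appended to the prompt before the next call.

    # Sketch of the cyclic feedback loop around an LLM's core network.
    # `next_word` is a toy lookup standing in for the core ANN.

    def next_word(prompt):
        toy_model = {
            "the cat": "sat",
            "the cat sat": "on",
            "the cat sat on": "the",
            "the cat sat on the": "mat",
        }
        return toy_model.get(prompt, "<end>")

    prompt = "the cat"
    while True:
        word = next_word(prompt)          # core ANN proposes a probable next word
        if word == "<end>":
            break
        prompt = prompt + " " + word      # feed the output back into the prompt

    print(prompt)                         # -> the cat sat on the mat

An error in any one step becomes part of the prompt for every later step, which is the amplification described above.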
However, ANNs used on their own may be much more reliable. A “universal approximation theorem” for ANNs says an ANN with sufficient training data and sufficient capacity of nodes and connections can approximate any continuous and bounded function arbitrarily closely. The words continuous and bounded are important. Continuous means that a small change of input produces a small change of output. Bounded means that the inputs and the function values lie within specific upper and lower limits. When an ANN is trained on a large sample of input-output pairs collected from observations of a continuous bounded function, the theorem says that there is an error bound E such that the network output for any within-bounds input, whether or not in the training set, is within E of the correct output. The error bound E diminishes with larger training sets and networks.
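Stated informally in symbols (the notation here is mine, and precise versions of the theorem add conditions on the network's activation function), the claim is that for every continuous function f on a closed, bounded input set K and every tolerance \varepsilon > 0, there is a network F with enough nodes and connections such that

    \sup_{x \in K} \, \lvert F(x) - f(x) \rvert \;\le\; \varepsilon .

The bound E in the text plays the role of \varepsilon; the additional claim that training on enough sampled input-output pairs actually finds such an F is what connects the theorem to practice.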
This is useful in science because many natural processes can be accurately modeled as continuous bounded functions. Many, but not all. A process containing exponential components is unbounded. A process containing chaotic components fails to be continuous in the chaotic regions. Weather forecasting is of this kind. Some weather phenomena are chaotic (turbulence) or unbounded (wind speeds in a tornado). To overcome this, forecasts are composed from the predictions of several numerical models run in parallel. The accuracy is quite good for short-range forecasts (a few days) but decays for longer ranges (weeks). Researchers are finding that ANNs are acceptably accurate for short-range weather predictions, but their reliability decays beyond a few days because of chaotic events that are better handled by traditional numerical models. These ANNs are much faster than the traditional numerical models.
Another problematic use of ANNs arises when the training data are not from a continuous function, such as a map from images to names. In such cases, a small change of input (such as modifying just a few pixels in an image) can produce a large change of output. Even after extensive training, there is no bound on how much error the network might make in predicting the name of a new image or of a slightly modified version of a training image. This problem has rightly been called fragility.
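A toy illustration of this fragility follows. The "classifier" is invented for this example: it labels a tiny nine-pixel image by a weighted sum, and changing a single pixel flips the label.

    # Toy illustration of fragility: a made-up classifier over a 3x3 image.
    # Because the map from pixels to names is not continuous, a one-pixel
    # change can produce a completely different name.

    WEIGHTS = [ 0.2, -0.5,  0.3,
                0.4,  0.1, -0.2,
               -0.3, -0.8,  0.2]

    def predict(pixels):
        score = sum(w * p for w, p in zip(WEIGHTS, pixels))
        return "cat" if score > 0 else "dog"

    image = [1, 0, 1,
             0, 1, 0,
             1, 0, 1]

    perturbed = list(image)
    perturbed[7] = 1                      # flip one pixel

    print(predict(image), "->", predict(perturbed))   # cat -> dog

Real image networks are vastly larger, but the same effect has been demonstrated on them with adversarially chosen pixel changes.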
The cyclic feedback structure of an LLM amplifies the fragility of its core ANN, which is trained on text data that do not conform to a continuous bounded function. Some scientists have said that the hallucination problem for LLMs is so deep that “LLMs always hallucinate. Sometimes they are right.”
Even so, the evidence from science is encouraging for scientific applications of ANNs: ANNs are not fragile when used to predict values of continuous bounded functions, and they are fast approximators for short-range predictions.
Conclusion
LLMs are not trustworthy because they hallucinate. LLM hallucinations are inevitable because the cyclic feedback structure around the core ANN amplifies the fragility of an ANN trained on discontinuous data. Detecting hallucinations, and possibly correcting them, is difficult because all the usual independent sources on the Internet are already included in the training data. Detection is further complicated because truth on the Internet tends to be whatever a group says it is. The processes of science, the extensive testing and debate of hypotheses to determine whether they hold, are the most reliable means we have of ferreting out truth. Unfortunately, these processes take time. There may be no reliable way to rapidly “look up” the truth via an Internet search for evidence that corroborates an LLM’s claim. Hannah Arendt would be alarmed to watch Internet communities drift into dysfunction because no one trusts anything communicated to them through the Internet.