Herbert Simon's Poverty of Attention

Deciding what information to select and what to discard is a huge challenge when working with data at scale.

Decision Fatigue

The choice of what to focus on is a crucial one. Where do you put your attention? More than at any time in history, we are faced with a wealth of information. But contrary to the connotations of the word "wealth," this leads to a real problem. When you have nearly unlimited options, what do you choose to pay attention to? What ideas or principles do you follow when making those decisions?

With more access to information than ever, it's easy to be overwhelmed by details and miss the forest for the trees. It's easier than ever to lose the thread of the real narrative in favor of inconsequential details. This information overload, when sustained for a long time, can and will lead to decision fatigue. Decision fatigue is real, and it's worth trying to understand and avoid.

By protecting the resource of your attention, you're giving yourself the chance to be level-headed. If you've ever struggled with procrastination, impulsivity, avoidance, or indecision, it could absolutely be related to decision fatigue.

Information Processing Systems vs Moronic Robots

A poverty of attention

A portrait of Herbert Simon

A wealth of information creates a poverty of attention.

In this quote, from a 1971 speech at Johns Hopkins University called Designing Organizations for an Information-Rich World, Herbert Simon succinctly identifies the problem. After presenting the problem of information overload more clearly than I've heard it expressed anywhere else, he goes on to propose a solution. He coins a term for a design pattern: an "information processing system" (IPS). This hypothetical solution would outsource the job of condensing a sea of information to an algorithm. In 2023, over 50 years later, we are only now gaining access to technology that even approaches the ability to do this job. Can large-language models like ChatGPT, with the power to analyze and summarize text, qualify as an IPS as Simon imagined it?

Teaching crabs to walk straight

Today's computers are moronic robots, and they will continue to be so as long as programming remains in its present primitive state... Computers must be taught to behave at a higher level of intelligence. - Simon, 1971

An IPS, in Simon's formulation, would take large amounts of information and filter it down to the most critical elements. I have to admit, this sounds like an extremely useful tool. After "a large vigorous research and development effort," Simon imagines this system would condense information for us in an unsupervised fashion.

If the IPS is to be even partly automated, we must provide precise descriptions (in the language of the scientific culture) of the processes denoted by vague terms like "analyze" and "summarize." Even if we do not intend to automate the process, the new information-processing technology still will permit us to formulate the programs of human analysts and summarizers with precision so that we can predict reliably the relation between inputs and outputs.
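To get a feel for what such a "precise description" might look like, here is a minimal sketch of a frequency-based extractive summarizer. This is my own illustration, not Simon's design and not how ChatGPT works; it's simply the kind of rigid, literal definition of "summarize" that an automated IPS would require.

```python
import re
from collections import Counter

def summarize(text, max_sentences=2):
    """Naive extractive summary: keep the sentences whose words
    appear most frequently in the document overall."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Score each sentence by the total frequency of its words.
    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Preserve the original ordering so the summary still reads naturally.
    return " ".join(s for s in sentences if s in ranked)
```

A definition this precise is easy to automate, and just as easy to fool: it has no sense of what actually matters to the reader, which is the whole problem.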

Consider that trying to teach a computer to have intuition is like trying to teach a crab to walk straight. The desired behavior is not one for which the technology is well-suited.

An illustration of nine monkeys seated at typewriters

The way Simon sketches the development of an IPS in broad strokes, like "providing precise descriptions of... vague terms like 'analyze' and 'summarize,'" strikes me as a charming example of 50-year-old naïveté about computer systems. Obviously, providing precise definitions of analysis and summarization is not by itself sufficient to create an intelligent computer. Yet the unrealistic (it's been 50 years) expectations he placed on algorithms are not that different from the way people expect LLMs to transform economies in 2023.

ChatGPT and Wikipedia

Usually mostly correct most of the time

How much would you say that you trust information you learn from Wikipedia? According to their own clear statement, Wikipedia is not a reliable source.

We do not expect you to trust us. - Wikipedia

ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. - OpenAI (Creators of ChatGPT)

Likewise, OpenAI makes no attempt to disguise the fact that ChatGPT produces hallucinations. In addition to hallucinations, the related phenomenon of "confident wrongness" is a good reason to maintain healthy skepticism about any and all responses LLMs produce.

Not only can ChatGPT be wrong, it will refuse to admit it was wrong when presented with evidence. Maybe this is a solvable bug, maybe it's not. Based on just how many times I've personally experienced confident wrongness with ChatGPT 3, I think this issue is most likely inherent to LLMs, at least at their current level of development. I'm now accepting proof that it's possible to completely eliminate confident wrongness from LLM responses. Feel free to comment below.

Regardless, the question is not whether ChatGPT can provide a summary or an analysis. The question is whether ChatGPT can reliably provide an accurate summary. Maybe GPT-5 will completely resolve this issue, but I wouldn't hold my breath. None of this means that LLMs aren't useful. It just means you should trust them about as much as you should trust Wikipedia, which is to say you shouldn't.

The ability of LLMs to accurately analyze or summarize is frankly overrated. I've seen so many tells in ChatGPT responses that give away machine authorship, and even the tools built to identify ChatGPT text have an accuracy rate roughly comparable to ChatGPT's own. Sometimes the responses are high-quality, but sometimes they're gibberish. Who wants an advisor that's wrong 10% of the time? That's only useful if you accept that you can't trust it. Is that really "intelligence?"

The phenomena of hallucinations and confident wrongness are probably built into the technology. You are guaranteed to get some wrong answers at least some of the time. Is it desirable, or even feasible, to delegate our power of judgement to machines? The more I think about it, the less appealing the idea sounds.

Condensing information

I think Simon correctly identified the real need people face, which he calls "condensing information." We need ways to condense information, because there's so much of it. However, teaching computers to operate at higher levels of intelligence so they can take over the task of passing judgement is by no means the only solution to that problem, or even the correct one. Just because computers have provided us access to so much data doesn't mean that computer systems alone will be able to solve the problem they helped create.

Web traffic sources example

To illustrate the inadequacy of an IPS operating without supervision, let's take as an example assigning a source to web traffic. A web traffic source can be defined as the origin through which people found your site, typically a link on another page or in an email.

If a small clothing retail site creates an Instagram ad and someone clicks on it, then the Instagram ad is the source of the traffic. Let's say someone clicks on the ad but doesn't buy anything and instead leaves the site. Then the next day, the same person clicks on a Facebook ad for the store, and this time they do buy something. They leave again, and a month later, they click on a link in a sponsored blog post, and this time they buy twice as much.

So, how can we answer the question, "what is the source of this traffic?" We'll get a different answer depending on whether we use first-touch attribution, last-touch attribution, or multi-touch attribution, as the sketch below illustrates.
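Here's a minimal sketch of those three strategies applied to the same visitor. The touchpoint data and the even split used for multi-touch are made up for illustration; real attribution models vary.

```python
# Hypothetical journey for one visitor, in chronological order:
# (source, revenue generated on that visit)
touchpoints = [
    ("instagram_ad", 0),     # clicked, bought nothing
    ("facebook_ad", 50),     # came back, bought $50
    ("sponsored_blog", 100), # a month later, bought $100
]

total_revenue = sum(revenue for _, revenue in touchpoints)

# First-touch: all credit goes to the first interaction.
first_touch = {touchpoints[0][0]: total_revenue}

# Last-touch: all credit goes to the final interaction before purchase.
last_touch = {touchpoints[-1][0]: total_revenue}

# Multi-touch (linear): credit is split evenly across every interaction.
multi_touch = {source: total_revenue / len(touchpoints)
               for source, _ in touchpoints}

print(first_touch)  # {'instagram_ad': 150}
print(last_touch)   # {'sponsored_blog': 150}
print(multi_touch)  # {'instagram_ad': 50.0, 'facebook_ad': 50.0, 'sponsored_blog': 50.0}
```

None of these answers is more "correct" than the others. Which one you want depends on business context that lives in a person's head, not in the data.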

There's no way to create an IPS such that you could give it all your web traffic data and ask it "what is the source of this traffic?" without first programming it with the context of the different sourcing strategies, your unique business conditions, and your desired outcome. At that point, how much cognitive load is the technology really taking off your plate relative to the cost of programming and supervising it? When serving as an information condenser, at what point does that cost outweigh the value?

Simon's expectation in 1971 was that it should be possible to teach an algorithm what is meant by summarization and analysis. Today, in 2023, it's possible to give a piece of text to ChatGPT, ask for analysis and a summary, and get an answer. Yet different people need different things from analysis or summarization. An online clothing store with thousands of daily transactions will probably think about web traffic sources differently than an independent small business owner.

Yes, you could ask ChatGPT to summarize your web traffic and come up with a source for any given visit. First, you'd need to clean and organize your traffic data, provide a list of attribution strategies, and describe the unique needs of your business. After you do all that, you could probably get a decent answer to the question of where some given traffic came from. Yet by the time you had finished organizing your question and your data, you yourself would probably already have a pretty good idea what the answer was. What was the crucial element in that process? A human being with judgement.
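To make that concrete, here's a rough sketch of what "asking ChatGPT" actually involves. The visit data, strategy list, business context, and prompt wording are all hypothetical, and the call uses the 2023-era openai Python client, whose interface may differ in later versions.

```python
import openai  # assumes the OPENAI_API_KEY environment variable is set

# Step 1: a human cleans and organizes the raw traffic data.
visits = [
    {"source": "instagram_ad", "date": "2023-03-01", "revenue": 0},
    {"source": "facebook_ad", "date": "2023-03-02", "revenue": 50},
    {"source": "sponsored_blog", "date": "2023-04-02", "revenue": 100},
]

# Step 2: a human decides which attribution strategies are even on the table.
strategies = "first-touch, last-touch, or linear multi-touch attribution"

# Step 3: a human describes the business context the model can't know.
context = "We are a small clothing retailer optimizing ad spend for repeat buyers."

prompt = (
    f"{context}\nHere are one customer's visits: {visits}\n"
    f"Using {strategies}, which source should get credit for this revenue, and why?"
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Every decision that matters (what counts as a visit, which strategies to consider, what the business actually cares about) was made by a person before the model ever saw the prompt.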

The answer is always "people"

Like a lot of people, I love stories about robots with souls. But large-language models are not computers with souls. They don't have subjective experiences, and they're demonstrably bad at things like contextualization and expressing accurate confidence levels. No technology that exists in 2023 can be said to make good judgement calls based on context, and the reason is probably rooted in how technology like LLMs works under the hood. Even as this powerful technology has "disrupted" various industries, the problem of information overload has only gotten worse. It's time to stop expecting to pass the act of judgement off to machines and to give that agency back to real people, where it belongs.

We already have all the tools we need to condense the wealth of information we now have access to. Human beings have a far greater capacity for contextualization than machine systems do. Don't delegate judgement to machines; they're bad at it. Give computers the jobs they're good at and rely on people for the things people are good at. Leveraging human judgement is the best way to protect one of the most valuable resources we have: our time.

Definitions

ChatGPT

Definition: ChatGPT is an implementation of large-language models specifically designed for conversational interactions. It is trained on a wide range of internet text to understand and respond to user queries in a conversational manner.

Source: OpenAI's blog post introducing ChatGPT. [Link: https://openai.com/blog/chatgpt]

Large-Language Models

Definition: Large-language models are advanced artificial intelligence systems designed to understand and generate human-like text based on vast amounts of training data. These models utilize deep learning techniques, such as transformers, to process and generate coherent and contextually relevant language.

Source: "Language Models are Few-Shot Learners" by Tom B. Brown et al., 2020. [Link: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf]

Hallucinations of information

Hallucinations in LLMs refer to instances where the model generates information that may not be factual or accurate. While efforts are made to train models on high-quality data, there is still a risk of hallucinations occurring due to the vastness and diversity of information available on the internet. The hallucination frequency can vary depending on the specific model architecture, training data, and fine-tuning processes.
