On Teaching Machines to Read

Note: This was written with the assistance of various AI models: ChatGPT, Gemini and Claude. The process was most closely akin to an exquisite corpse.1 I went from outline to drafts, taking turns with an AI rewriting portions of this essay. While this is an experimental writing piece, it’s also a personal essay, and I consider it to be my voice and my words even if augmented by some computer running in the cloud.

Image generated with Stable Diffusion and LAION-5B, February 12th 2023
Image generated with Stable Diffusion and LAION-5B, June 19th 2023
This work is marked with CC0 1.0 Universal.

I have been telling people I can’t read. Despite not being able to read, I read all the time. What I mean is that I don’t identify as a reader. To be alive in 2024 is to read: you need to read the symbols on your screens, the messages your friends send you and, if you dare, words written online. A couple of years ago I got a Kindle and have read some of The Expanse books as well as revisiting a book I read as a kid called Dragon’s Blood. I listen to podcasts and audiobooks alongside my flirtations with novels on an e-ink screen. I enjoy writing and reading letters; receiving and sending them is a fun physical phenomenon. I’ve read books. We can observe that reading encompasses a multitude of mediums, both physical and virtual, in which you can participate. If you’re reading this right now, congrats, you can read.

I remember not knowing how to read. I went to a Waldorf school in the 2nd grade, which had a regime of teaching that resulted in me not learning to read at a rate my father was happy with. My dad taught me how to read, and I recall how much I disliked the process of sitting still and reading books. Despite this, within a year I went from being forced to read children’s books to reading and loving The Hobbit. My childhood memories of playing in the suburban hills of Los Angeles are intertwined with tales of hobbits and dwarves.

The first edition of The Hobbit was published in 1937 by George Allen & Unwin, likely using plates created by a Linotype machine.2 Like Gutenberg’s printing press, the Linotype machine revolutionized text distribution. Unlike the movable type of the printing press, the Linotype used a keyboard to arrange molds into lines of text, ready for printing. This technology, in use until the 1980s, helped books like The Hobbit reach a wide audience, contributing to their monumental status in literature.

It Reads?

Diagram of AI, Machine Learning, Deep Learning, and Generative AI
Diagram illustrating the relationship between Artificial Intelligence, Machine Learning, Deep Learning, and Generative AI.
Source: Microsoft Generative AI for Beginners

Tokenization is the process of turning words and symbols into numbers. In Python, this is done by encoding a sentence into a sequence of token ids, stored in a structure called a tensor. The tensor is an array of tokens, with each token representing a word, a piece of a word, or a set of characters and symbols. This tensor can be decoded back to its original words and symbols. Think of it like the Linotype machines, where each symbol had its own metal mold. An operator would select and arrange these molds, creating a line of text ready for the press. Instead of casting molten metal, here we’re casting words into numbers. The tensor preserves the order of the words, mirroring the sequence they had in the original sentence.3

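# Assumption: the essay does not name its tokenizer; "t5-small" is a stand-in and other checkpoints give different ids.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Encode the sentence into a PyTorch tensor of token ids.
sentence_encoded = tokenizer("In a hole in the ground there lived a hobbit.", return_tensors='pt')
# Decode the ids back into text, leaving out special end-of-sequence markers.
sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0],
    skip_special_tokens=True,
)

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)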
ENCODED SENTENCE:
tensor([  86,    3,    9, 6356,   16,    8, 1591,  132, 4114,    3,    9, 3534,
         115, 2360,    5,    1])

DECODED SENTENCE:
In a hole in the ground there lived a hobbit.
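
The library call above hides the lookup table that does the actual work. As a rough sketch of the idea only, here is a toy tokenizer that gives every distinct word or punctuation mark its own number, much like a Linotype mold for each symbol. The function names build_vocab, encode and decode are hypothetical, and real tokenizers usually split text into sub-word pieces rather than whole words, so the ids will not match the output above.

```python
import re

# Toy pattern: a token is either a run of word characters or one punctuation mark.
PIECE_PATTERN = r"\w+|[^\w\s]"


def build_vocab(text):
    """Give each distinct piece an integer id, in order of first appearance."""
    vocab = {}
    for piece in re.findall(PIECE_PATTERN, text):
        if piece not in vocab:
            vocab[piece] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace each piece with its id, keeping the original order."""
    return [vocab[piece] for piece in re.findall(PIECE_PATTERN, text)]


def decode(ids, vocab):
    """Look each id up again and rejoin the pieces into a sentence."""
    id_to_piece = {i: piece for piece, i in vocab.items()}
    text = " ".join(id_to_piece[i] for i in ids)
    # Reattach punctuation to the word before it.
    return re.sub(r"\s+([^\w\s])", r"\1", text)


sentence = "In a hole in the ground there lived a hobbit."
vocab = build_vocab(sentence)
ids = encode(sentence, vocab)

print(ids)
print(decode(ids, vocab))
```

Both occurrences of "a" receive the same id, while "In" and "in" receive different ones, because this toy vocabulary is case-sensitive.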

Once information is translated into numerical representations, there is an instinct to unleash powerful computation upon it. In the case of language models like ChatGPT, “GPT” stands for Generative Pre-trained Transformer, and “pre-trained” refers to weights learned ahead of time, which define the complex relationships between word tokens mapped within a multidimensional space. The model operates by predicting the next likely token in a sequence. This process is stochastic rather than deterministic. In a deterministic process, the next word would always be the one with the highest probability. In a stochastic process, the model samples from its predicted probabilities, so it sometimes does not select the most likely next word. A ‘temperature setting’ controls how deterministic or random the model’s predictions are. A lower temperature makes the model more deterministic, often choosing the highest probability word, while a higher temperature increases randomness, which can lead to more creative or unexpected outputs.
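
To make the temperature setting concrete, here is a minimal sketch of how it can reshape a set of next-word scores before one word is drawn at random. The candidate words and their scores are invented for illustration, the function names are hypothetical, and this is not how any particular product implements sampling.

```python
import math
import random


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities; temperature sharpens or flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the largest value for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_next_token(candidates, scores, temperature=1.0, rng=random):
    """Stochastic choice: draw one candidate according to its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


def greedy_next_token(candidates, scores):
    """Deterministic choice: always take the highest-scoring candidate."""
    return candidates[scores.index(max(scores))]


# Hypothetical scores for the word that follows "there lived a".
candidates = ["hobbit", "dragon", "wizard", "dwarf"]
scores = [3.0, 1.5, 1.0, 0.5]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probs])

print(greedy_next_token(candidates, scores))
print(sample_next_token(candidates, scores, temperature=2.0))
```

At a low temperature almost all of the probability lands on the top-scoring word, so sampling behaves nearly like the greedy choice. At a high temperature the probabilities even out and less likely words are drawn more often.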

The product innovation of OpenAI’s ChatGPT lies in its presentation as a chat interface, leveraging our ingrained habits of text-based communication. Chatbot programs have existed since the 1960s, beginning with ELIZA,4 a program designed to simulate conversation by prompting users to continue speaking, but the chat interface of ChatGPT goes much further. It retrains us to use natural language, encouraging a more conversational and contextually rich interaction than traditional search queries. The chat paradigm also extends beyond the interface itself: the model was trained to treat the user’s prompts like questions and to shape its responses like answers. Further advancements in techniques like Retrieval-Augmented Generation (RAG) and processes such as Chain-of-Thought (CoT), when combined, start to create the illusion of highly capable machines.

I used a computer for the first time shortly after learning how to read. I have been called a “digital native” before, and considering I made my Instagram account my freshman year of college, I am probably one of the last people to have gone to high school before the rise of ubiquitous mobile social media. Around the same time I was learning to write cursive, I was learning how to write a markup language and using it to customize a Myspace profile. In the 2nd grade I was told we weren’t always going to have a calculator in our pocket, and by the 6th grade I did. Within my lifetime our relationship with knowledge has changed. My parents would have spent hours in a library navigating card catalogs or reference books to research a topic. Today knowledge, or its simulacrum, is instantly available all the time.

Information Synthesis

As AI-generated content becomes increasingly prevalent, I find myself playing a guessing game, trying to discern the human from the machine. Even more intriguing, I’ve begun to employ AI to summarize these very communications, a Russian doll of meaning distillation. In this process, we must acknowledge the risk of a lossy transmission, where each iteration strips away nuance and context. Not to mention the very real costs in electricity and computing power required to run these AI models.

But perhaps the greatest cost is more intangible. As we outsource more of our reading and writing to AI, do we risk the very skills that make us human? Like pilots who must resist the atrophy of automation, we too must actively engage our faculties of comprehension and expression. The neural networks in our brains, after all, operate on a “use it or lose it” basis.

Yet for me, the advent of Large Language Models like ChatGPT has been filled with exploration and play. My writing process has been utterly transformed by these tools. In a 1971 interview, Marshall McLuhan expressed skepticism about clear prose, suggesting that it indicates an absence of thought and reveals contempt for the intellectual processes. Ironically, it is precisely for the sake of “clear prose” that I’ve turned to LLMs. As someone who considers himself a strong speaker but has always struggled with the stillness and solitude of writing, I find in these AI a much-needed writing aid. With the audacity of childlike curiosity, I now write bad poetry with the LLM, unconcerned with plagiarism in this personal, non-institutional, non-commercial context. Unlike the pilot, I am not worried about losing some muscle necessary to my survival—I was never good at this anyway.

“Clear prose indicates an absence of thought, contempt for the intellectual processes. There is a saying of Thoreau, he said, ‘The snow drift is the lull in the wind; the institution is the lull in thought.’ When you can’t think anymore, you establish an institution. When the wind stops blowing, the snow drift forms. But these are not cynical remarks, they’re studies of processes. And the purpose of studying processes is to avoid various kinds of hangups, miseries.”

– Marshall McLuhan, 1971 Interview

For all its transformative potential, AI remains but the latest in a long lineage of communications technologies, stretching back to the advent of language itself. At its core, it is a tool for augmenting, not replacing, the most essential human connection, whether spoken face-to-face or mediated by screens and software. The printing press is thrown around as a comparative technology to AI but perhaps the current chatbot-based approach is more like the Linotype machine, quietly revolutionizing typesetting and becoming ubiquitous without needing the hype. It automated the production of printed materials and made them more accessible, changing the world in a significant yet understated way.

So, while I may joke about my inability to read, the truth is more complex. In a world saturated with text across countless platforms, reading has become an inescapable necessity. The challenge lies in preserving the depth of our engagement, even as the means of consumption evolve. We must learn to read anew, with a keen awareness of the capacities and limitations of these machines that can read and write for us. We must cultivate a discerning eye, knowing when to lean on AI’s efficiency and when to trust our own critical faculties. Only then can we chart a course through the great data deluge and find meaning in the maelstrom.