AI Curious

The Artist is Here to Stay

Image generated with Stable Diffusion and LAION-5B, June 19th 2023
This work is marked with CC0 1.0 Universal.

There is no doubt that AI image generation has forever changed the visual medium. Artists are concerned about being replaced by a machine that can perfectly create any image described in plain language, yet some are okay with chatbots summarizing a document or an AI copilot suggesting computer code. To those who have never used an AI system, deliberately or not, I hope you continue reading and are encouraged to play with one someday. To those who are curious about its uses in some places but not others, I hope to shed light on how these systems work, at least in the abstract. And to those who have been rushing into exploring this technology, I hope you take the time to think about the images you create and how they were created.

The AI technologies currently making waves, like ChatGPT, Midjourney, Copilot, DALL-E, Gemini, Sora, and more, are the result of massive advances in parallel processing as well as the culmination of decades of research into computational linguistics, database search optimization, and machine learning. I think we are on the cusp of a massive change in the way we do work on computers. The convenience of an AI-summarized transcript of an hour-long meeting is going to entice everyone to opt into recording all their meetings. The power of an AI programming copilot cannot be denied, and it has recently enabled me to push my personal projects well beyond what I would normally be able to do in the time I had allotted for them. These tools will become part of our day-to-day work in the information and creative sectors.

When I talk about art here, I am conflating commercial and fine arts, but only so we can focus on the skills and craft that make up an artist. Fine art can never be replaced by AI because one of its primary purposes is to fulfill its creator’s vision in the pursuit of creative expression. Commercial art, on the other hand, exists to pay the bills, and craft often optimizes your time spent on any given task. This essay is an individualist view of AI technology, and an optimistic one at that; while I touch on systems, I don’t address the systems of society. If you take nothing else away from this, I encourage you to approach this technology with curiosity and kindness for yourself and others.

Tech Art

About a decade ago, while I was a college student at The Maryland Institute College of Art (MICA) studying interactive arts and graphic design, I attended a workshop where some designers from a local design firm warned us not to pursue web development. Adobe Flash had just recently “died”, and I was falling in love with HTML5, CSS, and JavaScript. To me it was everything I loved about typography and design combined with interactivity. MICA really pushed the idea of a “personal practice” for an artist: develop your craft, find your voice, be critical and thoughtful of your work. Web development seemed like the perfect way to marry my creative drive with what I thought would be a stable and lucrative career.

Ten years later no one is making a website with HTML/CSS for their small business or friend’s band. People are using tools like Squarespace and Wix, and when an individual does build something “by hand” using HTML, CSS, and other tools of the web, often the website is an expression of the craft and the tools to create it were chosen deliberately. There were qualities the creator was going after that Squarespace could not achieve, or the process of making the website was the output.

Technology can be the medium of artistic expression itself. Consider the Library of Babel website, ¹ inspired by a Jorge Luis Borges short story. It’s an example of code as art. Its creator probably saw a unique creative potential in building this website, with its algorithmic complexity that generates a vast amount of data based on user input. There is an artistic purpose behind archiving the virtual library and crafting the website: its design and functionality mirror the story’s themes. Jacob Geller has a fantastic video called The Soul of a Library touching on all of this. ²

Web development was always my plan B to make plan A of working in the video game industry look more well measured. Early in my career, I was told there was a need for ‘UI People.’ A decade later, that advice proved true as I’ve built a career as a User Interface Technical Artist, working on multiple franchises known around the world. This experience has taught me how to teach myself. The tools and processes change, so working on these large-scale commercial and creative software projects requires flexibility, curiosity, and empathy.

On large projects the skill of a technical artist can be understanding the constant time vs. automation dilemma. Do you spend 8 hours doing a repetitive task, or invest 40+ hours building a tool to streamline it? Sometimes, there’s no way around the “grunt work”, especially if it’s a one-off thing. But other times, automating saves not only your time, but potentially hundreds of others’ hours down the line. There is a craft in being a technical artist and part of that craft is knowing when to invest in automation.
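The time vs. automation trade-off above can be put into rough numbers. The sketch below is only an illustration: the function names and the optional `tool_run_hours` parameter are my own assumptions, and real estimates are rarely this tidy. It uses the 8-hour task and 40-hour tool from the paragraph above.

```python
def break_even_runs(manual_hours: float, tool_build_hours: float,
                    tool_run_hours: float = 0.0) -> float:
    """How many repetitions of the task it takes before the tool pays for itself.

    tool_run_hours is a hypothetical extra: the time one run still takes
    once the tool exists.
    """
    saved_per_run = manual_hours - tool_run_hours
    if saved_per_run <= 0:
        # The tool never saves time, so it never breaks even.
        return float("inf")
    return tool_build_hours / saved_per_run


def hours_saved(manual_hours: float, tool_build_hours: float,
                repetitions: int, tool_run_hours: float = 0.0) -> float:
    """Net hours saved (negative means the tool cost more than it saved)."""
    return repetitions * (manual_hours - tool_run_hours) - tool_build_hours


if __name__ == "__main__":
    # The example from the text: an 8-hour task vs. a 40-hour tool.
    print(break_even_runs(8, 40))   # repetitions needed to break even
    print(hours_saved(8, 40, 1))    # a one-off task: the tool is a net loss
    print(hours_saved(8, 40, 100))  # the task repeated across a large team
```

For a one-off task the tool is a net loss, which is the “grunt work” case. Once the same task comes up for many people, the build time is quickly paid back.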

There are invaluable lessons in the “grunt work.” Those mind-numbing tasks teach you about the process and the human labor required to do something that takes hundreds of hours. When people ask me if I’m concerned about AI replacing my job, I often joke that I hope it does. I’d be completely satisfied if it meant no longer having to tackle tasks such as batch exporting dozens of TIFF files or meticulously optimizing CSS animations by going through thousands of lines of code. It is crucial to acknowledge that such tasks were fundamental to my early career. They paid my bills and played a significant role in getting me to where I am today. Without those opportunities my career path would have been markedly different.

If we know a technical artist is a craftsperson in software and automation, let’s focus on the artist part of being one. The skills of taste, vision, and curation are inherently human because they are deeply rooted in subjective human experiences, emotions, and cultural contexts. These skills reflect your unique perspective, aesthetic preferences, and the ability to interpret and assign value to forms of expression. The output of these skills is tailored for humans. I don’t know how else to say it: you can’t replace this with AI.

I don’t know what the future of creative work looks like, but if I were to take a guess it would start with no one being owed their ideas. If you haven’t tried “brainstorming” with one of these Large Language Model (LLM) chatbots, such as ChatGPT, Gemini, or Copilot, it’s easier than ever to come up with a list of 10 ideas about anything under the sun. It is trivial to further guide the conversation towards whatever your favorite ideas are. It is hard to know if any of these ideas are worth your investment. It’s even harder to act on them, and applying your craft towards anything is asking something deeply personal of yourself.

Image generated with Stable Diffusion and LAION-5B, February 26th 2023
This work is marked with CC0 1.0 Universal.

On Databases

I read Snow Crash by Neal Stephenson when I was a teenager. In that story Hiro Protagonist, a freelance hacker, comes across a database that, when interfaced with, not only crashes computers but also kills people. This story plays with the idea of human language as computation in a cyberpunk setting and coined the word Metaverse. In the real world there have been initiatives to “poison pill” the databases used in text-to-image generation and cause them to be less accurate. A paper on a technique called Nightshade outlines a process where someone purposefully mismatches the image and label pairs to reduce the accuracy of the generated image. ³

Tools like Nightshade and watermarking are an effort to protect the intellectual property of individuals. Another effort called Glaze tries to protect what makes an artist unique, which it describes as your “brush strokes, color palettes, and fine details.” ⁴ It does this by adding noise to your image, which is meant to disrupt the training process. It is important to recognize that these projects are good-faith efforts to protect artists in response to the rapid progress of text-to-image generation over the past few years, and they call the legal and regulatory process to action.

Datasets have bias and can be used maliciously. If we as artists use a tool that interfaces with a database, then we have a responsibility to be thoughtful and critical not only of what we create but also of how that data was obtained. LAION is a non-profit organization with several projects. The most high profile is the LAION-5B dataset, which contains the kind of image-text pairs that diffusion-based image generators like DALL-E and Midjourney require. This dataset was released for free, is composed of publicly available images scraped from the internet, and contains sensitive and private data. LAION has a mission statement of democratizing machine learning research. On December 19th, 2023, they announced that they would be temporarily removing their datasets as a precautionary measure in response to concerns about the possibility of illegal content within those datasets. ⁵

I think that video generation like OpenAI’s Sora ⁶ is about to fundamentally change social media. I have already begun to see bizarre videos made with text-to-image and text-to-video, and I genuinely find some of them compelling. Social media is sludge, but memes, in their images, text, and content, often brush with the absurd and at times mirror the work created during the Dadaist movement. Some artists’ refusal to work with these tools will create space for others to continue to push the medium forward. Personally, I am not interested in a conversation around what is and isn’t art, and I am not about to say memes are indeed art, nor will I say they are not.

“Whatever you now find weird, ugly, uncomfortable and nasty about a new medium will surely become its signature. CD distortion, the jitteriness of digital video, the crap sound of 8-bit - all of these will be cherished and emulated as soon as they can be avoided. It’s the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart.”

– Brian Eno, A Year With Swollen Appendices

NLP/NLU, Game Development and RAG

Last year at the Game Developers Conference (GDC) I saw a talk about developing a text adventure with free text input using Natural Language Processing (NLP) by Yusuke Mori, an AI researcher at SQUARE ENIX. ⁷ ChatGPT is an NLP tool working alongside another technology called Generative Pre-trained Transformer (GPT). I have no idea how it works, but if you have come this far, stick with me. I think it’s an exercise in finding routes through Borges’s Library of Babel and an attempt to find meaning in the chaos. Or, more practically, it’s a computer program and a dataset with output.

The game in the GDC talk runs an NLP application at runtime, so when the player types text into the game, the NLP app interprets it. This understanding step is called Natural Language Understanding (NLU); it allows the software to decide whether the text is related to the scenario and send the result back to the game engine. By using traditional game logic, the engine can either require the player to prompt the NLP app again or continue the scenario, thus creating the illusion of an endlessly interactive world.

If the NLU app decides the input is not related to the scenario, it will send back a message in natural language to further prompt the player. In the talk, Yusuke Mori calls this “chit-chat.” Personally, this seems risky to me, as you open yourself up to a lot more user prompting of the system, but for the purpose of research it is hugely successful. By letting players freely prompt your services, you open yourself up to a vastly larger range of possible inputs.
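The loop described above, where an NLU step decides whether input is related to the scenario and the game either advances or replies with “chit-chat,” can be sketched in a few lines. This is not the system from the GDC talk. The keyword matching below is a deliberately crude stand-in for a real NLU model, and every name in it (`SCENARIO_INTENTS`, `understand`, `game_step`) is hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical intents the current scenario knows how to handle,
# each with the keywords that must all appear in the player's text.
SCENARIO_INTENTS = {
    "open_door": {"open", "door"},
    "take_key": {"take", "key"},
}

CHIT_CHAT = "I'm not sure what you mean. What would you like to do?"


@dataclass
class NluResult:
    related: bool            # is the text related to the scenario?
    intent: Optional[str]    # which scenario intent matched, if any
    reply: Optional[str]     # natural-language "chit-chat" when unrelated


def understand(player_text: str) -> NluResult:
    """Toy stand-in for the NLU step: match keywords against known intents."""
    words = set(re.findall(r"[a-z']+", player_text.lower()))
    for intent, keywords in SCENARIO_INTENTS.items():
        if keywords <= words:
            return NluResult(True, intent, None)
    return NluResult(False, None, CHIT_CHAT)


def game_step(history: List[str], player_text: str) -> str:
    """Traditional game logic: continue the scenario or ask the player again."""
    result = understand(player_text)
    if result.related:
        history.append(result.intent)
        return f"scenario advances: {result.intent}"
    return result.reply


if __name__ == "__main__":
    history: List[str] = []
    print(game_step(history, "Please open the door."))
    print(game_step(history, "What's the weather like?"))
```

The game state only changes when the input is judged related to the scenario. Everything else is answered with a prompt for more input, which is where the range of possible player text becomes hard to predict.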

At a small scale, an NLP/NLU system can appear as a shallow illusion of understanding. Alternatively, depending on how reliable its output is, that output might be manually categorized in a process akin to “baking,” where content generated by software is prepared ahead of time for use by another system. With enough confidence, and accounting for their latency, you could run these systems at runtime. A mixture of sampling the output to check its quality and traditional user reporting could scale well beyond what a single person could check. Because the system is able to say it doesn’t know the answer to a question, we can fine-tune it towards always saying something we expect.
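As a rough illustration of the “baking” idea, the sketch below generates outputs ahead of time, keeps only the ones a reviewer approves, and lets the runtime look answers up instead of generating them live. The generator and the approval check are placeholder functions I made up for the example. In practice the first would be the NLP system and the second a person or a sampling process.

```python
from typing import Callable, Dict, Iterable

FALLBACK = "I don't know what you mean."


def bake(prompts: Iterable[str],
         generate: Callable[[str], str],
         approve: Callable[[str, str], bool]) -> Dict[str, str]:
    """Offline step: generate content once and keep only approved outputs."""
    baked: Dict[str, str] = {}
    for prompt in prompts:
        output = generate(prompt)
        if approve(prompt, output):
            baked[prompt] = output
    return baked


def lookup(baked: Dict[str, str], prompt: str) -> str:
    """Runtime step: no generation, only a lookup with a predictable fallback."""
    return baked.get(prompt, FALLBACK)


# Placeholder stand-ins for the real generator and the human review step.
def stand_in_generate(prompt: str) -> str:
    return f"You look at the {prompt}."


def stand_in_approve(prompt: str, output: str) -> bool:
    return prompt != "void"  # pretend the reviewer rejected this one


if __name__ == "__main__":
    baked = bake(["door", "key", "void"], stand_in_generate, stand_in_approve)
    print(lookup(baked, "door"))
    print(lookup(baked, "void"))   # rejected during baking, so the fallback
    print(lookup(baked, "moon"))   # never baked, so the fallback
```

The trade-off is that baked content is predictable and has no runtime latency, but it can only cover inputs that were anticipated when it was prepared.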

We can also loosely compare this dynamic “chit-chat” to Retrieval-Augmented Generation (RAG), where a system retrieves relevant information before it generates a response: here the system checks what it understood against the scenario and, when nothing matches, further prompts the user by saying “I don’t know what you mean.” If we were to think of the game engine as the user, with a predictable game state, you could confidently run a complex and powerful NLP/NLU app behind the scenes. With direct player input removed in favor of an increasingly complex game state, the combinatorics of possible outputs that you might want to support back in the game engine could scale with the size of your project.

I think it’s important we stop here and understand that I did not just describe a creative system. Creativity is a human quality. There is a museum in Baltimore called the American Visionary Art Museum (AVAM) that as artists we would say has “outsider art.” This means the museum specializes in art made by people who did not necessarily identify as artists. AVAM is also a great venue, and I saw my friend get married there. While trying to remember what kind of art AVAM has, I went to their website and found a great quote by a weird dude.

“…Creativity is self-generated in areas of the mind beyond or beneath the individual’s willful, conscious control. All he can do is discipline his consciousness to accommodate the needs of the creative process.”

– Ingo Swann, Everybody’s Guide to Natural ESP

As I wrap this up, I hope you can begin to think of this technology as a collaborator and not a competitor. Navigating this technology is the work of the next decade, and the conversations around plagiarism and moderation are more important than ever. From the right to privacy to the right to be forgotten, there are serious fights ahead of us that must be had. We must be aware of who will have the right to claim AI work as their own and who will not. None of these ideas are new: from photography to the internet, imagery and the medium have been changing rapidly for well over a century now. If you do work on a computer, your job will change. I know that artists are better equipped to navigate these new tools, as we must always be thoughtful and critical of what we make.

-b

¹ Library of Babel: libraryofbabel.info/About.htm
² The Soul of a Library by Jacob Geller: https://youtu.be/MjY8Fp-SCVk
³ Nightshade: https://arxiv.org/pdf/2310.13828.pdf
⁴ Glaze, on what makes an artist unique: https://youtu.be/zryvJjb9EEY?t=31
⁵ Safety Review for LAION 5B: https://laion.ai/notes/laion-maintanence/
⁶ OpenAI’s Sora: https://openai.com/sora
⁷ Developing Adventure Game with Free Text Input using NLP: https://www.gdcvault.com/play/1028755/AI-Summit-Developing-Adventure-Game