More Thoughts on Wikipedia and AI
Reflections inspired by yesterday's event with Wiki Education.
Yesterday I had the honor of speaking on the panel “Wikipedia in a Generative AI World” hosted by Wiki Education, a nonprofit that helps bridge the gap between academia and Wikipedia.
I enjoyed the questions posed during the event—both the “official” ones and those spontaneously arising from participants in the live chat. I wanted to share my thoughts with my newsletter audience while they were still fresh.
What do you find most interesting about the intersection of AI and Wikipedia?
The circularity. For more than twenty years, Wikipedia editors have been compiling and curating information based on reliable sources, such as newspaper articles. Now we have ChatGPT and other AI tools that have been trained extensively on Wikipedia’s freely available content. In recent months, the Wikipedia editing community has been considering whether to allow content generated by those AI tools to be inserted into Wikipedia. No doubt, journalists will continue to use Wikipedia (including Wikipedia content that embeds AI-generated material) to produce journalism. That journalism is then curated by editors, added to Wikipedia, and the cycle continues.
To recap, we have: Journalism → Wikipedia → AI → Wikipedia → Journalism.
We need to have the thorny conversation about all the ways in which this circularity is good and bad. But I like to start with the observation that it’s pretty interesting.
How has the Wikipedia community responded to AI tools like ChatGPT?
Many Wikipedia editors are highly skeptical that AI tools should be permitted. There are concerns that editors who use tools like ChatGPT to generate content, and then add that content to the encyclopedia, will introduce errors into Wikipedia. We know that AI is prone to hallucinate and to generate bogus sources. There are also concerns that AI produces content that infringes copyright, and that its output might not be compatible with Wikipedia’s open licensing framework. Based on these concerns, a few Wikipedia editors have called for an outright ban on the use of AI tools.
On the other hand, some editors see opportunities to use AI tools to increase the productivity and enjoyment of Wikipedia editors. ChatGPT can provide a skeletal outline that a human editor then verifies and fleshes out with improvements. An example is the new article “Artwork Title” started by Richard Knipel.
Wikipedia’s proposed new policy on Large Language Models sets up what I refer to as a “take care and declare” framework. The human editor must take personal responsibility for vetting the LLM-generated content and ensuring its accuracy, and the editor must disclose in an article’s public edit history that an LLM was used. It’s worth noting that the proposed LLM policy closely resembles the existing expectation that most Wikipedia bots operate under some human supervision. Leash your bots, your dogs, and now your LLMs.
What metaphors do you use when thinking about or discussing AI?
This question prompted some interesting responses from the panelists as well as the audience.
The TI calculator - A reader of this newsletter said that she tells her high school students that AI is like a TI calculator. It can help you as a tool, but the “test” is designed so that the human operator must have sufficient knowledge to prompt the device. One audience member noted that ChatGPT is more likely to generate misinformation than a typical calculator, so it might help to mention that caveat when using this comparison.
The Eager College Research Assistant - Often it seems like there is a student on the other side of the chatbot, cheerfully responding to the request but not thinking too deeply about the question. Then again, audience members expressed concern about assigning traits of human consciousness to AI tools. (I agree, and have even argued that we should not give AI apps such human-sounding names.)
The Hairball - The knowledge base of a typical LLM is like a huge hairball; the LLM may pull strands from Wikipedia, Tumblr, Reddit, and a variety of other sources without distinguishing among them. Unfortunately, the AI hairball does not normally credit its sources, because it doesn’t always know how it arrived at its answer.
What would an ideal future of Wikipedia in an AI world look like?
I’m still thinking through this one, but here are a few ideas that spring to mind—
Wikipedia has 323 language editions, and at times, there are huge differences between them. AI and machine translation can potentially help bridge the gap for smaller wikis.
Gen Z has grown impatient with the chore of sifting through Google search results. I could foresee Wikipedia integrating AI features that help human editors find quality sources and double-check that the underlying sources in fact state what the human editor claims.
I would prefer for content creation and moderation on Wikipedia to remain human-led, with AI tools enhancing the productivity and user experience of human editors.
If you’re interested in this topic, you might check out my previous articles for Slate.
“Should ChatGPT Be Used to Write Wikipedia Articles?” (January 2023)