I did not jump aboard the hype train when OpenAI’s GPT 3.0 rolled out for public consumption around 2021 (or thereabouts; accounting for post-COVID time dilation). I remember being impressed but feeling like it’d be a toy you’d get bored of once you’d tried to get it to respond to emails in the style of a 1930s hard-boiled detective.
Things have come a long way since then. Or have they…?
It’s 2024 and AI has since tucked the blockchain, NFTs and web3 into bed and decided it’s time to show the world how hype is really done. CEOs and workers alike envision a near future where all of their jobs are automated away; pundits on LinkedIn reinvent the theory of relativity; and the VC industry has found a new way to the moon in the form of a burgeoning industry of AI-backed tech startups. Let’s Fucking Go! 🚀
Every now and then my lizard brain tingles my spinal cord and the words “be careful what you wish for” briefly take form, soon to fizzle away in the noise of my engagement-addled subconscious.
OpenAI’s GPT-4 greeted the world in early 2023, quite some time after 3.0 and 3.5. There’s no denying that it and its 4o successor are powerful, and the same applies to competing models on the market: not only can they generate text, but their ability to generate audio and video is improving rapidly.
Now that I’ve set the scene, let’s talk about one application that fellow programmers, software engineers, and other stereotypical internet lurkers can really relate to: writing code. Or not writing code, as it happens. But not before a brief history lesson.
Back when I started programming, we didn’t even have JavaScript! Well, we did, but nobody dared to write it by default.
Programming, in many ways, is a lot like cooking: you just have to read the fucking recipe. Unlike cooking, the recipe is not forthcoming. Some of it you’ll find in documentation, like man pages and READMEs, and other steps you’ll find only by reading the source code or by purchasing reference manuals. The increasing adoption of the internet added online support and Q&A to the mix, typically via Usenet, IRC, or a forum.
Some well-to-do epicureans understood the frustration of browsing an internet full of unsolved problems and replies like “never mind, fixed it!”, so they created Stack Overflow: a place where questions get answered, answers get accepted, and the whole exchange stays put for anyone who hits the same problem later, particularly benefiting less experienced developers and those running into strange edge cases.
Stack Overflow was the go-to resource for programmers in need of help for well over a decade (going on two, now), and in the early days it was a great way to build up your programming chops by attempting to answer the questions and solve the problems being asked.
Paired with Google for search, this was like camembert with a fine Bordeaux.
As your experience grows, you tend to spend less time searching for help on the internet because you’ve internalised a lot of the knowledge you previously lacked. You become more comfortable reading the source because you have a more intimate understanding of the language you work with every day, and you can navigate the language and library docs intuitively.
Where is this even going? Is an LLM writing it? Am I an LLM?
If you haven’t guessed already, I am an author with a fairly old-skool approach, easily dated to the early-to-mid 2000s. Deferring to an LLM for stuff I feel like I know intuitively doesn’t come as a first instinct, but…
That’s the exact same thing someone older than me would have said about Google and Stack Overflow and IDEs and what have you. Except for vim and emacs: totally timeless.
It’s about time we get with the times, don’tcha think? But first, some admissions:
- Any bit of research I did here, I did through an LLM using Perplexity.
- I did not even double-check the result of my query; I just accepted it as given.
- You didn’t really need to read any of the post so far, but I appreciate it if you did.
A tingle in my spine again... “be careful”…“wish for”…
Knowing that, I’ve been using LLM-based search by default for quite some time now, and I’ve encountered some situations that, depending on which side of the fence you sit on, are either fantastic or baffling.
Perplexity with Claude 3.5, for example, will invent non-existent code on the fly without any reference to source material and then complain if you ask for references.
Prompted to provide more clarity, it admits that it “made a mistake in presenting that information as if it were factual and current.”
On the one hand, this is exactly what you’d expect from a fellow programmer who was still learning the ropes but felt under pressure to give an answer instead of “I don’t know.” I was that person once, and I believe many, if not most, of us have been, given the history of office politics and management.
On the other, this is a search tool with an incredible capability to use natural language as an interface: so to what end does it need to fabricate results when it doesn’t have an answer?
This is not the only example, of course, and with code it’s easy: if it’s made up then it won’t compile. But what else will it fabricate in its eagerness to present a confident answer that is then taken as gospel by the user?
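To be concrete about the code case, here’s a made-up sketch (not pulled from any real session; JSON.parseSafe is the invented bit) of the kind of helper an LLM might confidently hand you, and which the compiler shoots down before it can do any harm:

```typescript
// Hypothetical illustration: the global JSON object has parse and stringify,
// but no parseSafe, so tsc refuses to compile this, with an error along the
// lines of "Property 'parseSafe' does not exist on type 'JSON'".
const config = JSON.parseSafe('{"retries": 3}');
console.log(config);
```

The feedback loop there is immediate and unforgiving. Prose, on the other hand, doesn’t ship with a type checker.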
It is trained on the sum of all knowledge on the internet, and beyond, and behaves as if it comprehends most of it.
Realistically, it is at a level of unconscious incompetence. The LLM that knew too little but thought it knew too much.
Wrap it up already! You don’t want them to think you’re rambling.
Ultimately, an LLM can confidently serve you a pudding, but the jury is out on whether the proof is really in it. Stick to the taste test before you commit.