I stumbled upon a middle school math homework problem:

A box of 15 markers costs $12.70. A box of 42 markers costs $31.60. How much would a box of 50 markers cost? Write the equation that shows the price, p, for n markers in the box. Assume there is no tax and that the price of packaging is the same for any size box.

Nothing exciting here for a good student or a cunning LLM user, just a system of linear equations that neatly solves to 70 cents per marker.
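For reference, here is how the first version works out. Write the price as p = rn + b. The labels r (price per marker) and b (fixed packaging cost) are introduced here and are not part of the original problem. Subtracting the two given equations eliminates b:

```latex
\begin{aligned}
12.70 &= 15r + b \\
31.60 &= 42r + b \\
18.90 &= 27r \quad\Rightarrow\quad r = 0.70 \\
b &= 12.70 - 15(0.70) = 2.20 \\
p &= 0.70\,n + 2.20, \qquad p(50) = 35.00 + 2.20 = 37.20
\end{aligned}
```

So a box of 50 markers costs $37.20.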

But let’s channel our inner (tenured) professor and spice it up a bit (heh heh):

A box of 15 markers costs $12.20. A box of 42 markers costs $30.20. How much would a box of 60 markers cost? Write the equation that shows the price, p, for n markers in the box. Assume there is no tax and that the price of packaging is the same for any size box.

Every LLM I tested (ChatGPT, Gemini, Copilot) got this one wrong. I’d expect similar results from the average student. The smart ones will drop your class and stop talking to you.

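The wrong answers make sense. This time the per-marker price is $18.00 / 27 = 2/3 of a dollar, a repeating decimal (about 66.67 cents), so rounding it anywhere along the way throws off the result. The exact answer is $40.00 for the markers plus $2.20 for the box, or $42.20. The “great averager” does what the majority of students do, and handling fractions poorly has long been a common problem in math classes, even at the college level.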
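A minimal Python sketch of the trap, assuming we work in whole cents and use the standard library's fractions.Fraction for exact arithmetic (both are illustrative choices, not part of the original problem). The helper names fit_price_line and price are hypothetical. The rounding path at the end is one illustrative way to go wrong, not a reconstruction of what any particular LLM did.

```python
from fractions import Fraction

def fit_price_line(n1, cents1, n2, cents2):
    """Fit p = rate * n + packaging through two (markers, price-in-cents) points.

    Hypothetical helper: returns (rate, packaging) as exact Fractions, in cents.
    """
    rate = Fraction(cents2 - cents1, n2 - n1)
    packaging = cents1 - rate * n1
    return rate, packaging

def price(rate, packaging, n):
    """Price in cents of a box of n markers."""
    return rate * n + packaging

# The "spiced up" problem, with dollar prices converted to integer cents.
rate, packaging = fit_price_line(15, 1220, 42, 3020)
print(rate, packaging, price(rate, packaging, 60))  # 200/3 220 4220, i.e. $42.20

# One way to go wrong: round the per-marker price to whole cents (66.67 -> 67),
# then back out the packaging cost from the rounded rate.
rounded_rate = Fraction(round(rate))            # 67 cents per marker
rounded_packaging = 1220 - rounded_rate * 15    # 215 cents
print(price(rounded_rate, rounded_packaging, 60))  # 4235, i.e. $42.35
```

Rounding to 67 cents before solving for the packaging cost lands 15 cents off; keeping 200/3 as an exact fraction gives $42.20 with no drift.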

You can get a correct answer from an LLM with the right instructions, e.g.:

(user) It is imprecise, you’re rounding it.
(LLM)  You're absolutely right! Let me correct my earlier calculations with full precision...

But to even know there’s a problem and to provide a proper prompt, you need an experienced student, not an average one.

So why am I telling you all this, aside from a fun way to troll your students?

Any training set, whether for students or LLMs, must include weird, unexpected problems. For students, it’s about pulling them out of their comfort zone, making them doubt themselves, and teaching them to be okay with that doubt. Here’s why:

Once, at Uber, we interviewed a fresh Computer Science graduate. A well-versed leetcoder and speed-coding competition winner. He was lightning-fast and snappy… until we gave him a real-world problem. One without a textbook-perfect solution.

In the first part of the interview, we watched him confidently try solution after solution, with no success. We hinted that he should just commit to one workable solution and jam on it with us. But his mind was stuck in a loop, searching for the “nice number” answer. Then he started panicking and falling apart. It took real effort to pull him off the “must-be-optimal” rails and get him moving forward before he had a full-on meltdown.

And if you really want to push the trolling to the next level? Make a multiple-choice test where all the correct answers are option B. Just make sure you have tenure before you try it.