
Why the long interface? AI systems don’t ‘get’ the joke, research reveals

24 November 2025

Previous studies had suggested that LLMs could process puns in a similar way to humans, but the Cardiff-Ca’ Foscari team found otherwise.

Powerful artificial intelligence (AI) systems such as ChatGPT and Gemini can simulate an understanding of comedic wordplay but never really ‘get the joke’, a new study suggests.

Researchers wanted to find out whether large language models (LLMs) can understand puns (also known as paronomasia): wordplay that relies on double meanings or sound-alike words for a humorous or rhetorical effect.

While earlier studies suggested LLMs could process this type of humour in a similar way to humans, the team from Cardiff University and Ca’ Foscari University of Venice found that AI systems mostly memorise familiar joke structures rather than actually understanding them.

Their methodical analysis, carried out in Cardiff when authors Alessandro Zangari and Matteo Marcuzzo were visiting researchers in 2024–2025, put the models to the test, revealing how well they handle this playful side of language.

The team’s findings, presented at the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025), show that, despite their apparent intelligence, these models still lack genuine creativity and deep understanding.

Co-author Professor Jose Camacho-Collados from Cardiff University’s School of Computer Science and Informatics said: “Our research is probably the first to show how fragile LLMs’ humour comprehension really is. In some ways, this was surprising given their ever-increasing capabilities and previous research on the topic.”

“Our observations hinted otherwise and, ultimately, we found their understanding of puns is just an illusion. For example, when they see a sentence that looks like a pun, such as ‘Old X never die, they just X’, they insist it’s funny. That is especially the case if a sentence looks like a pun but makes no sense or lacks comedic intent or double meaning,” he said.

Earlier studies suggested AI models ‘got’ humour much like humans do, but the datasets used were not well suited to testing how AI systems interpret puns, the team argues.

For their analysis, they refined existing datasets and created new ones to probe deeper.

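The team fed the models puns such as “Long fairy tales have a tendency to dragon (drag on)” and then swapped the key word to create nonsense such as “Long fairy tales have a tendency to wyvern.” A wyvern is a dragon-like creature, so the altered sentence keeps the fairy-tale context but loses the sound-alike double meaning that made the original a pun.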

They found significant drops in accuracy, frequent misclassification of puns, and flawed phonetic and contextual cues behind the models’ decision-making.

“When faced with unfamiliar puns, their success rate in distinguishing puns from sentences without a pun can drop to as low as 20%, much worse than the 50% you’d expect from random guessing,” says Senior Lecturer Mohammad Taher Pilehvar.

“We also identified an overconfidence in the models’ assumption that what they were processing was in fact funny. This was especially the case when it came to puns that they hadn’t seen before,” explains Mohammad Taher Pilehvar, another of the paper’s authors from Cardiff University’s School of Computer Science and Informatics.

The authors advise caution when using LLMs for applications that extend beyond what the models have memorised from existing text, particularly tasks that might require creative thinking, such as understanding humour, empathy or cultural nuance.

“It’s a reminder that, in general, outputs from these models should be taken with a pinch of salt,” said Professor Camacho-Collados.

“While AI is becoming more powerful, it’s perhaps safe to say, from our study at least, that humans will always get the last laugh when it comes to comedy,” he added.

The team plans to extend its work beyond puns to other tasks requiring creative and original thinking.

Making AI systems more self-aware is another of the team’s goals; this, they say, could enable the models to recognise what they don’t actually understand.

The paper, ‘Pun Unintended: LLMs and the Illusion of Humor Understanding’, is published in the proceedings of EMNLP 2025.