OpenAI’s latest GPT-5 model has faced criticism from users who find its tone too sterile and its responses lacking in creativity. To understand the differences between the new model and its predecessor, GPT-4o, we put both through a series of test prompts.
One test asked each model to generate original dad jokes. While ChatGPT’s attempts at humor largely fell flat for lack of originality, GPT-5 produced some good examples of the form that would be suitable for young audiences. GPT-4o, by contrast, struggled with puns, producing jokes that were confusing as well as unoriginal.
GPT-4o’s difficulties stemmed from its attempts to adapt familiar joke structures to new subjects, which often produced poor results. The comparison highlights the importance of developing AI models that can balance creativity and familiarity in their responses.
Source: https://arstechnica.com/ai/2025/08/is-gpt-5-really-worse-than-gpt-4o-ars-puts-them-to-the-test