14 October 2024

Large Artificial Intelligence Language Models, Increasingly Unreliable


According to José Hernández-Orallo, a researcher at the Valencian Institute for Research in Artificial Intelligence (VRAIN) of the Universitat Politècnica de València (UPV) and ValgrAI, one of the main concerns about the reliability of language models is that their performance does not match human perceptions of task difficulty.

In other words, there is a mismatch between the tasks on which humans expect the models to fail, based on how difficult those tasks seem to people, and the tasks on which the models actually fail. ‘Models can solve certain complex tasks in line with human abilities, but at the same time, they fail on simple tasks in the same domain. For example, they can solve several PhD-level mathematical problems. Still, they can get a simple addition wrong,’ notes Hernández-Orallo.

In 2022, Ilya Sutskever, co-founder of OpenAI and the scientist behind some of the most significant advances in artificial intelligence in recent years (from the ImageNet solution to AlphaGo), predicted that ‘maybe over time that discrepancy will diminish’.

However, the study by the team from the UPV, ValgrAI and the University of Cambridge shows that this has not been the case. To demonstrate this, the researchers investigated three key aspects that affect the reliability of language models from a human perspective.
