Wolfram Research: Injecting reliability into generative AI

The hype surrounding generative AI and the potential of large language models (LLMs), spearheaded by OpenAI’s ChatGPT, appeared at one stage to be practically insurmountable. It was certainly inescapable. More than one in four dollars invested in US startups this year went to an AI-related company, while OpenAI revealed at its recent developer conference that ChatGPT continues to be one of the fastest-growing services of all time.

Yet something continues to be amiss. Or rather, something amiss continues to be added in.

One of the biggest issues with LLMs is their tendency to hallucinate. In other words, they make things up. Figures vary, but one frequently cited rate is 15% to 20%. One Google system notched up 27%. This would not be so bad if the models did not come across so assertively while doing so. Jon McLoone, Director of Technical Communication and Strategy at Wolfram Research, likens it to the ‘loudmouth know-it-all you meet in the pub.’ “He’ll say anything that will make him seem clever,” McLoone tells AI News. “It doesn’t have to be right.”

The truth is, however, that such hallucinations are an inevitability when dealing with LLMs. As McLoone explains, it is all a question of purpose. “I think one of the things people forget, in this idea of the ‘thinking machine’, is that all of these tools are designed with a purpose in mind, and the machinery executes on that purpose,” says McLoone. “And the purpose was not to know the facts.

“The purpose that drove its creation was to be fluid; to say the kinds of things that you would expect a human to say; to be plausible,” McLoone adds. “Saying the right answer, saying the truth, is a very plausible thing, but it’s not a requirement of plausibility.

“So you get these fun things where you can say ‘explain why zebras like to eat cacti’ – and it’s doing its plausibility job,” says McLoone. “It says the kinds of things that might sound right, but of course it’s all nonsense, because it’s just being asked to sound plausible.”

What is needed, therefore, is a kind of intermediary which is able to inject a little objectivity into proceedings – and this is where Wolfram comes in. In March, the company released a ChatGPT plugin, which aims to ‘make ChatGPT smarter by giving it access to powerful computation, accurate math[s], curated knowledge, real-time data and visualisation’. Alongside being a general extension to ChatGPT, the Wolfram plugin can also synthesise code.

“It teaches the LLM to recognise the kinds of things that Wolfram|Alpha might know – our knowledge engine,” McLoone explains. “Our approach on that is completely different. We don’t scrape the web. We have human curators who give the data meaning and structure, and we lay computation on that to synthesise new knowledge, so you can ask questions of data. We’ve got a few thousand data sets built into that.”

Wolfram has always been on the side of computational technology, with McLoone, who describes himself as a ‘lifelong computation person’, having been with the company for almost 32 years of its 36-year history. When it comes to AI, Wolfram therefore sits on the symbolic side of the fence, which suits logical reasoning use cases, rather than statistical AI, which suits pattern recognition and object classification.

The two approaches appear directly opposed, but they have more in common than you might think. “Where I see it, [approaches to AI] all share something in common, which is all about using the machinery of computation to automate knowledge,” says McLoone. “What’s changed over that time is the concept of at what level you’re automating knowledge.

“The good old fashioned AI world of computation is humans coming up with the rules of behaviour, and then the machine is automating the execution of those rules,” adds McLoone. “So in the same way that the stick extends the caveman’s reach, the computer extends the brain’s ability to do these things, but we’re still solving the problem beforehand.

“With generative AI, it’s no longer saying ‘let’s focus on a problem and discover the rules of the problem.’ We’re now starting to say, ‘let’s just discover the rules for the world’, and then you’ve got a model that you can try and apply to different problems rather than specific ones.

“So as the automation has gone higher up the intellectual spectrum, the things have become more general, but in the end, it’s all just executing rules,” says McLoone.

What’s more, as the differing approaches to AI share a common goal, so do the companies on either side. As OpenAI was building out its plugin architecture, Wolfram was asked to be one of the first providers. “As the LLM revolution started, we started doing a bunch of analysis on what they were really capable of,” explains McLoone. “And then, as we came to this understanding of what the strengths or weaknesses were, it was about that point that OpenAI were starting to work on their plugin architecture.

“They approached us early on, because they had a little bit longer to think about this than us, since they’d seen it coming for two years,” McLoone adds. “They understood exactly this issue themselves already.”

McLoone will be demonstrating the plugin with examples at the upcoming AI & Big Data Expo Global event in London on November 30-December 1, where he is speaking. Yet he is keen to stress that there are more varied use cases out there which can benefit from the combination of ChatGPT’s mastery of unstructured language and Wolfram’s mastery of computational mathematics.

One such example is performing data science on unstructured GP medical records. This ranges from correcting peculiar transcriptions on the LLM side – replacing ‘peacemaker’ with ‘pacemaker’ as one example – to using old-fashioned computation and looking for correlations within the data. “We’re focused on chat, because it’s the most amazing thing at the moment that we can talk to a computer. But the LLM is not just about chat,” says McLoone. “They’re really great with unstructured data.”

How does McLoone see LLMs developing in the coming years? There will be various incremental improvements, and training best practices will see better results, not to mention potentially greater speed with hardware acceleration. “Where the big money goes, the architectures follow,” McLoone notes. A sea-change on the scale of the last 12 months, however, can likely be ruled out. This is partly because of crippling compute costs, but also because we may have peaked in terms of training sets. If copyright rulings go against LLM providers, then training sets will shrink going forward.

The reliability problem for LLMs, however, will be at the forefront of McLoone’s presentation. “Things that are computational are where it’s absolutely at its weakest, it can’t really follow rules beyond really basic things,” he explains. “For anything where you’re synthesising new knowledge, or computing with data-oriented things as opposed to story-oriented things, computation really is the way still to do that.”

Yet while responses may vary – one has to account for ChatGPT’s degree of randomness after all – the combination seems to be working, so long as you give the LLM strong instructions. “I don’t know if I’ve ever seen [an LLM] actually override a fact I’ve given it,” says McLoone. “When you’re putting it in charge of the plugin, it often thinks ‘I don’t think I’ll bother calling Wolfram for this, I know the answer’, and it will make something up.

“So if it’s in charge you have to give really strong prompt engineering,” he adds. “Say ‘always use the tool if it’s anything to do with this, don’t try and go it alone’. But when it’s the other way around – when computation generates the knowledge and injects it into the LLM – I’ve never seen it ignore the facts.

“It’s just like the loudmouth guy at the pub – if you whisper the facts in his ear, he’ll happily take credit for them.”
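
The two arrangements McLoone describes can be sketched roughly in Python. This is an illustration only, not Wolfram's or OpenAI's implementation: every name here (`compute`, `llm_in_charge_prompt`, `computation_in_charge_prompt`) is hypothetical, the "engine" is a toy arithmetic evaluator standing in for a real computational system, and no model is actually called. The sketch only builds the text a model would be given in each case.

```python
import ast
import operator

# All names below are hypothetical; no real Wolfram or OpenAI API is used.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def compute(expression: str) -> float:
    """Toy stand-in for a computational engine: handles +, -, *, / on numbers."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def llm_in_charge_prompt(question: str) -> str:
    """Arrangement 1: the model decides when to call the tool,
    so the instruction has to be firm."""
    return (
        "Always use the compute tool for anything numerical. "
        "Do not try to go it alone.\n"
        f"Question: {question}"
    )


def computation_in_charge_prompt(question: str, expression: str) -> str:
    """Arrangement 2: the fact is computed first, then handed to the
    model, which only has to phrase the answer."""
    fact = compute(expression)
    return (
        f"Fact (computed, treat as authoritative): {expression} = {fact}\n"
        f"Using only this fact, answer: {question}"
    )
```

In the second arrangement the number never depends on the model's recall, which is the case where McLoone says he has not seen the facts ignored. The first relies on the model choosing to call the tool, hence the strongly worded instruction.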

The post Wolfram Research: Injecting reliability into generative AI appeared first on AI News.