Can AI do Math? Spoiler: Yes! But only if you use these tools

GPT-4 has beaten almost every standardized test thrown at it, from sommelier exams to biology. AI can now perform at the top of most university classes, yet it still struggles with reliability in basic math. The issue, however, is that the AI is being asked to do things that don't come naturally to it.

In a new research paper with the beautifully poetic name Faith and Fate, researchers analyze GPT-4's performance on three-digit multiplication, meaning 254 x 234, for instance. That's pretty tough to do in your head, but people can learn a small set of simple rules to do the calculation themselves. GPT-4 struggles, though, with an accuracy of just 59%. Intuitively that seems strange. How can AI be so good at much more complex tasks but fail at something a simple calculator can do?

It's worth remembering what GPT-4 actually is: a language model. This means it is exceptional at predicting the next best word in a sentence or paragraph. However, as we know, that's not how any of us do math. When we perform calculations we take several factors into account simultaneously. We incorporate various frameworks, which may be algebraic or graphical, and combine them to generate a calculation that we "execute". For a three-digit multiplication, we have rules that we can apply. GPT-4, used as a plain language model, simply doesn't do that... but it can!
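To make the contrast concrete, here is a minimal sketch of the "rules" a person applies for multi-digit multiplication: multiply by each digit, shift by its place value, and sum the partial products. A language model predicting the next token has no such procedure built in.

```python
def long_multiply(a: int, b: int) -> int:
    """Schoolbook long multiplication: sum of shifted partial products."""
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * (10 ** place)  # partial product, shifted by place value
        total += partial
    return total

print(long_multiply(254, 234))  # the paper's example size: 59436
```

Following these few mechanical rules, a person (or a few lines of code) gets the answer right every time, which is exactly the reliability the next-word predictor lacks.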

Chaining is the first step

The most crucial first step in addressing a complex question is getting the AI to break it down into steps. The AI can do this either by being asked to or by being configured as an Agent. An agent can be designed to always approach a problem by separating it into parts. It can also "sense check" prior steps, which dramatically improves performance.

Code is the second step

Teamwork makes the dream work! AI agents that define calculations as code are significantly more accurate at analysis because the calculation itself is done entirely through code execution. Writing code is something GPT-4 excels at. In addition, most "inaccuracies" result in code that fails and can be re-run by an agent, rather than poor-quality analysis. This is where Code Interpreter and ThinkChain's Analyst agent come into play.
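A minimal sketch of this "calculate via code" pattern: the model emits a snippet, the agent executes it, and a failure surfaces as a visible error that can trigger a re-run, rather than as a silently wrong number. (`model_code` below is a hand-written stand-in for model output; this is not ThinkChain's or Code Interpreter's actual internals.)

```python
def run_generated(code: str):
    """Execute model-written code and return (result, error).

    A failed execution returns an error object the agent can react to,
    instead of a plausible-looking but wrong answer.
    """
    scope = {}
    try:
        exec(code, scope)                # run the generated snippet
        return scope.get("result"), None
    except Exception as err:             # broken code fails loudly...
        return None, err                 # ...so the agent can retry

model_code = "result = 254 * 234"        # what the model might emit
value, error = run_generated(model_code)
print(value)  # 59436 - computed by execution, not token prediction
```

The arithmetic itself is done by the interpreter, so its accuracy no longer depends on the model at all; the model only has to write correct code, which plays to its strengths.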


Asking ThinkChain to calculate 237 x 757 directly, it comes back with the answer 179,349, which is incorrect. However, ThinkChain also has access to an expert, the Analyst agent. Asking the Analyst agent the same question leads to a more accurate, code-based analysis, and the result is correct!
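The correct product is easy to confirm directly, which is exactly what a code-based answer amounts to:

```python
# Direct check of the example calculation.
print(237 * 757)  # 179409, not the 179,349 the language model guessed
```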

ThinkChain's Analyst agent is capable of analyzing a question semantically, thinking through it step by step, and performing the analysis through the execution of code. ThinkChain also provides a fully auditable trail to analyze the steps taken and the outputs rendered. Through these steps, a complete and accurate analysis can be completed in record time.


Get started now to learn more!


Written with the help of ThinkChain.ai