Despite growing unease over generative AI’s potential to wreak havoc upon global labor markets, accountants may be able to breathe a temporary sigh of relief.
Adept at storytelling, behavioral learning, and other creative tasks, the artificial intelligence language model ChatGPT – the fastest-growing and most prominent AI platform to date – has raised concerns over its capacity to help students cheat on coursework and exam material. The bot has passed the bar exam with a score in the 90th percentile, passed 13 of 15 AP exams, and attained a near-perfect score on the GRE.
“When this technology first came out, everyone was worried that students could now use it to cheat,” Brigham Young University accounting professor David Wood noted. “But opportunities to cheat have always existed. So for us, we’re trying to focus on what we can do with this technology now that we couldn’t do before to improve the teaching process for faculty and the learning process for students. Testing it out was eye-opening.”
However, as a study led by Wood later found, the platform often struggles to understand mathematical processes and tends to embellish data to cover up its mistakes.
Wood’s study sought to test ChatGPT’s aptitude in completing accounting exams compared with that of actual accounting students. A total of 25,181 questions on information systems, auditing, financial accounting, managerial accounting, and taxes were submitted from 186 educational institutions in 14 countries. Undergraduate students at BYU also fed another 2,268 textbook test bank questions into the study’s repository.
The questions varied in format and difficulty, combining multiple choice, true/false, and written response prompts.
The study found that the students outscored ChatGPT by nearly 30 percentage points, averaging 76.7% to the chatbot’s 47.4%.
ChatGPT outperformed the students on only 11.3% of the questions, particularly those concerning auditing and accounting information systems. The chatbot was also more adept at answering multiple choice and true/false questions, scoring 59.5% and 68.7%, respectively. However, it underperformed significantly on short answer questions, scoring only between 28.7% and 39.1%.
“It’s not perfect; you’re not going to be using it for everything,” Jessica Wood, a BYU freshman who participated in the study, said in a release. “Trying to learn solely by using ChatGPT is a fool’s errand.”