ChatGPT, the mass-market artificial intelligence chatbot launched by OpenAI last year, has passed portions of the bar exam and the medical licensing exam, tests that typically require years of intensive study and postsecondary education to complete.
The language processing tool has gained widespread recognition over the past several weeks as knowledge workers use the user-friendly system to complete tasks such as writing emails and debugging code in a matter of moments. Academics have now applied the system to exams considered difficult even by the world’s brightest students.
ChatGPT performed “at or near the passing threshold” for all three components of the United States Medical Licensing Examination, the test that physicians holding Doctor of Medicine degrees must pass for licensure, without “any specialized training or reinforcement,” according to one research paper. The system also showed “a high level of concordance and insight in its explanations,” suggesting that “large language models may have the potential to assist with medical education, and potentially, clinical decision-making.”
The researchers fed ChatGPT open-ended and multiple-choice questions, both with and without forced explanations; two physician adjudicators scored the responses for accuracy, concordance, and insight. ChatGPT’s performance on the exam significantly exceeded scores earned by other artificial intelligence systems mere months earlier. It also outperformed PubMedGPT, a model “trained exclusively on biomedical domain literature,” and landed “comfortably within the passing range” of scores.
The system also achieved passing-level scores in some subject areas of the Multistate Bar Examination (MBE), the multiple-choice section of the bar exam, according to another research paper. Overall, ChatGPT answered 50.3% of questions correctly, roughly double the 25% rate expected from randomly guessing among four answer choices, though below the 68% rate of human test-takers, who bring seven years of postsecondary education and exam-specific training to the test. The model’s top two and top three choices contained the correct answer 71% and 88% of the time, respectively.
The researchers concluded that ChatGPT “significantly exceeds our expectations for performance on this task” and noted that the rank-ordering of possible choices confirms the “general understanding of the legal domain” reflected by the system.
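The rank-ordering result can be checked against chance: with four answer choices per MBE question, a model guessing at random would place the correct answer among its top k choices only k/4 of the time. A minimal sketch of that comparison, using the 50.3%, 71%, and 88% figures reported in the paper and assuming the standard four-option MBE format:

```python
# Compare ChatGPT's reported MBE accuracy against the random-guessing baseline.
# Reported top-k accuracies are taken from the bar exam paper; the four-choice
# question format is the standard MBE layout.
reported = {1: 0.503, 2: 0.71, 3: 0.88}

for k, accuracy in reported.items():
    chance = k / 4  # probability a uniform random ranking's top-k contains the answer
    print(f"top-{k}: model {accuracy:.1%} vs chance {chance:.0%} "
          f"(margin {accuracy - chance:+.1%})")
```

At every depth the model's margin over chance is large, which is the basis for the researchers' claim that the rank-ordering reflects genuine legal-domain understanding rather than lucky guessing.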
Although conversations surrounding technological unemployment over the past several decades have revolved around blue-collar workers losing their positions to robotics and automation, the widespread use of ChatGPT has introduced similar questions in white-collar professions. Many knowledge workers nevertheless find that the system increases their efficiency: some 27% of professionals at prominent consulting, technology, and financial services companies have already used ChatGPT in various capacities, according to a survey from Fishbowl.
The studies on the medical and legal licensure exams follow a similar project that examined ChatGPT’s performance on a graduate-level operations management test at the University of Pennsylvania’s Wharton School. Professor Christian Terwiesch reported that ChatGPT earned a B to B- grade on a final exam typically given to MBA students.
“It does an amazing job at basic operations management and process analysis questions including those that are based on case studies,” he wrote. “Not only are the answers correct, but the explanations are excellent.”
Terwiesch noted, however, that ChatGPT’s performance still had salient deficiencies. The system made “surprising mistakes in relatively simple calculations” at the level of sixth-grade math that were often “massive in magnitude,” and the current version “is not capable of handling more advanced process analysis questions, even when they are based on fairly standard templates.”