OpenAI Unleashes New AI Model GPT-4, Which Can Pass Academic Exams, Program Software, And Even Do Taxes
Hologram of the artificial intelligence robot showing up from binary code.
(Yuichiro Chino/Getty Images)

Artificial intelligence software development firm OpenAI released GPT-4, its latest AI language model, with a massive array of new capabilities.

In a press release announcing the rollout of GPT-4 on Tuesday, OpenAI claimed that while GPT-4 still lags behind human beings in real-world scenarios, the AI can excel at theoretical and academic applications. In a developer livestream, the company showcased the software’s problem-solving and image recognition abilities, with demonstrations that included describing images, creating a working website, and even doing simulated taxes.

The first thing OpenAI discussed in its release was the problem-solving improvements made between GPT-4 and its predecessor, GPT-3.5. To illustrate these new capabilities, OpenAI showed a table of academic and professional exams, and the scores the software garnered. The AI scored:

  • A 298/400 on the Uniform Bar Exam, in the 90th percentile.
  • A 163 on the LSAT, in the 88th percentile.
  • A 710 on the reading and writing SAT, in the 93rd percentile.
  • A 700 on the math SAT, in the 89th percentile.
  • A 169 on the verbal GRE, in the 99th percentile.
  • A 5 on the AP Art History, Biology, Macro- and Microeconomics, Psychology, Statistics, US Government, and US History exams.

In the developer livestream, OpenAI President Greg Brockman discussed several new features of the updated software. First, GPT-4 accepts a system prompt, which lets the user set instructions and constraints that steer how the AI responds. Brockman demonstrated this capability with some basic prompts, including summarizing the OpenAI press release into a sentence where each word begins with G. While GPT-3.5 effectively gave up on the assignment, GPT-4 synthesized the article into the sentence: “GPT-4 generates groundbreaking, grandiose gains, greatly galvanizing generalized AI goals.”

When Brockman pointed out that “AI doesn’t count,” GPT-4 created a new sentence: “Gigantic GPT-4 garners groundbreaking growth, greatly galvanizing global goals.” The software was able to create similar sentences using only A’s and even Q’s.
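The constraint Brockman set is simple to check mechanically. The short Python sketch below is an illustration only, not anything shown in the livestream; the function name and the rule for what counts as a word are assumptions made for this example. It tests the two sentences quoted above and flags the first one because of “AI.”

```python
import re


def words_start_with(sentence: str, letter: str) -> bool:
    """Return True if every word in the sentence starts with the given letter."""
    # Assumption: a "word" is a run of letters, digits, apostrophes, or hyphens,
    # so "GPT-4" counts as a single word.
    words = re.findall(r"[A-Za-z0-9'’-]+", sentence)
    return bool(words) and all(w[0].lower() == letter.lower() for w in words)


first = ("GPT-4 generates groundbreaking, grandiose gains, "
         "greatly galvanizing generalized AI goals.")
second = ("Gigantic GPT-4 garners groundbreaking growth, "
          "greatly galvanizing global goals.")

print(words_start_with(first, "G"))   # False: "AI" does not start with G
print(words_start_with(second, "G"))  # True
```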

Next, Brockman experimented with GPT-4’s “vision model.” The AI built a Discord chat bot that could analyze and describe images posted to the chat server. Brockman then prompted the bot to describe a screenshot of the Discord channel, and the bot responded with a detailed description of the image, including the Discord layout and messages posted into the chat. The bot was also able to describe another image of a snowboarder on an alien planet, and a cartoon of a squirrel holding a camera.

Brockman then uploaded a photograph of a hand-drawn joke website. The AI-built Discord bot was able to recognize Brockman’s drawing, then write JavaScript code for a working website with jokes and a button to push to reveal the punchline.

Finally, Brockman showed that GPT-4 was able to do simulated taxes. Using a system prompt he dubbed “TaxGPT,” and a prompt that included large parts of the federal tax code, he asked GPT-4 to estimate 2018 taxes for a married couple with one child. The software reasoned through the tax code to arrive at the family’s standard deduction and estimated tax liability.
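For a sense of the arithmetic involved, here is a rough Python sketch of that kind of estimate. It is not the calculation from the livestream: the $100,000 income is a hypothetical figure chosen for illustration, and the function name is invented for this example. The bracket thresholds, the $24,000 standard deduction, and the $2,000 child tax credit are the 2018 federal figures for married couples filing jointly; the sketch ignores credit phase-outs, refundability rules, and every other deduction.

```python
# 2018 federal brackets, married filing jointly: (upper bound of bracket, rate).
BRACKETS_2018_MFJ = [
    (19_050, 0.10),
    (77_400, 0.12),
    (165_000, 0.22),
    (315_000, 0.24),
    (400_000, 0.32),
    (600_000, 0.35),
    (float("inf"), 0.37),
]
STANDARD_DEDUCTION_2018_MFJ = 24_000
CHILD_TAX_CREDIT_2018 = 2_000  # per qualifying child; phase-out not modeled


def estimate_tax(gross_income: float, children: int = 1) -> float:
    """Simplified estimate: standard deduction, brackets, then child credit."""
    taxable = max(0, gross_income - STANDARD_DEDUCTION_2018_MFJ)
    tax = 0.0
    lower = 0
    for upper, rate in BRACKETS_2018_MFJ:
        if taxable <= lower:
            break
        # Tax only the slice of income that falls inside this bracket.
        tax += (min(taxable, upper) - lower) * rate
        lower = upper
    # Assumption: the credit simply reduces the tax owed, never below zero.
    return max(0.0, tax - CHILD_TAX_CREDIT_2018 * children)


# Hypothetical household income, not a figure from the demo.
print(round(estimate_tax(100_000), 2))
```

With the hypothetical $100,000 income, taxable income comes to $76,000, the bracket tax to about $8,739, and the estimate after the child credit to about $6,739.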

The model is not yet at its full potential, OpenAI noted. According to the press release, system messages are the easiest way to “jailbreak” the AI from its boundaries, like the infamous viral “DAN” instance; the model also still “hallucinates,” making up facts that don’t exist, and makes reasoning errors. The company is also working with experts to reduce “harmful advice, buggy code, or inaccurate information,” it said.

