“AI passes the US medical licensing exam.” “ChatGPT passes law school exams despite ‘mediocre’ performance.” “Is ChatGPT Getting a Wharton MBA?”
Headlines like these have recently touted (and often exaggerated) the performance of ChatGPT, an artificial intelligence tool that can write sophisticated text responses to human queries. These achievements follow a long tradition of comparing an AI’s skills to those of human experts, such as Deep Blue’s 1997 chess victory over Garry Kasparov, IBM Watson’s defeat of Ken Jennings and Brad Rutter on “Jeopardy!” in 2011, and AlphaGo’s victory over Lee Sedol at the game of Go in 2016.
The implicit subtext of these recent headlines is more alarming: AI is coming for your work. It’s as smart as your doctor, your attorney, and the consultant you hire. It signals a looming, ever-present disruption in our lives.
But sensationalism aside, comparing AI to human performance tells us little of practical use. How should we effectively use an AI that passes the US medical licensing exam? Could it reliably and safely collect medical histories during patient admissions? What about offering a second opinion on a diagnosis? Such questions cannot be answered by an AI performing similarly to a human on the licensing exam.
The problem is that most people have little AI literacy – an understanding of when and how to use AI tools effectively. What we need is a simple, common framework for assessing the strengths and weaknesses of AI tools, one that everyone can use. Only then can the public make informed decisions about incorporating these tools into our daily lives.
To meet this need, my research group turned to an old educational idea: Bloom’s taxonomy. First published in 1956 and revised in 2001, Bloom’s taxonomy is a hierarchy describing levels of thinking, with higher levels representing more complex thought. Its six levels are: 1) Remember – recall basic facts, 2) Understand – explain concepts, 3) Apply – use information in new situations, 4) Analyze – make connections between ideas, 5) Evaluate – critique or justify a decision or opinion, and 6) Create – produce original work.
These six levels are intuitive even for non-experts, but specific enough to make meaningful judgments. Moreover, Bloom’s taxonomy is not tied to any particular technology – it applies to cognition in general. We can use it to evaluate the strengths and limitations of ChatGPT or other AI tools that manipulate images, generate sounds, or control drones.
My research group began evaluating ChatGPT through the lens of Bloom’s taxonomy, asking it to respond to variations of a prompt, each targeting a different level of thinking.
For example, we asked the AI: “Suppose demand for COVID vaccines this winter is forecast at 1 million doses plus or minus 300,000 doses. How much stock should we carry to meet 95% of demand?” – an apply-level task. Then we changed the question, asking it to “discuss the pros and cons of ordering 1.8 million vaccine doses” – an evaluate-level task. We then compared the quality of the two answers and repeated this exercise for all six levels of the taxonomy.
Preliminary results are instructive. ChatGPT is generally good at remember, understand, and apply tasks, but struggles with the more complex analyze and evaluate tasks. It responded well to the first, apply-level prompt, explaining a suitable formula for the appropriate amount of vaccine (albeit with a minor calculation error).
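The article does not reproduce ChatGPT’s formula, but one standard reading of the prompt treats it as a service-level calculation: if “1 million doses plus or minus 300,000” is interpreted as demand that is normally distributed with mean 1,000,000 and standard deviation 300,000 (an assumption, not the authors’ stated model), then the stock covering 95% of demand is simply the 95th percentile of that distribution. A minimal sketch:

```python
from statistics import NormalDist

# Assumed interpretation: winter demand ~ Normal(mean=1,000,000, sd=300,000).
demand = NormalDist(mu=1_000_000, sigma=300_000)

# The 95% service level is the 95th percentile of demand:
# mean + z * sd, where z = inverse normal CDF at 0.95 (about 1.645).
stock = demand.inv_cdf(0.95)

print(f"Stock for a 95% service level: {stock:,.0f} doses")
```

Under these assumptions the answer is roughly 1.49 million doses; a different reading of “plus or minus 300,000” (say, as a hard range rather than a standard deviation) would change the figure.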
On the second prompt, however, ChatGPT waffled unconvincingly between the risks of ordering too much or too little vaccine. It did not assess those risks quantitatively, did not consider the logistical challenges of cold storage for such a large quantity, and did not flag the possibility that a vaccine-resistant variant could emerge.
We see similar behavior for other prompts at these taxonomy levels. Bloom’s taxonomy thus allows us to make more nuanced assessments of AI technology than a simple human-versus-AI comparison.
As for our doctors, lawyers, and consultants, Bloom’s taxonomy also offers a more nuanced view of how AI could one day transform, rather than replace, these professions. While AI can excel at remember and understand tasks, few people consult their doctor for a list of all possible symptoms of a disease, ask their lawyer to recite case law verbatim, or hire a consultant to explain Porter’s five forces framework.
Rather, we turn to experts for higher-level cognitive tasks. We value our physician’s clinical judgment in weighing the benefits and risks of a treatment plan, our attorney’s ability to synthesize precedents and advocate for us, and a consultant’s ability to offer an out-of-the-box solution no one else would have thought of. These are analyze, evaluate, and create tasks – levels of thinking where AI technology currently falls short.
Through Bloom’s taxonomy, we can see that effective human-AI collaboration largely comes down to delegating lower-level cognitive tasks so that we can focus our energies on higher-level ones. So instead of asking whether an AI can compete with a human expert, let’s ask how well an AI’s capabilities can be used to enhance human critical thinking, judgment, and creativity.
Of course, Bloom’s taxonomy has its own limitations. Many complex tasks involve multiple levels of the taxonomy, which frustrates attempts at categorization. And Bloom’s taxonomy doesn’t directly address issues such as bias or discrimination, a major concern in large-scale AI applications. Although imperfect, Bloom’s taxonomy remains useful. It is easy for anyone to understand, general enough to apply to a wide variety of AI tools, and structured enough to ensure we ask a consistent and thorough set of questions of those tools.
Just as the rise of social media and fake news forced us to develop better media literacy, tools like ChatGPT require us to develop our AI literacy. Bloom’s taxonomy offers a way to think about what AI can and cannot do as this kind of technology becomes embedded in more and more areas of our lives.
Vishal Gupta is an associate professor of Data Sciences and Operations at the USC Marshall School of Business and holds a courtesy appointment in the Department of Industrial and Systems Engineering.
Source: LA Times