GPT-4 performed close to the level of expert doctors in eye assessments

As large language models (LLMs) continue to advance, so do questions about how they can benefit society in areas such as medicine. A recent study from the University of Cambridge's School of Clinical Medicine found that OpenAI's GPT-4 performed nearly as well as experts in the field on an ophthalmology assessment, as first reported by the Financial Times.

In the study, published in PLOS Digital Health, researchers tested the LLM, its predecessor GPT-3.5, Google's PaLM 2 and Meta's LLaMA with 87 multiple-choice questions. Five expert ophthalmologists, three trainee ophthalmologists and two unspecialized junior doctors took the same mock exam. The questions came from a textbook used to test trainees on everything from light sensitivity to lesions. Its contents aren't publicly available, so the researchers believe the LLMs couldn't have been trained on them. ChatGPT, running GPT-4 or GPT-3.5, was given three attempts to give a definitive answer; otherwise, its response was marked as null.

GPT-4 scored higher than the trainees and junior doctors, answering 60 of the 87 questions correctly. That was significantly higher than the junior doctors' average of 37 correct answers, but only narrowly beat the three trainees' average of 59.7. Although one expert ophthalmologist answered only 56 questions correctly, the five experts averaged 66.4 correct answers, outscoring the machine. PaLM 2 scored 49 and GPT-3.5 scored 42. LLaMA scored the lowest at 28, falling below even the junior doctors. Notably, these tests were conducted in mid-2023.

While these results point to potential benefits, there are also quite a few risks and concerns. The researchers noted that the study used a limited number of questions, especially in certain categories, so actual results might vary. LLMs also tend to "hallucinate," or make things up. That's one thing if it's an irrelevant fact, but falsely claiming there's a cataract or cancer is another story. As with many uses of LLMs, the systems also lack nuance, creating further opportunities for inaccuracy.

