Medical Large Language Models Are Available Now and More Accurate Than General-Purpose LLMs
Large language models (LLMs) unlock new use cases in healthcare NLP. John Snow Labs has launched new Healthcare NLP models that are accurate and production-ready for healthcare use cases.
As part of our commitment to keep you at the state of the art, the latest 4.4 release of John Snow Labs’ Healthcare NLP includes a suite of new LLMs that are healthcare-specific, highly accurate, and production-ready. Here’s what you need to know:
1. They cover a range of common healthcare use cases
Ask medical questions: Try asking the new BioGPT-JSL (the first-ever closed-book medical question-answering LLM based on BioGPT) “how to treat asthma”.
Understand medical research: Give the MedicalQuestionAnswering annotator a PubMed abstract and ask it what the key results were.
Generate clinical text: Prompt the MedicalTextGenerator annotator to complete “66yo male patient presents with severe back pain and …”.
Summarize clinical encounters: Ask the MedicalSummarizer annotator to turn a visit summary, discharge note, radiology report, or pathology report into one paragraph.
Summarize questions from patients: With 5 models for 5 contexts, MedicalSummarizer can also turn an email or post from a patient into a one-sentence question.
2. They’re more accurate than general-purpose LLMs.
Clinical note summarization is 30% more accurate than with state-of-the-art general-purpose LLMs (BART, Flan-T5, Pegasus).
On clinical entity recognition, our models make half as many errors as ChatGPT.
Out-of-the-box de-identification accuracy is 93%, versus ChatGPT’s 60%, at detecting PHI in clinical notes.
Extracting ICD-10-CM codes succeeds 76% of the time, versus 26% for GPT-3.5 and 36% for GPT-4.
It should come as no surprise that models trained with domain-specific data and domain experts outperform general-purpose models. We’re happy to share the Python notebooks if you’d like to reproduce or customize the benchmarks.
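As a rough illustration of how entity-level benchmarks like these are typically scored (this is a sketch, not John Snow Labs’ actual benchmark code, and the sample annotations are hypothetical): each predicted entity is matched against the gold annotations, and precision, recall, and F1 are computed over the matched spans.

```python
# Illustrative only: a minimal entity-level scoring sketch, not the actual
# benchmark code. Entities are compared as (text, label) pairs.

def entity_f1(gold, pred):
    """Micro-averaged precision, recall, and F1 over entity sets."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)          # entities found and correct
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical PHI annotations from a de-identification run
gold = [("John Smith", "NAME"), ("03/02/2021", "DATE"), ("Boston", "CITY")]
pred = [("John Smith", "NAME"), ("03/02/2021", "DATE")]

p, r, f1 = entity_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The same scheme carries over to the other benchmarks above: for de-identification the “entities” are PHI spans, and for terminology mapping they are the extracted ICD-10-CM codes.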
3. They’re production ready.
The models run on your infrastructure, behind your firewall, under your security controls. No text is ever sent to any third party or cloud service.
No need to buy a shipload of GPUs. We’ve engineered these LLMs to run on commodity hardware, which makes them both much faster and much cheaper to scale.
Regularly updated. These LLMs are tuned as new research papers, clinical trials, guidelines, and terminologies are published, so you never go to production with a stale model.
Most importantly, the models will be frequently rebuilt as research evolves, because only one thing is certain about today’s state-of-the-art LLMs: a model trained today will be outdated in 3–6 months.
If you’re a John Snow Labs customer, all these capabilities are included in your Healthcare NLP subscription. Install the new 4.4 release and give it a go. If you’d like to learn more, join the next webinar on automated summarization of clinical notes on April 26th.
Hello David! Outstanding article, I appreciate you taking the time to share this.
"Clinical note summarization achieves 30% more accuracy than general state-of-the-art Language Model architectures, such as BART, Flan-T5, and Pegasus"
Could you kindly expand on the specific metrics employed in this context to quantify and compare accuracy?
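[Editor’s note: the post does not name the metric behind the 30% figure. For readers asking the same question, summarization benchmarks are most commonly scored with ROUGE-N n-gram overlap between the generated and reference summaries; a minimal ROUGE-1 F1 sketch, for illustration only:]

```python
# Illustrative only: ROUGE-N is the standard automatic summarization metric;
# the post does not state which metric its benchmark actually used.
from collections import Counter

def rouge_n(reference, candidate, n=1):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    def ngrams(text):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    ref, cand = ngrams(reference), ngrams(candidate)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())   # n-grams matched, clipped by count
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

score = rouge_n("patient admitted with severe back pain",
                "patient admitted with back pain")
print(f"ROUGE-1 F1 = {score:.2f}")
```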