Research
My research interest lies in developing state-of-the-art machine learning techniques for Medical Multimodal Understanding (MMU),
with a special focus on tackling challenges in the clinical deployment of ML models, such as distributional shift and explainability, to enable robustness in and trust of the deployed models.
|
|
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Wisdom O. Ikezogwo,
Mehmet Saygin Seyfioglu,
Fatemeh Ghezloo,
Ranjay Krishna and
Linda Shapiro
CVPR June 2024.
Challenges in histopathology: 1) lack of spatial grounding in image-language datasets; 2) isolated image-text pairs from PubMed. Solution: use YouTube content to create a visual instruction tuning dataset, extracting mouse cursor movements for spatial awareness. Jointly training the vision and language components on this data yields a superior visual chatbot, surpassing the state of the art on histopathology benchmarks.
The gigapixel scale of whole slide images (WSIs) poses a challenge for histopathology multi-modal chatbots: diagnosis requires global analysis of the WSI, aggregating evidence across many different patches. Current visual instruction datasets, generated with large language models, focus on question/answer pairs for individual image patches, which in histopathology may lack diagnostic value on their own; this is further complicated by the absence of spatial grounding in histopathology image captions.
To bridge this gap, we introduce Quilt-Instruct, a large-scale dataset of 107,131 histopathology-specific instruction question/answer pairs, collected by leveraging educational histopathology videos from YouTube, which provide spatial localization for captions via automatically extracted narrators' cursor movements. In addition, we provide contextual reasoning by extracting the diagnosis and supporting facts from the entire video content to guide the extrapolative reasoning of GPT-4. Using Quilt-Instruct, we train Quilt-LLaVA, which can reason beyond a given single image patch, enabling diagnostic reasoning and spatial awareness. To evaluate Quilt-LLaVA, we propose a comprehensive evaluation dataset of 985 images and 1,283 human-generated question/answer pairs extracted directly from videos. We also thoroughly evaluate Quilt-LLaVA on public histopathology datasets, where it significantly outperforms the state of the art by over 10% on relative GPT-4 score and by 4% and 9% on open- and closed-set VQA, respectively.
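For illustration only, below is a minimal sketch of how a spatially grounded instruction example extracted from a narrated video might be represented and turned into a chat-style training sample; the field names, file paths, and conversation format are hypothetical stand-ins, not the released Quilt-Instruct schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GroundedInstruction:
    """One hypothetical Quilt-Instruct-style record (field names are illustrative)."""
    image_patch: str                          # path to the extracted video frame / patch
    question: str                             # GPT-4-generated instruction question
    answer: str                               # answer grounded in the narrator's speech
    cursor_trace: List[Tuple[float, float]]   # normalized (x, y) cursor positions over the caption span
    diagnosis: str                            # video-level diagnosis used as reasoning context


def to_chat_sample(rec: GroundedInstruction) -> dict:
    """Convert a record into a LLaVA-style conversation turn for instruction tuning."""
    return {
        "image": rec.image_patch,
        "conversations": [
            {"from": "human", "value": f"<image>\n{rec.question}"},
            {"from": "gpt", "value": rec.answer},
        ],
    }


if __name__ == "__main__":
    rec = GroundedInstruction(
        image_patch="frames/video123_00_12_05.png",
        question="What morphological features in the highlighted region support the diagnosis?",
        answer="The region the narrator circles shows nested epithelioid cells with prominent nucleoli.",
        cursor_trace=[(0.42, 0.31), (0.45, 0.33), (0.47, 0.36)],
        diagnosis="melanoma",
    )
    print(to_chat_sample(rec))
```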
|
|
Quilt-1M: One Million Image-Text Pairs for Histopathology.
Wisdom O. Ikezogwo,
Mehmet Saygin Seyfioglu,
Fatemeh Ghezloo, Dylan Geva, Fatwir S. Mohammed, Pavan K. Anand, Ranjay Krishna and Linda Shapiro
NeurIPS (ORAL) December 2023.
Recent accelerations in multi-modal applications have been made possible by the plethora of image and text data available online. However, the scarcity of comparable data in the medical field, specifically in histopathology, has slowed similar progress. To enable representation learning for histopathology, we turn to YouTube, an untapped resource offering 1,087 hours of valuable educational histopathology videos from expert clinicians. From YouTube, we curate Quilt: a large-scale vision-language dataset consisting of 802,148 image and text pairs. Quilt was automatically curated using a mixture of models, including large language models, handcrafted algorithms, human knowledge databases, and automatic speech recognition. In comparison, the most comprehensive datasets curated for histopathology amass only around 200K samples. We combine Quilt with datasets from other sources, including Twitter, research papers, and the internet in general, to create an even larger dataset: Quilt-1M, with 1M paired image-text samples, marking it as the largest vision-language histopathology dataset to date. We demonstrate the value of Quilt-1M by fine-tuning a pre-trained CLIP model. Our model outperforms state-of-the-art models on both zero-shot and linear probing tasks for classifying new pathology images across 13 diverse patch-level datasets of 8 different sub-pathologies, as well as on cross-modal retrieval tasks.
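As a hedged illustration of the zero-shot classification setup described above, the sketch below scores a pathology patch against text prompts using an open_clip-compatible checkpoint; the checkpoint identifier, prompt template, class names, and file path are assumptions rather than the exact released configuration.

```python
import torch
import open_clip
from PIL import Image

# Assumed checkpoint id; substitute the released Quilt-1M-tuned weights if different.
CKPT = "hf-hub:wisdomik/QuiltNet-B-32"
model, _, preprocess = open_clip.create_model_and_transforms(CKPT)
tokenizer = open_clip.get_tokenizer(CKPT)
model.eval()

# Hypothetical class prompts for a patch-level classification task.
class_names = ["benign tissue", "invasive ductal carcinoma", "lymphocytes"]
text = tokenizer([f"a histopathology image of {c}" for c in class_names])

image = preprocess(Image.open("patch.png")).unsqueeze(0)  # illustrative path

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine-similarity scoring between the image and each class prompt.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(class_names, probs.squeeze(0).tolist())))
```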
|
|
Multi-Scale Cross-Attention Multiple Instance Learning Network for
Risk Stratification of Whole Slide Images.
Wisdom O. Ikezogwo,
Christopher Chandler, Jatin S. Gandhi, Annie Garcia, Courtney Daum,
Elizabeth Loggers, Linda G. Shapiro, Jose G. Mantilla, Anshu Bandhlish, Robert W. Ricciotti
Abstract (Oral Platform Presentation), United States and Canadian Academy of Pathology (USCAP) April 2023.
|
|
Multi-modal Masked Autoencoders Learn Compositional Histopathological Representations.
W.O. Ikezogwo, Mehmet Saygin Seyfioglu, Linda Shapiro
Extended abstract: Machine Learning for Health (ML4H), Dec 2022.
|
|
Supervised domain generalization for integration of disparate scalp EEG datasets for automatic epileptic seizure detection
K.P. Ayodele, W.O. Ikezogwo, M.A. Komolafe, P. Ogunbona
Computers in Biology and Medicine Volume 120, May 2020, 103757
We use supervised domain generalization to combine disparate scalp EEG datasets and train a recurrent convolutional neural network detector, then test the generalizability of the trained model on an out-of-distribution private epileptic seizure dataset.
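For readers unfamiliar with this detector family, here is a minimal, illustrative PyTorch sketch of a recurrent convolutional EEG classifier; the channel counts, window lengths, and layer sizes are assumptions and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn


class RecurrentConvSeizureDetector(nn.Module):
    """Illustrative 1-D CNN + GRU classifier over multi-channel scalp EEG windows."""

    def __init__(self, n_channels: int = 18, hidden: int = 64):
        super().__init__()
        # Convolutional front end extracts local temporal features from each window.
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Recurrent layer models dependencies across consecutive windows.
        self.gru = nn.GRU(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # seizure vs. non-seizure

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_windows, n_channels, samples_per_window)
        b, t, c, s = x.shape
        feats = self.conv(x.reshape(b * t, c, s)).squeeze(-1).reshape(b, t, -1)
        out, _ = self.gru(feats)
        return self.head(out[:, -1])  # classify from the last window's hidden state


if __name__ == "__main__":
    model = RecurrentConvSeizureDetector()
    dummy = torch.randn(2, 10, 18, 512)  # 2 recordings, 10 windows, 18 channels, 512 samples
    print(model(dummy).shape)  # torch.Size([2, 2])
```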
|
|
Empirical Characterization of the Temporal Dynamics of EEG Spectral Components.
K.P. Ayodele, W.O. Ikezogwo, Anthony A. Osuntunyi
International Journal of Online and Biomedical Engineering (iJOE) Volume 16, Dec 2020.
|
News & Updates
Quilt-LLaVA has been accepted at CVPR 2024 February, 2024
Read here.
Interning at Apple in Spring & Summer of 2024 January, 2024
Read here.
Medical AI Renaissance Proposal Accepted for Microsoft's Accelerating Foundation Model Research Grant. October, 2023
Read here.
Quilt-1M has been accepted at NeurIPS 2023 (ORAL) September, 2023
Read here.
Started Summer Internship at Mayo Clinic (QHS team) June, 2023
Read here.
Presenting MMAE and Junior Chair at ML4H November, 2022
Details.
|