RP21 - Multimodal Foundation Models for Medical Images and Text

Diagnosing and treating patients requires multimodal information: clinicians integrate data from a wide range of modalities, including clinical notes, laboratory tests, medical images from pathology, radiology, and nuclear medicine, genomic data, and various other sources. Despite significant progress in the medical AI community, most models today are narrow, restricted to a single task.

The rise of foundation models offers a new perspective on building AI systems in the medical domain [1]. These models are typically trained on large amounts of data using self-supervised or unsupervised approaches and can be adapted to a variety of tasks through in-context learning or few-shot fine-tuning. In addition, they have remarkable generative capabilities that can improve the interaction between humans and AI. These developments open up the possibility of a single, generalist medical AI framework that can process and reason over diverse multimodal data and can thus be employed for multiple tasks.

In this project, we will extend state-of-the-art foundation models to handle multimodal data and develop unified training strategies for this purpose. Our KITE (KI Translation Essen) infrastructure provides sufficient computational resources for training and inference [2]. To evaluate the models, we will compile a set of downstream clinical tasks inspired by the requirements of tumor boards, in which experts from different specialties discuss the cases of melanoma patients. These tasks will be published as a multimodal benchmark.

In addition, the results of this project will be made available at the point of care through an interactive dashboard built with modern web technologies [3]. Users will be able to interact with the model via a chatbot using natural-language text input. Specifically, the model will be able to analyze whole-slide images (WSIs) from pathology and CT scans from radiology, as well as summarize patient histories from textual and tabular data retrieved from SHIP, the smart hospital information system established at our clinic. To develop and train the models on these different data modalities, we can build on our previous work on deep learning architectures for whole-body CT segmentation [4], cell segmentation and detection in WSIs [5], and German-language text processing [6], [7].

[1] Moor, M., Banerjee, O., Abad, Z. S. H., Krumholz, H. M., Leskovec, J., Topol, E. J., & Rajpurkar, P. (2023). Foundation models for generalist medical artificial intelligence. Nature, 616(7956), 259–265. https://doi.org/10.1038/s41586-023-05881-4

[2] https://kite.ikim.nrw/

[3] Brehmer, A., Sauer, C. M., Salazar, J., Hermann, K., Kim, M., Keyl, J., Bahnsen, F. H., Frank, B., Köhrmann, M., Rassaf, T., Mahabadi, A.-A., Hadaschik, B., Darr, C., Herrmann, K., Tan, S., Buer, J., Brenner, T., Reinhardt, H. C., Nensa, F., … Kleesiek, J. (2023). Establishing Medical Intelligence—Leveraging FHIR to Improve Clinical Management (SSRN Scholarly Paper 4493924). https://doi.org/10.2139/ssrn.4493924

[4] Jaus, A., Seibold, C., Hermann, K., Walter, A., Giske, K., Haubold, J., Kleesiek, J., & Stiefelhagen, R. (2023). Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines (arXiv:2307.13375). arXiv. https://doi.org/10.48550/arXiv.2307.13375

[5] Hörst, F., Rempe, M., Heine, L., Seibold, C., Keyl, J., Baldini, G., Ugurel, S., Siveke, J., Grünwald, B., Egger, J., & Kleesiek, J. (2023). CellViT: Vision Transformers for Precise Cell Segmentation and Classification (arXiv:2306.15350). arXiv. https://doi.org/10.48550/arXiv.2306.15350

[6] https://huggingface.co/ikim-uk-essen

[7] Dada, A., Ufer, T. L., Kim, M., Hasin, M., Spieker, N., Forsting, M., Nensa, F., Egger, J., & Kleesiek, J. (2023). Information extraction from weakly structured radiological reports with natural language queries. European Radiology. https://doi.org/10.1007/s00330-023-09977-3