The next seminar will be held on 27th November at the University of Cambridge and will feature two talks:
Title: “Turn and face the strange: Out-of-distribution generalisation in machine learning” – Dr. Agnieszka Słowik, Microsoft Research Cambridge
When applied to a new data distribution, the performance of machine learning algorithms has been shown to deteriorate. Distribution shifts are caused by spurious correlations that hold at training time but not at test time, changes to the domain, and under- or over-representation of certain populations in training data. In this talk, I present two studies in the setting of learning from multiple data sources. In the first study, “On Distributionally Robust Optimization and Data Rebalancing”, multiple data sources are used to minimise the error on the most challenging data source. In the second study, “Linear unit-tests for invariance discovery”, I present a set of ‘unit tests’ that validate whether a given algorithm ignores spurious, unstable features that are unlikely to hold in the future, while learning the features that hold across all sources of training data. I conclude with a discussion of potential applications of this research to AI in medicine.
Title: “Development of a Natural Language Processing Multilingual Model for Summarizing Radiology Reports” – Mariana Lindo, Critical Techworks
The impression section of a radiology report summarizes important radiology findings and plays a critical role in communicating these findings to physicians. However, the preparation of these summaries is time-consuming and error-prone for radiologists. Recently, numerous models for radiology report summarization have been developed. Nevertheless, there is currently no model that can summarize these reports in multiple languages. Such a model could greatly improve future research and the development of deep learning models that incorporate data from patients with different ethnic backgrounds. In this study, the generation of radiology impressions in different languages was automated by fine-tuning a publicly available model, based on a multilingual text-to-text Transformer, to summarize findings in English, Portuguese, and German radiology reports. In a blind test, two board-certified radiologists indicated that for at least 70% of the system-generated summaries, the quality matched or exceeded that of the corresponding human-written summaries, suggesting substantial clinical reliability. Furthermore, this study showed that the multilingual model outperformed other models that specialized in summarizing radiology reports in only one language, as well as models that were not specifically designed for summarizing radiology reports, such as ChatGPT.