Discriminative Representations for Heterogeneous Images and Multimodal Data


Histology images of tumor tissue are an important diagnostic and prognostic tool for pathologists. Recently developed molecular methods group tumors into subtypes to further guide treatment decisions, but they are not routinely performed on all patients. A lower-cost, repeatable method for predicting tumor subtypes from histology could bring benefits to more cancer patients. Further, combining imaging and genomic data provides a more complete view of the tumor and may improve prognostication and treatment decisions. While molecular and genomic methods capture the state of a small sample of the tumor, histological image analysis provides a spatial view and can identify multiple subtypes within a single tumor. This intra-tumor heterogeneity is not yet fully understood, and quantifying it may lead to insights into tumor progression. In this work, I develop methods to learn appropriate features directly from images using dictionary learning or deep learning. I use multiple instance learning to account for intra-tumor variation in subtype during training, which improves subtype predictions and provides insights into tumor heterogeneity. I also integrate image and genomic features by learning a projection to a shared space that is also discriminative. This method can be used for cross-modal classification, or to improve predictions from images by learning from genomic data during training, even when only image data is available at test time.
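
To make the multiple instance learning idea concrete, the following is a minimal sketch, not the method developed in this work. It assumes each tumor image is treated as a bag of patches and that some patch-level classifier has already produced per-patch subtype probabilities. The function names, the mean-pooling rule, and the example numbers are all illustrative assumptions.

```python
def bag_prediction(instance_probs):
    """Mean-pool per-patch class probabilities into one image-level vector.

    instance_probs: list of per-patch probability lists, one entry per class.
    Mean pooling is an assumed aggregation rule chosen for illustration.
    """
    n_patches = len(instance_probs)
    n_classes = len(instance_probs[0])
    return [sum(p[c] for p in instance_probs) / n_patches
            for c in range(n_classes)]


def subtype_fractions(instance_probs):
    """Fraction of patches assigned to each subtype by arg-max.

    A simple, hypothetical summary of how mixed the patch-level calls are.
    """
    n_classes = len(instance_probs[0])
    counts = [0] * n_classes
    for p in instance_probs:
        best = max(range(n_classes), key=lambda c: p[c])
        counts[best] += 1
    return [count / len(instance_probs) for count in counts]


# Hypothetical patch probabilities for one image and two subtypes.
patches = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.7, 0.3]]
bag = bag_prediction(patches)          # image-level subtype probabilities
fractions = subtype_fractions(patches)  # share of patches per subtype
```

In this sketch the image-level label comes from the pooled vector, while the per-subtype patch fractions give a rough indication of heterogeneity within the image.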