Generative Modeling for Biomedical Applications

Label-free high-throughput digital holographic microscopy is a novel imaging technique developed to aid clinicians and pathologists in diagnosing various diseases. It requires little sample preparation, is automated, and allows in-vivo diagnostics of a large number of living cells and live tissue in a very short time. However, because the technique is label-free (i.e., the cells are not immunohistochemically stained) and the cells are still alive during the measurement, suspended and moving in a fluidic channel, their appearance differs from that shown in medical textbooks. No ground-truth labels exist, and many of the morphological features pathologists normally look for are not available. Analyzing this new type of high-dimensional data is where machine learning and concepts from signal processing come into play.

In the field of single-cell image analysis, promising results have already been obtained by applying basic feature extraction methods to microscopic images and using the extracted features to classify cells and thereby diagnose various diseases.

Our research will focus on developing methods that exploit the full potential of this new type of data rather than a limited set of features. The idea is to approach the problem from the perspective of statistical pattern recognition and to train deep neural networks to estimate the underlying distributions of the data.

One area we will explore to this end is generative models. Generative models offer a way to estimate the unknown distribution of natural data by learning to generate data that closely resembles the original samples. If training is successful, a generative model learns an estimate of the true distribution of a dataset and can therefore generate new, realistic, previously unseen samples. Generative models can be trained in an unsupervised manner; they do not necessarily require labels.
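
As a minimal sketch of this idea, and not the deep neural networks this project will develop, the example below fits the simplest possible generative model, an independent normal distribution per feature, to unlabeled data and then draws new samples from it. The two features, their values and the function names fit_diagonal_gaussian and generate are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def fit_diagonal_gaussian(samples):
    """Fit one independent normal distribution per feature (column)."""
    columns = zip(*samples)
    return [NormalDist.from_samples(column) for column in columns]


def generate(model, n, seed=None):
    """Draw n new feature vectors from the fitted model."""
    rng = random.Random(seed)
    return [[rng.gauss(dist.mean, dist.stdev) for dist in model]
            for _ in range(n)]


# Hypothetical unlabeled data: 500 cells described by two made-up features.
data_rng = random.Random(0)
cells = [[data_rng.gauss(10.0, 2.0), data_rng.gauss(0.5, 0.1)]
         for _ in range(500)]

# Training needs no labels: the model only estimates the data distribution.
model = fit_diagonal_gaussian(cells)

# Sampling yields new feature vectors that resemble the training data.
new_cells = generate(model, 5, seed=1)
```

A deep generative model plays the same two roles, estimating the distribution and sampling from it, but can capture far richer structure than this per-feature Gaussian.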

Once an estimate of the true data distribution has been found, it can be used to identify abnormalities in the samples, find commonalities between samples or single cells, classify unlabeled cells, and search for new diagnostic markers to aid the diagnostic process.
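
One of these uses, identifying abnormalities, can be sketched as follows under strong simplifying assumptions: the estimated distribution is an independent normal distribution per feature, the data are synthetic, and a sample is flagged as abnormal when its log-density falls below the 1st percentile of the reference data. The features, the threshold choice and the names log_density and is_abnormal are hypothetical, not part of the project's method.

```python
import math
import random
from statistics import NormalDist


def log_density(model, x):
    """Log-density of feature vector x under independent per-feature normals."""
    total = 0.0
    for dist, value in zip(model, x):
        z = (value - dist.mean) / dist.stdev
        total += -0.5 * z * z - math.log(dist.stdev) - 0.5 * math.log(2 * math.pi)
    return total


# Hypothetical reference data: 500 cells described by two made-up features.
rng = random.Random(0)
reference = [[rng.gauss(10.0, 2.0), rng.gauss(0.5, 0.1)] for _ in range(500)]

# Estimate the data distribution (assumed here: one normal per feature).
model = [NormalDist.from_samples(column) for column in zip(*reference)]

# Assumed decision rule: the 1st percentile of reference log-densities.
scores = sorted(log_density(model, x) for x in reference)
threshold = scores[len(scores) // 100]


def is_abnormal(x):
    """Flag samples that the estimated distribution considers unlikely."""
    return log_density(model, x) < threshold


typical_cell = [10.0, 0.5]   # close to the centre of the reference data
unusual_cell = [25.0, 1.2]   # far outside the reference data
flags = [is_abnormal(typical_cell), is_abnormal(unusual_cell)]
```

The same scoring step applies to any model that provides a density or likelihood estimate; only the log_density function would change.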