Mechanism for interpreting complex cell image features driving deep learning classifiers
Deep learning has emerged as the technique of choice for identifying subtle patterns in biomedical data sets that are meaningful for classifying the data content. In this July 2021 cover story of Cell Systems, Assaf Zaritsky (now Assistant Professor in the Department of Software & Information Systems Engineering at Ben-Gurion University of the Negev) and Andrew Jamieson, Assistant Professor in our Department, exploited this capacity to build and validate a classifier for melanoma biopsies based on live movies of the dynamics of disaggregated cells. The experimental data were generated by Erik Welf, Andres Nevarez and Justin Cillay, all former lab members, using a series of patient-derived melanoma models established by Sean Morrison’s lab. UT Southwestern’s Communication Office also wrote a nice Newsroom summary of the story.
In brief, the classifier recognizes, based on as few as 20 cells, whether a tumor has the potential to become highly metastatic or can be considered less metastatic. However, a major criticism of such deep learning classifiers is that they are “black boxes”, i.e., it is not transparent which components of the data the classifier relies on for its decision. Our team devised a novel algorithm to “reverse engineer” the key image properties that distinguish highly metastatic from less metastatic cells. These properties are too subtle to be observed by the human eye. More importantly, they are mathematically too complex for a human to implement in a conventional, non-AI-driven image classification pipeline. To get a feel for what the decisive properties might be in this AI application, the team exploited the capacity of the deep learned model to generate synthetic images of high- and low-metastatic cells that lie far outside the natural diversity of the experimental data. Through this trick of ‘massive exaggeration’, we found arm-like extensions and increased light scattering to be hallmark properties of metastatic cells.
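To make the ‘massive exaggeration’ idea more concrete, here is a minimal toy sketch in Python. It is not the published implementation: the linear "decoder", the linear classifier, the latent dimension, and every name in it (decode, classifier_score, exaggerate) are hypothetical stand-ins chosen only for illustration. The sketch assumes that each cell image is summarized by a latent code, that a classifier scores this code, and that a decoder turns any code back into an image. It then pushes one cell's code many standard deviations along the classifier's direction, far beyond the spread of the (here randomly generated) population, and decodes the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy stand-ins (assumptions, not the published model):
# a linear decoder from an 8-D latent code to a 16x16 image, and a
# linear classifier that scores the latent code.
LATENT_DIM, IMAGE_SIDE = 8, 16
decoder_weights = rng.normal(size=(LATENT_DIM, IMAGE_SIDE * IMAGE_SIDE))
classifier_weights = rng.normal(size=LATENT_DIM)
direction = classifier_weights / np.linalg.norm(classifier_weights)


def decode(latent):
    """Map a latent code to a synthetic image (toy linear decoder)."""
    return (latent @ decoder_weights).reshape(IMAGE_SIDE, IMAGE_SIDE)


def classifier_score(latent):
    """Toy classifier score; higher is read as 'more metastatic'."""
    return latent @ classifier_weights


def exaggerate(latent, spread, n_std):
    """Shift a latent code by n_std population spreads along the classifier direction."""
    return latent + n_std * spread * direction


# Randomly generated "population" of latent codes standing in for real cells.
population = rng.normal(size=(500, LATENT_DIM))
population_scores = classifier_score(population)
spread = (population @ direction).std()

# Exaggerate one cell far beyond the natural range in both directions.
cell = population[0]
high_latent = exaggerate(cell, spread, n_std=10.0)
low_latent = exaggerate(cell, spread, n_std=-10.0)
high_image, low_image = decode(high_latent), decode(low_latent)

print("population score range:", population_scores.min(), population_scores.max())
print("exaggerated scores:", classifier_score(low_latent), classifier_score(high_latent))
```

Comparing high_image with low_image side by side is the step where, with a real decoder, exaggerated visual differences would become visible to a human observer. In this toy setting the images are random patterns and carry no biological meaning.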
Together, using cell images as a prototypical biomedical data set, these experiments illustrate how AI can support the identification of image properties that are predictive of complex cellular phenotypes and integrated cell functions but are too convoluted and subtle for a human observer to identify in the raw imagery. The mathematical underpinnings of this interpretation of deep learning are data-type agnostic and thus may be applicable to a broad diversity of deep learned data classifiers.