Md Abdullah Al Mamun
School of Computing and Information Sciences
Abdullah Al Mamun is a Ph.D. candidate in the School of Computing and Information Sciences (SCIS) at Florida International University (FIU), under the supervision of Dr. Ananda Mondal. He is a member of the Machine Learning and Data Analytics Group (MLDAG) and the Bioinformatics Research Group (BioRG). His research interests lie at the intersection of Machine Learning (ML), Data Science, and Computational Biology. Abdullah has an M.S. in Computer Engineering from King Fahd University of Petroleum and Minerals, KSA, and a B.S. in Computer Science and Engineering from Dhaka University of Engineering and Technology, Bangladesh.
Cancer is a multi-omics molecular process that combines abnormal gene expression, DNA methylation, histone modifications, and non-coding RNA dysregulation. Molecular biomarkers for a cancer can be identified either from each type of omics data separately or by integrating multi-omics data. Omics data are high-dimensional. For example, the human genome has about 20K (20,000) coding genes, 40K non-coding genes, and 450K DNA methylation sites. Thus, representing a human genome, or an individual human being, requires 20K, 40K, and 450K dimensions (or features) with respect to coding genes, non-coding genes, and DNA methylation sites, respectively. Only a small subset of these molecules or features is responsible for a given cancer. Any dataset with N features has 2^N possible subsets of features. With such a large number of possible combinations, finding the best subset of the N features related to causing cancer is computationally challenging and expensive.
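To give a sense of the scale of 2^N for the feature counts quoted above, the following is a minimal Python sketch. The function names and the dictionary of feature counts are illustrative assumptions, not part of the proposed framework; the sketch only counts candidate subsets and does not perform any feature selection.

```python
import math


def subset_count_digits(n_features: int) -> int:
    """Return the number of decimal digits in 2**n_features.

    Uses logarithms instead of converting the huge integer to a string.
    """
    return math.floor(n_features * math.log10(2)) + 1


def subsets_of_size(n_features: int, k: int) -> int:
    """Return the exact number of feature subsets containing exactly k features."""
    return math.comb(n_features, k)


# Approximate feature counts taken from the abstract (illustrative labels).
FEATURE_COUNTS = {
    "coding genes": 20_000,
    "non-coding genes": 40_000,
    "DNA methylation sites": 450_000,
}

if __name__ == "__main__":
    for name, n in FEATURE_COUNTS.items():
        digits = subset_count_digits(n)
        print(f"{name}: 2^{n} has {digits} decimal digits")

    # Even restricting the search to small panels leaves a huge search space.
    print("10-gene panels from 20,000 coding genes:", subsets_of_size(20_000, 10))
```

Even when the search is restricted to fixed-size panels, the count stays far beyond what exhaustive enumeration can handle, which is why a learned or heuristic selection strategy is needed.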
For my dissertation, I propose three significant thrusts of work with the ultimate aim of developing a machine learning-based feature selection framework for identifying molecular biomarkers from multi-omics cancer data. First, I propose to build a feature selection framework that identifies cancer-related biomarker molecules from single-omics data and can differentiate the 33 types of cancer available in The Cancer Genome Atlas. This framework, however, cannot indicate which features contribute to which cancer. Second, I propose updating the framework so that it can identify cancer type-specific or subtype-specific features and perform classification at the same time. Finally, integrating multi-omics data into the proposed framework will provide a comprehensive and stable set of biomarker molecules for characterizing different types of cancer and subtypes of a specific cancer. The proposed framework will help develop new screening tools and targeted therapies for cancers, which will contribute to the development of precision medicine.