Md Abdullah Al Mamun
Florida International University
Md Abdullah Al Mamun is a Ph.D. candidate in the Knight Foundation School of Computing and Information Sciences (KFSCIS) at Florida International University (FIU), under the supervision of Dr. Ananda Mondal. He is a member of the Machine Learning and Data Analytics Group (MLDAG) and the Bioinformatics Research Group (BioRG). His research interests lie at the intersection of Machine Learning (ML), Data Science, and Computational Biology. He is the first author of five research articles published in peer-reviewed journals and computational biology conferences, such as the IEEE International Conference on Bioinformatics and Biomedicine (BIBM) and the ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB). Abdullah has a Master’s degree in Computer Engineering from King Fahd University of Petroleum and Minerals, KSA, and a B.S. degree in Computer Science and Engineering from Dhaka University of Engineering and Technology, Bangladesh.
Cancer is a complex molecular process driven by abnormal changes in the genome, such as mutations and copy number variations, and by epigenetic aberrations, such as the dysregulation of long non-coding RNA (lncRNA). These abnormal changes are reflected in the transcriptome as oncogenes being turned on and tumor suppressor genes being turned off; such genes are considered cancer biomarkers.
However, transcriptomic data is high-dimensional, and finding the best subset of genes (features) associated with cancer is computationally challenging and expensive. Thus, developing a feature selection framework to discover molecular biomarkers for cancer is critical.
Traditional approaches for biomarker discovery calculate the fold change for each gene, comparing expression profiles between tumor and healthy samples, thus failing to capture the combined effect of the whole gene set. Also, these approaches do not always investigate cancer-type prediction capabilities using discovered biomarkers.
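The per-gene fold-change calculation described above can be illustrated with a minimal sketch. The function name `log2_fold_change`, the pseudocount, and the toy expression values are assumptions made for illustration and do not come from this work. Note that each gene is scored independently of all the others, which is why this style of analysis cannot capture the combined effect of a gene set.

```python
import numpy as np

def log2_fold_change(tumor, healthy, pseudocount=1.0):
    """Per-gene log2 fold change of mean expression, tumor vs. healthy.

    tumor, healthy: arrays of shape (n_samples, n_genes).
    pseudocount: small constant (an assumption here) that avoids division by zero.
    """
    tumor_mean = np.asarray(tumor, dtype=float).mean(axis=0)
    healthy_mean = np.asarray(healthy, dtype=float).mean(axis=0)
    return np.log2((tumor_mean + pseudocount) / (healthy_mean + pseudocount))

# Toy expression values (made up): 2 samples per group, 3 genes.
tumor = np.array([[7.0, 1.0, 3.0],
                  [7.0, 1.0, 3.0]])
healthy = np.array([[1.0, 7.0, 3.0],
                    [1.0, 7.0, 3.0]])

lfc = log2_fold_change(tumor, healthy)
print(lfc)  # gene 0 is up in tumor, gene 1 is down, gene 2 is unchanged
```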
In this work, we proposed a machine learning-based framework to address all of the above challenges in discovering lncRNA biomarkers. First, we developed a machine learning pipeline that takes lncRNA expression profiles of cancer samples as input and outputs a small set of key lncRNAs that can accurately predict multiple cancer types. A significant innovation of our work is its ability to identify biomarkers without using healthy samples. However, this initial framework cannot identify cancer-specific lncRNAs. Second, we extended our framework to identify cancer type-specific and subtype-specific lncRNAs. Third, we proposed using a state-of-the-art deep learning algorithm, the concrete autoencoder (CAE), in an unsupervised setting to efficiently identify a subset of the most informative features. However, because the CAE is stochastic, the features it selects are not reproducible across runs. To address this issue, we proposed a multi-run CAE (mrCAE) that identifies a stable set of features. Our deep learning-based pipeline significantly extended the previous state-of-the-art feature selection techniques.
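The general idea of stabilizing a stochastic feature selector by aggregating over multiple runs can be sketched as follows. This is not the mrCAE implementation: the abstract does not state how mrCAE combines runs, so the frequency-count aggregation below is an assumption, and `noisy_top_k` is a hypothetical stand-in selector (variance plus random noise) used in place of a concrete autoencoder to keep the example self-contained.

```python
import numpy as np

def noisy_top_k(X, k, rng, noise_scale=0.5):
    """Hypothetical stochastic selector (NOT a CAE): ranks features by
    variance plus random noise, so repeated runs can disagree."""
    scores = X.var(axis=0) + rng.normal(0.0, noise_scale, size=X.shape[1])
    return np.argsort(scores)[-k:]

def stable_features(X, k, n_runs=20, seed=0):
    """Run the selector n_runs times and keep the k features selected most often.

    Frequency counting is an assumed aggregation rule, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_runs):
        counts[noisy_top_k(X, k, rng)] += 1
    # Stable sort so that ties are broken by the lower feature index.
    order = np.argsort(-counts, kind="stable")
    return np.sort(order[:k]), counts

# Toy data (made up): 50 samples, 10 features; the first 3 are far more variable.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(50, 10))
X[:, :3] *= 10.0

selected, counts = stable_features(X, k=3, n_runs=20)
print(selected, counts)
```

Features that appear in most runs are treated as stable, while features picked only occasionally are attributed to run-to-run randomness and dropped.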
Finally, we showed that the discovered biomarkers are biologically relevant through a literature review and prognostically significant through survival analyses. The novel biomarkers could be used as a screening tool for diagnosing different cancers and as therapeutic targets.