Raihanul Bari Tanvir
Florida International University
Raihanul Bari Tanvir is a Ph.D. candidate at the Knight Foundation School of Computing and Information Sciences (KFSCIS) at Florida International University (FIU). He works as a Graduate Research Assistant under the supervision of Dr. Ananda M. Mondal and Dr. Giri Narasimhan, and is a member of both the Machine Learning and Data Analytics Group (MLDAG) and the Bioinformatics Research Group (BioRG). His research interests lie at the intersection of Machine Learning, Data Science, and Computational Biology. Raihanul has published several papers in journals such as Data, BMC Bioinformatics, and the International Journal of Molecular Sciences, and in conferences such as IEEE BIBM and BIOCOMP. He holds a B.Sc. in Computer Science and Engineering from Bangladesh University of Engineering and Technology (BUET). Before joining FIU, he worked as a software engineer at a local software firm in Bangladesh for one and a half years.
Biomarkers are of great importance in cancer research, diagnosis, and treatment, and for understanding how biological systems respond to internal or external interventions. High-throughput sequencing technologies, such as RNA sequencing, produce large volumes of gene expression data, enabling data-driven biomarker discovery. Identifying differentially expressed genes (DEGs) with traditional statistical tests has been the mainstream approach to discovering cancer biomarkers. However, this approach has three major drawbacks. First, it does not consider biological phenomena such as associations among genes, groups of genes working together to perform a common task or initiate a disease, and cross-talk among groups of genes. Second, it ignores individual genetic and epigenetic variability, which leads to tumor heterogeneity. Third, it does not capture the cause-and-effect relationships that should exist among cancer biomarkers. We propose three approaches to overcome these drawbacks.
(1) We hypothesize that a group of genes works together by forming a clique-like structure, and that a bipartite graph can represent the cross-talk between two groups of genes. To test this hypothesis, gene expression data from three cancers were analyzed separately. The biomarkers identified using the proposed graph-theoretic approaches were prognostically significant.
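To make the graph-theoretic idea concrete, here is a minimal sketch under stated assumptions. The abstract does not say how edges are defined or how clique-like structures are found, so this example assumes a co-expression graph in which two genes are linked when the absolute Pearson correlation of their expression values is at least a threshold (0.8 here, an arbitrary illustrative choice), and enumerates maximal cliques with the Bron-Kerbosch algorithm. The cross-talk between two gene groups is taken to be the bipartite set of edges running between them. All function names, the threshold, and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_graph(expr, threshold=0.8):
    """Hypothetical edge rule: link two genes when |r| >= threshold across samples."""
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting: return every maximal clique as a set of genes."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def crosstalk_edges(adj, group1, group2):
    """Edges of the bipartite graph linking two disjoint gene groups."""
    return sorted((u, v) for u in group1 for v in group2 if v in adj[u])


# Toy expression matrix: gene -> values across five samples (made-up numbers).
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [1.1, 2.0, 3.2, 3.9, 5.1],
    "D": [3, 1, 4, 1, 5],
}
adj = coexpression_graph(expr)
cliques = maximal_cliques(adj)
print(sorted(sorted(c) for c in cliques))  # [['A', 'B', 'C'], ['D']]
print(crosstalk_edges(adj, {"A", "B"}, {"C", "D"}))  # [('A', 'C'), ('B', 'C')]
```

In this toy data, A, B, and C rise together across samples and form a clique, while D is only weakly correlated with the others. A real analysis would choose the threshold from the data, and the phrase "clique-like structure" suggests the actual method may relax strict cliques.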
(2) Intratumor heterogeneity (ITH), defined as the diversity of tumor cell subpopulations, is the biggest obstacle in precision medicine. The major limitation of the state-of-the-art method for estimating ITH levels from transcriptional profiles is that it uses the expression values of all genes. We hypothesize that a reduced set of important genes (biomarkers) is sufficient to estimate the level of ITH. Our proposed deep learning-based feature selection approach identified a reduced set of genes that effectively estimates ITH levels across patients.
(3) A gene regulatory network (GRN) is a biological network that captures regulatory interactions among genes. Unlike a co-expression network, which is built on correlation, a GRN is built on causation. We propose a framework to infer GRNs from gene expression data by combining different GRN inference methods. We hypothesize that the hub genes of the inferred GRN could be potential biomarkers.
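The abstract does not state how the outputs of the individual inference methods are combined. One common way to merge predictions from several GRN inference methods is to average each edge's rank across methods; the sketch below assumes that scheme, assumes each method returns a confidence score per regulator-target edge, and treats hub genes as the highest-degree genes among the top-ranked consensus edges. The function names, parameters, and scores are all hypothetical.

```python
from collections import Counter, defaultdict


def average_rank_consensus(method_scores):
    """Combine edge scores from several (hypothetical) GRN inference methods.

    method_scores: list of dicts mapping (regulator, target) -> confidence,
    where higher means more confident. Edges a method did not score rank last.
    Returns all edges sorted by mean rank, best first (ties broken by edge name).
    """
    all_edges = set().union(*method_scores)
    ranks = defaultdict(list)
    for scores in method_scores:
        ordered = sorted(all_edges, key=lambda e: (-scores.get(e, float("-inf")), e))
        for rank, edge in enumerate(ordered, start=1):
            ranks[edge].append(rank)
    return sorted(all_edges, key=lambda e: (sum(ranks[e]) / len(ranks[e]), e))


def hub_genes(consensus_edges, top_edges, n_hubs):
    """Rank genes by degree in the network formed by the top consensus edges."""
    degree = Counter()
    for regulator, target in consensus_edges[:top_edges]:
        degree[regulator] += 1
        degree[target] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:n_hubs]]


# Made-up edge confidences from two hypothetical inference methods.
method_a = {("TF1", "G1"): 0.9, ("TF1", "G2"): 0.8, ("TF2", "G3"): 0.3, ("TF1", "G3"): 0.5}
method_b = {("TF1", "G1"): 0.7, ("TF1", "G2"): 0.9, ("TF2", "G3"): 0.6, ("TF1", "G3"): 0.2}

consensus = average_rank_consensus([method_a, method_b])
print(consensus)  # [('TF1', 'G1'), ('TF1', 'G2'), ('TF1', 'G3'), ('TF2', 'G3')]
print(hub_genes(consensus, top_edges=3, n_hubs=1))  # ['TF1']
```

Degree here counts both incoming and outgoing edges; counting only out-degree would instead favour regulators such as transcription factors. Averaging ranks rather than raw scores avoids having to calibrate scores that different methods report on different scales.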