Mohammad Abu Obaida
Florida International University
Mohammad Abu Obaida is a Ph.D. candidate at Florida International University’s School of Computing and Information Sciences (SCIS). He works as a research assistant in the Modeling and Networking Systems Research Group (MNSRG) at FIU, led by Dr. Jason Liu. Obaida received his Bachelor of Science degree in Computer Science and Engineering from Dhaka University of Engineering and Technology (DUET), Bangladesh, in January 2012. Since joining SCIS as a doctoral student in Spring 2013, he has authored or co-authored five research papers. In Summer 2015, he was a graduate intern at Los Alamos National Laboratory (LANL), NM. Prior to joining FIU, he was a full-time Lecturer at Prime University, Dhaka, Bangladesh.
Performance predictions for parallel applications offer valuable insight into how those applications behave in actual runs. However, producing high-quality predictions for massively parallel applications on complicated and evolving HPC architectures is difficult, even more so when large-scale applications and their interactions are involved. Different prediction methods offer different trade-offs between performance and accuracy. In general, prediction accuracy decreases as communication within applications grows, since interference between applications and contention for shared physical resources become more pronounced. The main challenges of accurate HPC performance prediction include (1) modeling large-scale computing platforms and interconnection networks and (2) modeling large-scale application behaviors.
To tackle these challenges, we propose a framework for making accurate and scalable performance predictions using parallel discrete-event simulation together with accurate and highly efficient application execution models. As preliminary work, we first developed a large-scale parallel simulator of HPC architectures and application workloads. The simulator captures application communication behavior using detailed models of the message passing interface. In addition, it captures large-scale HPC workloads as affected by job scheduling, task placement, and inter-application interference. We also developed PyPassT, an automated tool that uses static analysis to build parameterized models of HPC applications; these models can be used in efficient execution-driven simulation to predict large-scale application performance.
Performance predictions can be error-prone and have low fidelity, especially when modeling irregular application patterns and data-dependent runtime dynamics. To model such applications, we propose a method that combines static analysis with dynamic instrumentation to automatically build parallel application execution models. The proposed method focuses on a partial evaluation model of the target application, which can subsequently be executed at small scale. Using dynamic instrumentation, we expect to eliminate important data dependencies in the parameterized models previously obtained from static analysis, which may in turn yield more accurate performance predictions.