Knight Foundation School of Computing and Information Sciences
Boyuan Guan is a Ph.D. candidate in the Knight Foundation School of Computing and Information Sciences at Florida International University. Boyuan works with Dr. Liting Hu, conducting research in system optimization for big data applications. Boyuan is also the Lead Developer of the FIU GIS Center. Boyuan led the design and research of the intelligent recommendation library system dpSmart and published the paper “dpSmart: A Flexible Group Based Recommendation Framework for Digital Repository Systems” at the IEEE Big Data conference in 2019. Boyuan has also been involved in several CaaS system optimization and big data streaming optimization studies, such as “Exploiting the Spam Correlations in Scalable Online Social Spam Detection” and “Towards Adaptive Replication for Hot/Cold Blocks in HDFS using MemCached”. Boyuan currently leads the system design and architecture work for the Security and Research Hub project, which is funded by U.S. Southern Command.
In Hybrid Cloud computing environments, computing resources and distributed storage are supplied by multiple cloud providers and span both private and public clouds. Big data streaming application workflows often need to run across resources from multiple providers, where copious amounts of data are collected, processed, and stored. Current Hybrid Cloud implementations face three bottlenecks: 1) the overall resources are not transparent within the deployed environment, 2) the placement of application tasks on resources is not optimized, and 3) existing orchestration mechanisms are tied to a specific cloud. This proposal presents a flexible framework that lets big data streaming applications achieve optimized performance in the Hybrid Cloud environment. Google Kubernetes and a Containers as a Service (CaaS) architecture are used to break down the barriers between the different cloud providers. A Particle Swarm Optimization (PSO) based placement algorithm is adopted and customized as an additional tier for container placement optimization. We also establish a solid environment for evaluating the framework. We use two common types of streaming data, a sensor dataset and a social media dataset, as the input. We use Apache Kafka, Storm, and Hadoop as the data collection, processing, and storage applications, respectively. The experimental framework is implemented in a Hybrid Cloud environment consisting of the local data center, Amazon AWS EMR and S3, and Google Engine (GE). Kubernetes serves as the centralized container management tool.
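To make the idea of PSO-based container placement concrete, the following is a minimal sketch and not the proposal's actual algorithm. The abstract does not state the PSO variant, the objective, or the constraints, so everything here is an assumption for illustration: a single CPU demand per container, a capacity and a per-unit cost per node, a penalty for overloaded nodes, and the hypothetical names `placement_cost`, `decode`, and `pso_place`. Each particle holds one continuous coordinate per container, which is rounded down to a node index.

```python
import random

def placement_cost(assignment, demands, capacities, unit_costs, penalty=1000.0):
    """Running cost of a placement plus a penalty for overloaded nodes (assumed objective)."""
    load = [0.0] * len(capacities)
    for container, node in enumerate(assignment):
        load[node] += demands[container]
    cost = sum(load[n] * unit_costs[n] for n in range(len(capacities)))
    overload = sum(max(0.0, load[n] - capacities[n]) for n in range(len(capacities)))
    return cost + penalty * overload

def decode(position, n_nodes):
    """Map a continuous particle position to one node index per container."""
    return [min(n_nodes - 1, max(0, int(x))) for x in position]

def pso_place(demands, capacities, unit_costs, particles=30, iterations=200,
              inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over container-to-node assignments."""
    rng = random.Random(seed)
    n_nodes, dim = len(capacities), len(demands)

    def fitness(pos):
        return placement_cost(decode(pos, n_nodes), demands, capacities, unit_costs)

    pos = [[rng.uniform(0, n_nodes) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_fit = [fitness(p) for p in pos]      # personal best scores
    g = min(range(particles), key=best_fit.__getitem__)
    gbest_pos, gbest_fit = best_pos[g][:], best_fit[g]

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Keep the coordinate inside the valid node range.
                pos[i][d] = min(n_nodes - 1e-9, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f < best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f < gbest_fit:
                    gbest_fit, gbest_pos = f, pos[i][:]
    return decode(gbest_pos, n_nodes), gbest_fit

# Hypothetical example: five containers, three nodes (e.g. local, AWS, Google).
demands = [2.0, 1.0, 3.0, 2.0, 1.0]
capacities = [4.0, 4.0, 4.0]
unit_costs = [1.0, 2.0, 3.0]
assignment, cost = pso_place(demands, capacities, unit_costs)
print(assignment, cost)
```

In a real deployment the objective would likely include network latency, data locality, and provider pricing, and the resulting assignment would have to be handed to Kubernetes through its own scheduling mechanisms; none of that is modeled in this sketch.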