School of Computing and Information Sciences
Liangdong is a Ph.D. candidate in the School of Computing and Information Sciences at Florida International University under the supervision of Professor Naphtali Rishe. His research interests lie within Geographic Information Systems (GIS), with a focus on geographic data mining. He received a Bachelor of Software Engineering degree from Beihang University in 2010 and a Master of Software Engineering degree from the same university in 2013.
With the widespread use of GPS-enabled personal digital devices such as cellphones, tablets, smart watches and smart home devices, the volume of geographic data has grown exponentially in the past few years. It is now increasingly important to analyze, understand and model these data, as their underlying value is tremendous.
Although data mining for spatial data is a well-explored and well-investigated area, data mining for geographic data remains a challenging task, because geographic datasets usually have special characteristics that make algorithms and theories designed for general spatial datasets less effective or not applicable at all. One commonly seen challenge is high dimensionality. Besides the original geographic features (latitude, longitude, altitude and time), there are typically many additional features derived from other sources of information. For example, when predicting the market price of real estate properties, the dataset will usually contain each property's latitude, longitude, year built, number of stories, number of bedrooms and bathrooms, distance to shopping centers and schools, and so on. The dimensionality can be increased further by including statistical features such as the average, lowest and highest market price nearby, the average year built nearby, and many similar features. In this example, one can easily expand the dimensionality to 100 or more, at which point most traditional algorithms lose their effectiveness.
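To make the dimensionality growth concrete, the following Python sketch derives neighborhood statistics of the kind described above (average, lowest and highest value of a field among nearby properties). It is an illustrative assumption, not the method developed in this research: the function names `haversine_km` and `neighborhood_features`, the dictionary record layout and the choice of radii are all hypothetical. Each radius adds three columns per field, so the number of derived features grows as fields × radii × 3.

```python
# Hypothetical sketch: derive "nearby" statistical features from coordinates.
# Names, record layout and radii are illustrative assumptions only.
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighborhood_features(records, fields, radii_km):
    """Return copies of `records` with avg/min/max of each field over neighbours.

    records:  list of dicts with 'lat', 'lon' and the numeric `fields`.
    Each output row gains len(fields) * len(radii_km) * 3 new keys.
    """
    out = []
    for i, rec in enumerate(records):
        row = dict(rec)
        for radius in radii_km:
            # Brute-force neighbour search (O(n^2) overall); excludes the record itself.
            neighbours = [
                other
                for j, other in enumerate(records)
                if j != i
                and haversine_km(rec["lat"], rec["lon"], other["lat"], other["lon"]) <= radius
            ]
            for f in fields:
                values = [n[f] for n in neighbours]
                suffix = f"{f}_{radius}km"
                # None marks "no neighbour inside this radius".
                row[f"avg_{suffix}"] = mean(values) if values else None
                row[f"min_{suffix}"] = min(values) if values else None
                row[f"max_{suffix}"] = max(values) if values else None
        out.append(row)
    return out


if __name__ == "__main__":
    homes = [
        {"lat": 25.76, "lon": -80.19, "price": 400000, "year_built": 1995},
        {"lat": 25.77, "lon": -80.19, "price": 450000, "year_built": 2005},
        {"lat": 25.90, "lon": -80.30, "price": 300000, "year_built": 1980},
    ]
    rows = neighborhood_features(homes, ["price", "year_built"], [2, 25])
    # 4 original keys + 2 fields * 2 radii * 3 statistics = 16 keys per row
    print(len(rows[0]))
```

With seven base fields and five radii, for instance, this step alone adds 105 columns. The quadratic neighbour search keeps the sketch short; for real datasets a spatial index such as a k-d tree or R-tree would normally replace it.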
The other problem is that there are usually implicit, and sometimes very strong, relationships among the features. In the example above, the geographic coordinates are unavoidably correlated with the statistical features. In fact, in most application scenarios, coordinates are more or less related to features derived from other sources, because these datasets are typically joined by matching the coordinates.
This research aims to develop new algorithms specifically designed to address the high dimensionality problem and the implicit relationship problem described above, which are less researched but play an important role in improving the quality of models learned from the data.