Corrective Classification for Learning from Data Imperfections
Speaker: Dr. Xingquan (Hill) Zhu, Florida Atlantic University
When: Friday, Sept 29, 2006
Time: 2:00 pm - 3:00 pm
Where: ECS 243
Abstract:
Learning from imperfect information sources is a challenging and practical issue for real-world data mining applications. Common practices include data cleansing, error detection, and classifier ensembling. The essential goal is to reduce the impact of noise and thereby improve the learners built from noise-corrupted data. In this talk, I will discuss a corrective classification (C2) design, which combines data cleansing, error correction, Bootstrap sampling, and classifier ensembling for effective learning from noisy data sources. Two design features distinguish C2 from existing algorithms. First, the diverse base learners that constitute the C2 ensemble are constructed via a Bootstrap sampling process. Second, C2 further improves each base learner by unifying error detection, correction, and data cleansing to reduce the overall noise impact. Being corrective, the classifier ensemble is built from data that have been preprocessed and corrected by the data cleansing and correcting modules. Experimental comparisons will demonstrate that C2 is not only superior to the learner built from the original noisy sources, but also more reliable than Bagging or the Aggressive Classifier Ensemble (ACE), which are two degenerate variants of C2.
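To make the pipeline described in the abstract more concrete, the following is a minimal sketch of the general idea: draw Bootstrap samples, correct suspicious labels within each sample, train one base learner per corrected sample, and combine the learners by majority vote. It is not the speaker's implementation. The nearest-centroid base learner, the cross-validated relabeling rule, and all function names (`train_centroid`, `correct_labels`, `c2_sketch`, `vote`) are assumptions chosen only for illustration.

```python
import random
from collections import Counter

def train_centroid(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class label."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))

def correct_labels(X, y, folds=3):
    """Assumed detection/correction rule: an instance is flagged as noisy
    when a model trained on the other folds disagrees with its label,
    and the flagged label is replaced by that model's prediction."""
    n = len(X)
    corrected = list(y)
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        model = train_centroid([X[i] for i in train_idx],
                               [y[i] for i in train_idx])
        for i in range(f, n, folds):
            guess = predict(model, X[i])
            if guess != y[i]:          # error detected
                corrected[i] = guess   # error corrected
    return corrected

def c2_sketch(X, y, n_learners=11, seed=0):
    """Build an ensemble: Bootstrap sample -> correct labels -> train."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = correct_labels(Xb, [y[i] for i in idx])
        ensemble.append(train_centroid(Xb, yb))
    return ensemble

def vote(ensemble, x):
    """Majority vote over the base learners' predictions."""
    return Counter(predict(m, x) for m in ensemble).most_common(1)[0][0]
```

In this sketch, skipping the `correct_labels` step would reduce the procedure to plain Bagging, which mirrors the abstract's remark that Bagging is a degenerate variant of C2.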
Bio:
Xingquan (Hill) Zhu is an Assistant Professor in the Department of Computer Science & Engineering at Florida Atlantic University, Boca Raton, FL. He received his Ph.D. degree in Computer Science from Fudan University, Shanghai, China, in 2001. From Feb. 2001 to Oct. 2002, he was a Postdoctoral Associate in the Department of Computer Science, Purdue University, West Lafayette, IN. From Oct. 2002 to July 2006, he was a Research Assistant Professor in the Department of Computer Science, University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia systems, and information retrieval.