Invited Lecture Series:
Midas Touch on SEC Filings and Government Spending Data
Speaker: Howard Ho
When: Friday, October 23rd, 2009
Time: 2:00pm
Where: Wertheim Conservatory
Abstract:
Midas is a new project started in late 2008 by the Intelligent Information Integration (aka Clio) Group at IBM Almaden. It aims to extract, cleanse, and integrate data from multiple publicly available data sources. We initially focus on two domains: 1) a financial domain, where the input dataset consists of a heterogeneous collection of company filings with the US Securities & Exchange Commission (SEC), and 2) a US government domain, with data sources containing information about Congress members, earmarks, and federal spending. In both domains, we are building a scalable Hadoop-based system whose goal is to transform the data from a document or record view of the world to an object-centric view, where multiple facts about the same real-world entity are merged into one object with, ideally, clean and complete attributes. In this talk, I will describe the initial prototype of the Midas system that we built and the new research framework required to support such a scalable system. The main stages of Midas include unstructured information extraction, structured information integration, and temporal analysis and fusion. Our research aims to develop novel algorithms and tools, as well as scalable and reusable software modules, for each of these stages.
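To make the shift from a record view to an object-centric view concrete, here is a minimal Python sketch. It is not the Midas implementation: the record layout (`entity`, `date`, `attrs`), the function name `fuse_records`, and the sample companies are all hypothetical, and the "newest non-empty value wins" rule is only one simple assumption about how temporal fusion could resolve conflicts.

```python
from collections import defaultdict


def fuse_records(records):
    """Group records by entity key and merge their attributes into one object each.

    Hypothetical record shape: {"entity": str, "date": "YYYY-MM-DD", "attrs": dict}.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["entity"]].append(rec)

    entities = {}
    for key, recs in grouped.items():
        merged = {}
        # Apply records oldest first so newer non-empty values overwrite older ones.
        # ISO-formatted date strings sort correctly as plain strings.
        for rec in sorted(recs, key=lambda r: r["date"]):
            for attr, value in rec["attrs"].items():
                if value is not None:
                    merged[attr] = value
        entities[key] = merged
    return entities


# Illustrative, made-up records standing in for facts extracted from filings.
records = [
    {"entity": "ACME Corp", "date": "2008-03-01", "attrs": {"ceo": "J. Doe", "hq": None}},
    {"entity": "ACME Corp", "date": "2009-02-15", "attrs": {"ceo": "A. Smith", "hq": "Austin"}},
    {"entity": "Widget Inc", "date": "2008-11-30", "attrs": {"ceo": "R. Roe"}},
]
entities = fuse_records(records)
```

In this sketch the two ACME Corp records collapse into a single object whose attributes come from the most recent record that supplies a value. A real system would also need entity resolution, since the same company rarely appears under one identical key across sources.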
Biography:
Howard (Ching-Tien) Ho is a research staff member at the IBM Almaden Research Center, where he currently manages the Intelligent Information Integration group. His current research interests include databases, information integration, and information extraction. Past research has included schema mapping, XML, data mining, on-line analytical processing (OLAP), communication issues for interconnection networks, algorithms for collective communications, graph embeddings, fault tolerance, and parallel algorithms and architectures. He is a recognized expert on parallel systems, having led the Foundations of Massively Parallel Computing, served as Editor-in-Chief of the Journal of Interconnection Networks and on the editorial board of the IEEE Transactions on Parallel and Distributed Systems, served as program co-chair for ISPAN-2009 and as vice-chair for two parallel processing conferences, and co-edited Large-Scale Parallel Data Mining. Dr. Ho has received multiple formal awards from IBM and has published 18 patents, over 30 journal papers, and over 50 conference papers in these areas. He received a B.S. degree in electrical engineering from National Taiwan University, Taiwan, in 1979 and a Ph.D. degree in computer science from Yale University in 1990.